Alexei-V-Ivanov-AMD
26148120b3
[Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests ( #4797 )
2024-05-16 20:58:25 -07:00
Simon Mo
f09edd8a25
Add JSON output support for benchmark_latency and benchmark_throughput ( #4848 )
2024-05-16 10:02:56 -07:00
Cody Yu
973617ae02
[Speculative decoding][Re-take] Enable TP>1 speculative decoding ( #4840 )
...
Co-authored-by: Cade Daniel <edacih@gmail.com>
Co-authored-by: Cade Daniel <cade@anyscale.com>
2024-05-16 00:53:51 -07:00
Nick Hill
676a99982f
[Core] Add MultiprocessingGPUExecutor ( #4539 )
...
Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>
2024-05-14 10:38:59 -07:00
Sanger Steel
8bc68e198c
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 ( #4208 )
2024-05-13 14:57:07 -07:00
Cyrus Leung
350f9e107f
[CI/Build] Move test_utils.py to tests/utils.py ( #4425 )
...
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it only has been repeated twice so far, I will add another similar test suite in #4200 which would duplicate the code a third time)
Also, I have moved the test utilities file (test_utils.py) to under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file in order to relative import tests/utils.py.
2024-05-13 23:50:09 +09:00
Cody Yu
c833101740
[Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support ( #4535 )
2024-05-09 18:04:17 -06:00
SangBin Cho
f6a593093a
[CI] Make mistral tests pass ( #4596 )
2024-05-08 08:44:35 -07:00
Alexei-V-Ivanov-AMD
478aed5827
[Build/CI] Fixing 'docker run' to re-enable AMD CI tests. ( #4642 )
2024-05-07 09:23:17 -07:00
Cade Daniel
19cb4716ee
[CI] Add retry for agent lost ( #4633 )
2024-05-06 23:18:57 +00:00
Simon Mo
c7f2cf2b7f
[CI] Reduce wheel size by not shipping debug symbols ( #4602 )
2024-05-04 21:28:58 -07:00
Simon Mo
021b1a2ab7
[CI] check size of the wheels ( #4319 )
2024-05-04 20:44:36 +00:00
Alexei-V-Ivanov-AMD
9b5c9f9484
[CI/Build] AMD CI pipeline with extended set of tests. ( #4267 )
...
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-05-02 12:29:07 -07:00
youkaichao
2a85f93007
[Core][Distributed] enable multiple tp group ( #4512 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-05-02 04:28:21 +00:00
SangBin Cho
0d62fe58db
[Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption ( #4451 )
2024-05-01 19:24:13 -07:00
Simon Mo
ac5ccf0156
[CI] hotfix: soft fail neuron test ( #4458 )
2024-04-29 19:50:01 +00:00
Simon Mo
03dd7d52bf
[CI] clean docker cache for neuron ( #4441 )
2024-04-28 23:32:07 +00:00
Alexei-V-Ivanov-AMD
7ee82bef1e
[CI/Build] Adding functionality to reset the node's GPUs before processing. ( #4213 )
2024-04-25 09:37:20 -07:00
Robert Shaw
79a268c4ab
[BUG] fixed fp8 conflict with aqlm ( #4307 )
...
Fixes fp8 iterface which broke in AQLM merge.
2024-04-23 18:26:33 -07:00
Hongxia Yang
95e5b087cf
[AMD][Hardware][Misc][Bugfix] xformer cleanup and light navi logic and CI fixes and refactoring ( #4129 )
2024-04-21 21:57:24 -07:00
youkaichao
8a7a3e4436
[Core] add an option to log every function call to for debugging hang/crash in distributed inference ( #4079 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-04-18 16:15:12 -07:00
Liangfu Chen
cd2f63fb36
[CI/CD] add neuron docker and ci test scripts ( #3571 )
2024-04-18 15:26:01 -07:00
youkaichao
6dc1fc9cfe
[Core] nccl integrity check and test ( #4155 )
...
[Core] Add integrity check during initialization; add test for it (#4155 )
2024-04-17 22:28:52 -07:00
Cade Daniel
11d652bd4f
[CI] Move CPU/AMD tests to after wait ( #4123 )
2024-04-16 22:53:26 -07:00
Antoni Baum
69e1d2fb69
[Core] Refactor model loading code ( #4097 )
2024-04-16 11:34:39 -07:00
Sanger Steel
711a000255
[Frontend] [Core] feat: Add model loading using tensorizer ( #3476 )
2024-04-13 17:13:01 -07:00
SangBin Cho
36729bac13
[Test] Test multiple attn backend for chunked prefill. ( #4023 )
2024-04-12 09:56:57 -07:00
SangBin Cho
67b4221a61
[Core][5/N] Fully working chunked prefill e2e ( #3884 )
2024-04-10 17:56:48 -07:00
youkaichao
95baec828f
[Core] enable out-of-tree model register ( #3871 )
2024-04-06 17:11:41 -07:00
youkaichao
d03d64fd2e
[CI/Build] refactor dockerfile & fix pip cache
...
[CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (#3859 )
2024-04-04 21:53:16 -07:00
bigPYJ1151
77a6572aa5
[HotFix] [CI/Build] Minor fix for CPU backend CI ( #3787 )
2024-04-01 22:50:53 -07:00
bigPYJ1151
0e3f06fe9c
[Hardware][Intel] Add CPU inference backend ( #3634 )
...
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
2024-04-01 22:07:30 -07:00
yhu422
d8658c8cc1
Usage Stats Collection ( #2852 )
2024-03-28 22:16:12 -07:00
SangBin Cho
26422e477b
[Test] Make model tests run again and remove --forked from pytest ( #3631 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-03-28 21:06:40 -07:00
Simon Mo
a4075cba4d
[CI] Add test case to run examples scripts ( #3638 )
2024-03-28 14:36:10 -07:00
Simon Mo
96aa014d1e
fix benchmark format reporting in buildkite ( #3693 )
2024-03-28 14:35:16 -07:00
Roger Wang
45b6ef6513
feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark ( #3277 )
2024-03-27 13:39:26 -07:00
youkaichao
8f44facddd
[Core] remove cupy dependency ( #3625 )
2024-03-27 00:33:26 -07:00
xwjiang2010
64172a976c
[Feature] Add vision language model support. ( #3042 )
2024-03-25 14:16:30 -07:00
Roy
f1c0fc3919
Migrate logits computation and gather to model_runner ( #3233 )
2024-03-20 23:25:01 +00:00
SangBin Cho
6e435de766
[1/n][Chunked Prefill] Refactor input query shapes ( #3236 )
2024-03-20 14:46:05 -07:00
Cade Daniel
482b0adf1b
[Testing] Add test_config.py to CI ( #3437 )
2024-03-18 12:48:45 -07:00
Simon Mo
8c654c045f
CI: Add ROCm Docker Build ( #2886 )
2024-03-18 19:33:47 +00:00
Simon Mo
93348d9458
[CI] Shard tests for LoRA and Kernels to speed up ( #3445 )
2024-03-17 14:56:30 -07:00
Antoni Baum
fb96c1e98c
Asynchronous tokenization ( #2879 )
2024-03-15 23:37:01 +00:00
Simon Mo
81653d9688
[Hotfix] [Debug] test_openai_server.py::test_guided_regex_completion ( #3383 )
2024-03-13 17:02:21 -07:00
Cade Daniel
8437bae6ef
[Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling ( #3103 )
2024-03-08 23:32:46 -08:00
SangBin Cho
24aecf421a
[Tests] Add block manager and scheduler tests ( #3108 )
2024-03-05 18:23:34 -08:00
Woosuk Kwon
929b4f2973
Add LoRA support for Gemma ( #3050 )
2024-02-28 13:03:28 -08:00
Ronen Schaffer
4caf7044e0
Include tokens from prompt phase in counter_generation_tokens ( #2802 )
2024-02-22 14:00:12 -08:00