squall/vllm - vllm

Author	SHA1	Message	Date
Alexei-V-Ivanov-AMD	26148120b3	[Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests (#4797)	2024-05-16 20:58:25 -07:00
Simon Mo	f09edd8a25	Add JSON output support for benchmark_latency and benchmark_throughput (#4848)	2024-05-16 10:02:56 -07:00
Cody Yu	973617ae02	[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840) Co-authored-by: Cade Daniel <edacih@gmail.com> Co-authored-by: Cade Daniel <cade@anyscale.com>	2024-05-16 00:53:51 -07:00
Nick Hill	676a99982f	[Core] Add MultiprocessingGPUExecutor (#4539) Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>	2024-05-14 10:38:59 -07:00
Sanger Steel	8bc68e198c	[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update `tensorizer` to version 2.9.0 (#4208)	2024-05-13 14:57:07 -07:00
Cyrus Leung	350f9e107f	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425) Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it only has been repeated twice so far, I will add another similar test suite in #4200 which would duplicate the code a third time) Also, I have moved the test utilities file (test_utils.py) to under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file in order to relative import tests/utils.py.	2024-05-13 23:50:09 +09:00
Cody Yu	c833101740	[Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support (#4535)	2024-05-09 18:04:17 -06:00
SangBin Cho	f6a593093a	[CI] Make mistral tests pass (#4596)	2024-05-08 08:44:35 -07:00
Alexei-V-Ivanov-AMD	478aed5827	[Build/CI] Fixing 'docker run' to re-enable AMD CI tests. (#4642)	2024-05-07 09:23:17 -07:00
Cade Daniel	19cb4716ee	[CI] Add retry for agent lost (#4633)	2024-05-06 23:18:57 +00:00
Simon Mo	c7f2cf2b7f	[CI] Reduce wheel size by not shipping debug symbols (#4602)	2024-05-04 21:28:58 -07:00
Simon Mo	021b1a2ab7	[CI] check size of the wheels (#4319)	2024-05-04 20:44:36 +00:00
Alexei-V-Ivanov-AMD	9b5c9f9484	[CI/Build] AMD CI pipeline with extended set of tests. (#4267) Co-authored-by: simon-mo <simon.mo@hey.com>	2024-05-02 12:29:07 -07:00
youkaichao	2a85f93007	[Core][Distributed] enable multiple tp group (#4512) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-05-02 04:28:21 +00:00
SangBin Cho	0d62fe58db	[Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451)	2024-05-01 19:24:13 -07:00
Simon Mo	ac5ccf0156	[CI] hotfix: soft fail neuron test (#4458)	2024-04-29 19:50:01 +00:00
Simon Mo	03dd7d52bf	[CI] clean docker cache for neuron (#4441)	2024-04-28 23:32:07 +00:00
Alexei-V-Ivanov-AMD	7ee82bef1e	[CI/Build] Adding functionality to reset the node's GPUs before processing. (#4213)	2024-04-25 09:37:20 -07:00
Robert Shaw	79a268c4ab	[BUG] fixed fp8 conflict with aqlm (#4307) Fixes fp8 iterface which broke in AQLM merge.	2024-04-23 18:26:33 -07:00
Hongxia Yang	95e5b087cf	[AMD][Hardware][Misc][Bugfix] xformer cleanup and light navi logic and CI fixes and refactoring (#4129)	2024-04-21 21:57:24 -07:00
youkaichao	8a7a3e4436	[Core] add an option to log every function call to for debugging hang/crash in distributed inference (#4079) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-18 16:15:12 -07:00
Liangfu Chen	cd2f63fb36	[CI/CD] add neuron docker and ci test scripts (#3571)	2024-04-18 15:26:01 -07:00
youkaichao	6dc1fc9cfe	[Core] nccl integrity check and test (#4155) [Core] Add integrity check during initialization; add test for it (#4155)	2024-04-17 22:28:52 -07:00
Cade Daniel	11d652bd4f	[CI] Move CPU/AMD tests to after wait (#4123)	2024-04-16 22:53:26 -07:00
Antoni Baum	69e1d2fb69	[Core] Refactor model loading code (#4097)	2024-04-16 11:34:39 -07:00
Sanger Steel	711a000255	[Frontend] [Core] feat: Add model loading using `tensorizer` (#3476)	2024-04-13 17:13:01 -07:00
SangBin Cho	36729bac13	[Test] Test multiple attn backend for chunked prefill. (#4023)	2024-04-12 09:56:57 -07:00
SangBin Cho	67b4221a61	[Core][5/N] Fully working chunked prefill e2e (#3884)	2024-04-10 17:56:48 -07:00
youkaichao	95baec828f	[Core] enable out-of-tree model register (#3871)	2024-04-06 17:11:41 -07:00
youkaichao	d03d64fd2e	[CI/Build] refactor dockerfile & fix pip cache [CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (#3859)	2024-04-04 21:53:16 -07:00
bigPYJ1151	77a6572aa5	[HotFix] [CI/Build] Minor fix for CPU backend CI (#3787)	2024-04-01 22:50:53 -07:00
bigPYJ1151	0e3f06fe9c	[Hardware][Intel] Add CPU inference backend (#3634) Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>	2024-04-01 22:07:30 -07:00
yhu422	d8658c8cc1	Usage Stats Collection (#2852)	2024-03-28 22:16:12 -07:00
SangBin Cho	26422e477b	[Test] Make model tests run again and remove --forked from pytest (#3631) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-03-28 21:06:40 -07:00
Simon Mo	a4075cba4d	[CI] Add test case to run examples scripts (#3638)	2024-03-28 14:36:10 -07:00
Simon Mo	96aa014d1e	fix benchmark format reporting in buildkite (#3693)	2024-03-28 14:35:16 -07:00
Roger Wang	45b6ef6513	feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)	2024-03-27 13:39:26 -07:00
youkaichao	8f44facddd	[Core] remove cupy dependency (#3625)	2024-03-27 00:33:26 -07:00
xwjiang2010	64172a976c	[Feature] Add vision language model support. (#3042)	2024-03-25 14:16:30 -07:00
Roy	f1c0fc3919	Migrate `logits` computation and gather to `model_runner` (#3233)	2024-03-20 23:25:01 +00:00
SangBin Cho	6e435de766	[1/n][Chunked Prefill] Refactor input query shapes (#3236)	2024-03-20 14:46:05 -07:00
Cade Daniel	482b0adf1b	[Testing] Add test_config.py to CI (#3437)	2024-03-18 12:48:45 -07:00
Simon Mo	8c654c045f	CI: Add ROCm Docker Build (#2886)	2024-03-18 19:33:47 +00:00
Simon Mo	93348d9458	[CI] Shard tests for LoRA and Kernels to speed up (#3445)	2024-03-17 14:56:30 -07:00
Antoni Baum	fb96c1e98c	Asynchronous tokenization (#2879)	2024-03-15 23:37:01 +00:00
Simon Mo	81653d9688	[Hotfix] [Debug] test_openai_server.py::test_guided_regex_completion (#3383)	2024-03-13 17:02:21 -07:00
Cade Daniel	8437bae6ef	[Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling (#3103)	2024-03-08 23:32:46 -08:00
SangBin Cho	24aecf421a	[Tests] Add block manager and scheduler tests (#3108)	2024-03-05 18:23:34 -08:00
Woosuk Kwon	929b4f2973	Add LoRA support for Gemma (#3050)	2024-02-28 13:03:28 -08:00
Ronen Schaffer	4caf7044e0	Include tokens from prompt phase in `counter_generation_tokens` (#2802)	2024-02-22 14:00:12 -08:00


62 Commits