Commit Graph

102 Commits

Author SHA1 Message Date
Kevin H. Luu
949e49a685
[ci] Limit num gpus if specified for A100 (#5694)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-19 16:30:03 -07:00
youkaichao
d571ca0108
[ci][distributed] add tests for custom allreduce (#5689) 2024-06-19 20:16:04 +00:00
Kevin H. Luu
3ee5c4bca5
[ci] Add A100 queue into AWS CI template (#5648)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-19 08:42:13 -06:00
Kevin H. Luu
19091efc44
[ci] Setup Release pipeline and build release wheels with cache (#5610)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-18 11:00:36 -07:00
Ronen Schaffer
7879f24dcc
[Misc] Add OpenTelemetry support (#4687)
This PR adds basic support for OpenTelemetry distributed tracing.
It includes changes to enable tracing functionality and improve monitoring capabilities.

I've also added a markdown document with screenshots to guide users on how to use this feature. You can find it here.
2024-06-19 01:17:03 +09:00
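Note: as background for the tracing support described in this commit, here is a minimal, self-contained sketch of generic OpenTelemetry usage in Python. It is not vLLM's actual instrumentation; the service name, span name, and attribute keys below are illustrative assumptions.

```python
# Illustrative only: a generic OpenTelemetry tracing setup, not vLLM's actual
# instrumentation. The span and attribute names below are made up for this sketch.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Register a tracer provider that batches finished spans and prints them.
provider = TracerProvider(resource=Resource.create({"service.name": "llm-server"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("example.tracer")

def handle_request(prompt: str) -> str:
    # Wrap one inference request in a span and attach request-level attributes.
    with tracer.start_as_current_span("llm.generate") as span:
        span.set_attribute("prompt_tokens", len(prompt.split()))
        completion = prompt.upper()  # stand-in for the real generation step
        span.set_attribute("completion_tokens", len(completion.split()))
        return completion

if __name__ == "__main__":
    handle_request("hello distributed tracing")
```

In a real deployment the console exporter would typically be replaced by an OTLP exporter pointing at a tracing backend.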
Kevin H. Luu
13db4369d9
[ci] Deprecate original CI template (#5624)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-18 14:26:20 +00:00
Roger Wang
4ad7b53e59
[CI/Build][Misc] Update Pytest Marker for VLMs (#5623) 2024-06-18 13:10:04 +00:00
Kuntai Du
114d7270ff
[CI] Avoid naming different metrics with the same name in performance benchmark (#5615) 2024-06-17 21:37:18 -07:00
Cyrus Leung
32c86e494a
[Misc] Fix typo (#5618) 2024-06-17 20:58:30 -07:00
Kuntai Du
9e4e6fe207
[CI] Improve the readability of performance benchmarking results and prepare for the upcoming performance dashboard (#5571)
2024-06-17 11:41:08 -07:00
Jie Fu (傅杰)
ab66536dbf
[CI/BUILD] Support non-AVX512 vLLM building and testing (#5574) 2024-06-17 14:36:10 -04:00
Kunshang Ji
728c4c8a06
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-06-17 11:01:25 -07:00
Antoni Baum
f31c1f90e3
Add basic correctness 2 GPU tests to 4 GPU pipeline (#5518) 2024-06-16 07:48:02 +00:00
Simon Mo
bd7efe95d0
Add ccache to amd (#5555) 2024-06-14 17:18:22 -07:00
Cyrus Leung
d47af2bc02
[CI/Build] Disable LLaVA-NeXT CPU test (#5529) 2024-06-14 09:27:30 -07:00
Kuntai Du
319ad7f1d3
[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with perf-benchmarks label (#5073)
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-06-13 22:36:20 -07:00
Antoni Baum
50eed24d25
Add cuda_device_count_stateless (#5473) 2024-06-13 16:06:49 -07:00
Kevin H. Luu
916d219d62
[ci] Use sccache to build images (#5419)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-12 17:58:12 -07:00
Kevin H. Luu
8b82a89997
[ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests (#5464)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-12 14:00:18 -07:00
youkaichao
c4bd03c7c5
[Core][Distributed] add same-node detection (#5369) 2024-06-11 10:53:59 -07:00
Kevin H. Luu
76477a93b7
[ci] Fix Buildkite agent path (#5392)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-10 18:58:07 -07:00
Kevin H. Luu
c5602f0baa
[ci] Mount buildkite agent on Docker container to upload benchmark results (#5330)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-10 09:22:34 -07:00
Kevin H. Luu
f7f9c5f97b
[ci] Use small_cpu_queue for doc build (#5331)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-10 09:21:11 -07:00
Antoni Baum
ccdc490dda
[Core] Change LoRA embedding sharding to support loading methods (#5038) 2024-06-06 19:07:57 -07:00
Cyrus Leung
89c920785f
[CI/Build] Update vision tests (#5307) 2024-06-06 05:17:18 -05:00
Simon Mo
3a6ae1d33c
[CI] Disable flash_attn backend for spec decode (#5286) 2024-06-05 15:49:27 -07:00
Tyler Michael Smith
02cc3b51a7
[misc] benchmark_serving.py -- add ITL results and tweak TPOT results (#5263) 2024-06-05 10:17:51 -07:00
Simon Mo
d5b1eb081e
[CI] Add nightly benchmarks (#5260) 2024-06-05 09:42:08 -07:00
Simon Mo
9ca62d8668
[CI] mark AMD test as softfail to prevent blockage (#5256) 2024-06-04 11:34:53 -07:00
Li, Jiang
45c35f0d58
[CI/Build] Reducing CPU CI execution time (#5241) 2024-06-04 10:26:40 -07:00
Cyrus Leung
ec784b2526
[CI/Build] Add inputs tests (#5215) 2024-06-03 21:01:46 -07:00
Kevin H. Luu
4f0d17c05c
New CI template on AWS stack (#5110)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-03 16:16:43 -07:00
Yuan
cafb8e06c5
[CI/BUILD] enable intel queue for longer CPU tests (#4113) 2024-06-03 10:39:50 -07:00
youkaichao
f758505c73
[CI/Build] increase wheel size limit to 200 MB (#5130) 2024-05-30 06:29:48 -07:00
omkar kakarparthi
e07aff9e52
[CI/Build] Docker cleanup functionality for amd servers (#5112)
Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
Co-authored-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Co-authored-by: omkarkakarparthi <okakarpa>
2024-05-30 03:27:39 +00:00
youkaichao
5bd3c65072
[Core][Optimization] remove vllm-nccl (#5091) 2024-05-29 05:13:52 +00:00
Cyrus Leung
5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Letian Li
2ba80bed27
[Bugfix] Update Dockerfile.cpu to fix NameError: name 'vllm_ops' is not defined (#5009) 2024-05-23 09:08:58 -07:00
Alexei-V-Ivanov-AMD
943e72ca56
[Build/CI] Enabling AMD Entrypoints Test (#4834)
Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
2024-05-20 11:29:28 -07:00
SangBin Cho
2e9a2227ec
[Lora] Support long context lora (#4787)
Currently we need to call the rotary embedding kernel separately for each LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through.

It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.

Follow-up of https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
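Note: to make the "multiple cos-sin caches per scaling factor" idea concrete, here is a minimal illustrative sketch in PyTorch. It is not the vLLM kernel or its API; the class name, linear-scaling scheme, and default factors are assumptions for the example.

```python
# Illustrative sketch only (not the vLLM kernel): keep one cos/sin cache per
# RoPE scaling factor so requests using different long-context LoRAs can share
# a single rotary-embedding module.
import torch

class MultiScaleRotaryEmbedding(torch.nn.Module):
    def __init__(self, head_dim: int, max_len: int, base: float = 10000.0,
                 scaling_factors=(1.0, 4.0, 8.0)):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        caches = []
        for s in scaling_factors:
            # Linear scaling: stretch positions by the factor, then cache cos/sin.
            t = torch.arange(int(max_len * s)).float() / s
            freqs = torch.outer(t, inv_freq)
            caches.append(torch.cat([freqs.cos(), freqs.sin()], dim=-1))
        # One concatenated cache plus per-factor offsets, so a single lookup
        # can serve a batch that mixes scaling factors.
        self.register_buffer("cache", torch.cat(caches, dim=0))
        lengths = [int(max_len * s) for s in scaling_factors]
        self.offsets = {s: sum(lengths[:i]) for i, s in enumerate(scaling_factors)}

    def forward(self, positions: torch.Tensor, scaling_factor: float) -> torch.Tensor:
        # Look up the cos/sin rows for these positions under the requested factor.
        return self.cache[positions + self.offsets[scaling_factor]]

# Usage: each request selects the cache slice matching its LoRA's scaling factor.
rope = MultiScaleRotaryEmbedding(head_dim=64, max_len=2048)
cos_sin = rope(torch.arange(16), scaling_factor=4.0)  # shape: (16, 64)
```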
Alexei-V-Ivanov-AMD
26148120b3
[Build/CI] Extending the set of AMD tests with Regression, Basic Correctness, Distributed, Engine, Llava Tests (#4797) 2024-05-16 20:58:25 -07:00
Simon Mo
f09edd8a25
Add JSON output support for benchmark_latency and benchmark_throughput (#4848) 2024-05-16 10:02:56 -07:00
Cody Yu
973617ae02
[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840)
Co-authored-by: Cade Daniel <edacih@gmail.com>
Co-authored-by: Cade Daniel <cade@anyscale.com>
2024-05-16 00:53:51 -07:00
Nick Hill
676a99982f
[Core] Add MultiprocessingGPUExecutor (#4539)
Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>
2024-05-14 10:38:59 -07:00
Sanger Steel
8bc68e198c
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (#4208) 2024-05-13 14:57:07 -07:00
Cyrus Leung
350f9e107f
[CI/Build] Move test_utils.py to tests/utils.py (#4425)
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for the OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it has only been repeated twice so far, I will add another similar test suite in #4200, which would duplicate the code a third time.)

Also, I have moved the test utilities file (test_utils.py) under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file so that tests/utils.py can be imported via relative imports.
2024-05-13 23:50:09 +09:00
Cody Yu
c833101740
[Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support (#4535) 2024-05-09 18:04:17 -06:00
SangBin Cho
f6a593093a
[CI] Make mistral tests pass (#4596) 2024-05-08 08:44:35 -07:00
Alexei-V-Ivanov-AMD
478aed5827
[Build/CI] Fixing 'docker run' to re-enable AMD CI tests. (#4642) 2024-05-07 09:23:17 -07:00
Cade Daniel
19cb4716ee
[CI] Add retry for agent lost (#4633) 2024-05-06 23:18:57 +00:00