Ronen Schaffer
7879f24dcc
[Misc] Add OpenTelemetry support ( #4687 )
This PR adds basic support for OpenTelemetry distributed tracing.
It includes changes to enable tracing functionality and improve monitoring capabilities.
I've also added a Markdown guide with screenshots that shows users how to use this feature.
2024-06-19 01:17:03 +09:00
Kevin H. Luu
13db4369d9
[ci] Deprecate original CI template ( #5624 )
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-18 14:26:20 +00:00
Roger Wang
4ad7b53e59
[CI/Build][Misc] Update Pytest Marker for VLMs ( #5623 )
2024-06-18 13:10:04 +00:00
Chang Su
f0cc0e68e3
[Misc] Remove import from transformers logging ( #5625 )
2024-06-18 12:12:19 +00:00
youkaichao
db5ec52ad7
[bugfix][distributed] improve p2p capability test ( #5612 )
[bugfix][distributed] do not error if two processes do not agree on p2p capability (#5612)
2024-06-18 07:21:05 +00:00
Kuntai Du
114d7270ff
[CI] Avoid naming different metrics with the same name in performance benchmark ( #5615 )
2024-06-17 21:37:18 -07:00
Cyrus Leung
32c86e494a
[Misc] Fix typo ( #5618 )
2024-06-17 20:58:30 -07:00
youkaichao
8eadcf0b90
[misc][typo] fix typo ( #5620 )
2024-06-17 20:54:57 -07:00
Joe Runde
5002175e80
[Kernel] Add punica dimensions for Granite 13b ( #5559 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-06-18 03:54:11 +00:00
Isotr0py
daef218b55
[Model] Initialize Phi-3-vision support ( #4986 )
2024-06-17 19:34:33 -07:00
sroy745
fa9e385229
[Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier ( #5131 )
2024-06-17 21:29:09 -05:00
zifeitong
26e1188e51
[Fix] Use utf-8 encoding in entrypoints/openai/run_batch.py ( #5606 )
2024-06-17 23:16:10 +00:00
Bruce Fontaine
a3e8a05d4c
[Bugfix] Fix KV head calculation for MPT models when using GQA ( #5142 )
2024-06-17 15:26:41 -07:00
youkaichao
e441bad674
[Optimization] use a pool to reuse LogicalTokenBlock.token_ids ( #5584 )
2024-06-17 22:08:05 +00:00
youkaichao
1b44aaf4e3
[bugfix][distributed] fix 16 gpus local rank arrangement ( #5604 )
2024-06-17 21:35:04 +00:00
Kuntai Du
9e4e6fe207
[CI] the readability of benchmarking and prepare for dashboard ( #5571 )
[CI] Improve the readability of performance benchmarking results and prepare for upcoming performance dashboard (#5571)
2024-06-17 11:41:08 -07:00
Jie Fu (傅杰)
ab66536dbf
[CI/BUILD] Support non-AVX512 vLLM building and testing ( #5574 )
2024-06-17 14:36:10 -04:00
Kunshang Ji
728c4c8a06
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend ( #3814 )
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-06-17 11:01:25 -07:00
zhyncs
1f12122b17
[Misc] use AutoTokenizer for benchmark serving when vLLM not installed ( #5588 )
2024-06-17 09:40:35 -07:00
Dipika Sikka
890d8d960b
[Kernel] compressed-tensors marlin 24 support ( #5435 )
2024-06-17 12:32:48 -04:00
Charles Riggins
9e74d9d003
Correct alignment in the seq_len diagram. ( #5592 )
Co-authored-by: Liqian Chen <liqian.chen@deeplang.ai>
2024-06-17 12:05:33 -04:00
Amit Garg
9333fb8eb9
[Model] Rename Phi3 rope scaling type ( #5595 )
2024-06-17 12:04:14 -04:00
Cody Yu
e2b85cf86a
Fix w8a8 benchmark and add Llama-3-8B ( #5562 )
2024-06-17 06:48:06 +00:00
youkaichao
845a3f26f9
[Doc] add debugging tips for crash and multi-node debugging ( #5581 )
2024-06-17 10:08:01 +08:00
youkaichao
f07d513320
[build][misc] limit numpy version ( #5582 )
2024-06-16 16:07:01 -07:00
Michael Goin
4a6769053a
[CI][BugFix] Flip is_quant_method_supported condition ( #5577 )
2024-06-16 14:07:34 +00:00
Antoni Baum
f31c1f90e3
Add basic correctness 2 GPU tests to 4 GPU pipeline ( #5518 )
2024-06-16 07:48:02 +00:00
zifeitong
3ce2c050dd
[Fix] Correct OpenAI batch response format ( #5554 )
2024-06-15 16:57:54 -07:00
Nick Hill
1c0afa13c5
[BugFix] Don't start a Ray cluster when not using Ray ( #5570 )
2024-06-15 16:30:51 -07:00
Alexander Matveev
d919ecc771
add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088 ( #5145 )
2024-06-15 13:38:16 -04:00
SangBin Cho
e691918e3b
[misc] Do not allow to use lora with chunked prefill. ( #5538 )
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-06-15 14:59:36 +00:00
Cyrus Leung
81fbb3655f
[CI/Build] Test both text and token IDs in batched OpenAI Completions API ( #5568 )
2024-06-15 07:29:42 -04:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory ( #5017 )
2024-06-15 04:45:31 +00:00
leiwen83
1b8a0d71cf
[Core][Bugfix]: fix prefix caching for blockv2 ( #5364 )
Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
2024-06-14 17:23:56 -07:00
Simon Mo
bd7efe95d0
Add ccache to amd ( #5555 )
2024-06-14 17:18:22 -07:00
youkaichao
f5bb85b435
[Core][Distributed] improve p2p cache generation ( #5528 )
2024-06-14 14:47:45 -07:00
Woosuk Kwon
28c145eb57
[Bugfix] Fix typo in Pallas backend ( #5558 )
2024-06-14 14:40:09 -07:00
Thomas Parnell
e2afb03c92
[Bugfix] Enable loading FP8 checkpoints for gpt_bigcode models ( #5460 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-06-14 20:28:11 +00:00
Sanger Steel
6e2527a7cb
[Doc] Update documentation on Tensorizer ( #5471 )
2024-06-14 11:27:57 -07:00
Simon Mo
cdab68dcdb
[Docs] Add ZhenFund as a Sponsor ( #5548 )
2024-06-14 11:17:21 -07:00
youkaichao
d1c3d7d139
[misc][distributed] fix benign error in is_in_the_same_node ( #5512 )
2024-06-14 10:59:28 -07:00
Cyrus Leung
77490c6f2f
[Core] Remove duplicate processing in async engine ( #5525 )
2024-06-14 10:04:42 -07:00
youkaichao
48f589e18b
[misc] fix flaky test of test_cuda_device_count_stateless ( #5546 )
2024-06-14 10:02:23 -07:00
Tyler Michael Smith
348616ac4b
[Kernel] Suppress mma.sp warning on CUDA 12.5 and later ( #5401 )
2024-06-14 10:02:00 -07:00
Robert Shaw
15985680e2
[ Misc ] Rs/compressed tensors cleanup ( #5432 )
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
2024-06-14 10:01:46 -07:00
Allen.Dou
d74674bbd9
[Misc] Fix arg names ( #5524 )
2024-06-14 09:47:44 -07:00
Tyler Michael Smith
703475f6c2
[Kernel] Fix CUTLASS 3.x custom broadcast load epilogue ( #5516 )
2024-06-14 09:30:15 -07:00
Cyrus Leung
d47af2bc02
[CI/Build] Disable LLaVA-NeXT CPU test ( #5529 )
2024-06-14 09:27:30 -07:00
Kuntai Du
319ad7f1d3
[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with perf-benchmarks label ( #5073 )
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-06-13 22:36:20 -07:00
Simon Mo
0f0d8bc065
bump version to v0.5.0.post1 ( #5522 )
2024-06-13 19:42:06 -07:00