Commit Graph

2108 Commits

Author SHA1 Message Date
Varun Sundar Rabindranath
766435e660
[Kernel] Tuned FP8 Kernels for Ada Lovelace (#6677)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-07-29 09:42:35 -06:00
Isotr0py
7cbd9ec7a9
[Model] Initialize support for InternVL2 series models (#6514)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00
Elsa Granger
3eeb148f46
[Misc] Pass cutlass_fp8_supported correctly in fbgemm_fp8 (#6871) 2024-07-28 11:13:49 -04:00
Michael Goin
b1366a9534
Add Nemotron to PP_SUPPORTED_MODELS (#6863) 2024-07-27 15:05:17 -07:00
Alexander Matveev
75acdaa4b6
[Kernel] Increase precision of GPTQ/AWQ Marlin kernel (#6795) 2024-07-27 17:52:33 -04:00
Woosuk Kwon
fad5576c58
[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856) 2024-07-27 10:28:33 -07:00
Chenggang Wu
f954d0715c
[Docs] Add RunLLM chat widget (#6857) 2024-07-27 09:24:46 -07:00
Cyrus Leung
1ad86acf17
[Model] Initial support for BLIP-2 (#5920)
Co-authored-by: ywang96 <ywang@roblox.com>
2024-07-27 11:53:07 +00:00
Roger Wang
ecb33a28cb
[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860) 2024-07-27 09:54:14 +00:00
Wang Ran (汪然)
a57d75821c
[bugfix] make args.stream work (#6831) 2024-07-27 09:07:02 +00:00
Roger Wang
925de97e05
[Bugfix] Fix VLM example typo (#6859) 2024-07-27 14:24:08 +08:00
Roger Wang
aa46953a20
[Misc][VLM][Doc] Consolidate offline examples for vision language models (#6858)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-07-26 22:44:13 -07:00
Travis Johnson
593e79e733
[Bugfix] Use torch.set_num_threads() to configure parallelism in multiproc_gpu_executor (#6802)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-07-26 22:15:20 -07:00
Harry Mellor
c53041ae3b
[Doc] Add missing mock import to docs conf.py (#6834) 2024-07-27 04:47:33 +00:00
Woosuk Kwon
52f07e3dec
[Hardware][TPU] Implement tensor parallelism with Ray (#5871) 2024-07-26 20:54:27 -07:00
Joe
14dbd5a767
[Model] H2O Danube3-4b (#6451) 2024-07-26 20:47:50 -07:00
tomeras91
ed94e4f427
[Bugfix][Model] Jamba assertions and no chunked prefill by default for Jamba (#6784) 2024-07-26 20:45:31 -07:00
omrishiv
3c3012398e
[Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron (#6844)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-07-26 20:20:16 -07:00
Woosuk Kwon
ced36cd89b
[ROCm] Upgrade PyTorch nightly version (#6845) 2024-07-26 20:16:13 -07:00
Sanger Steel
969d032265
[Bugfix]: Fix Tensorizer test failures (#6835) 2024-07-26 20:02:25 -07:00
Lucas Wilkinson
55712941e5
[Bug Fix] Illegal memory access, FP8 Llama 3.1 405b (#6852) 2024-07-27 02:27:44 +00:00
Cyrus Leung
981b0d5673
[Frontend] Factor out code for running uvicorn (#6828) 2024-07-27 09:58:25 +08:00
Woosuk Kwon
d09b94ca58
[TPU] Support collective communications in XLA devices (#6813) 2024-07-27 01:45:57 +00:00
chenqianfzh
bb5494676f
enforce eager mode with bnb quantization temporarily (#6846) 2024-07-27 01:32:20 +00:00
Gurpreet Singh Dhami
b5f49ee55b
Update README.md (#6847) 2024-07-27 00:26:45 +00:00
Zhanghao Wu
150a1ffbfd
[Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283) 2024-07-26 14:39:10 -07:00
Michael Goin
281977bd6e
[Doc] Add Nemotron to supported model docs (#6843) 2024-07-26 17:32:44 -04:00
Li, Jiang
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) 2024-07-26 13:50:10 -07:00
Woosuk Kwon
aa4867791e
[Misc][TPU] Support TPU in initialize_ray_cluster (#6812) 2024-07-26 19:39:49 +00:00
Woosuk Kwon
71734f1bf2
[Build/CI][ROCm] Minor simplification to Dockerfile.rocm (#6811) 2024-07-26 12:28:32 -07:00
Tyler Michael Smith
50704f52c4
[Bugfix][Kernel] Promote another index to int64_t (#6838) 2024-07-26 18:41:04 +00:00
Michael Goin
07278c37dd
[Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611) 2024-07-26 14:33:42 -04:00
youkaichao
85ad7e2d01
[doc][debugging] add known issues for hangs (#6816) 2024-07-25 21:48:05 -07:00
Peng Guanwen
89a84b0bb7
[Core] Use array to speedup padding (#6779) 2024-07-25 21:31:31 -07:00
Anthony Platanios
084a01fd35
[Bugfix] [Easy] Fixed a bug in the multiprocessing GPU executor. (#6770) 2024-07-25 21:25:35 -07:00
QQSong
062a1d0fab
Fix ReplicatedLinear weight loading (#6793) 2024-07-25 19:24:58 -07:00
Kevin H. Luu
2eb9f4ff26
[ci] Mark tensorizer test as soft fail and separate it from grouped test in fast check (#6810)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-25 18:08:33 -07:00
youkaichao
443c7cf4cf
[ci][distributed] fix flaky tests (#6806) 2024-07-25 17:44:09 -07:00
SangBin Cho
1adddb14bf
[Core] Fix ray forward_dag error mssg (#6792) 2024-07-25 16:53:25 -07:00
Woosuk Kwon
b7215de2c5
[Docs] Publish 5th meetup slides (#6799) 2024-07-25 16:47:55 -07:00
youkaichao
f3ff63c3f4
[doc][distributed] improve multinode serving doc (#6804) 2024-07-25 15:38:32 -07:00
Lucas Wilkinson
cd7edc4e87
[Bugfix] Fix empty (nullptr) channelwise scales when loading wNa16 using compressed tensors (#6798) 2024-07-25 15:05:09 -07:00
Kuntai Du
6a1e25b151
[Doc] Add documentations for nightly benchmarks (#6412) 2024-07-25 11:57:16 -07:00
Tyler Michael Smith
95db75de64
[Bugfix] Add synchronize to prevent possible data race (#6788)
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2024-07-25 10:40:01 -07:00
Michael Goin
65b1f121c8
[Bugfix] Fix kv_cache_dtype=fp8 without scales for FP8 checkpoints (#6761) 2024-07-25 09:46:15 -07:00
Robert Shaw
889da130e7
[ Misc ] fp8-marlin channelwise via compressed-tensors (#6524)
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-07-25 09:46:04 -07:00
Alphi
b75e314fff
[Bugfix] Add image placeholder for OpenAI Compatible Server of MiniCPM-V (#6787)
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-25 09:42:49 -07:00
Chang Su
316a41ac1d
[Bugfix] Fix encoding_format in examples/openai_embedding_client.py (#6755) 2024-07-24 22:48:07 -07:00
Alexander Matveev
0310029a2f
[Bugfix] Fix awq_marlin and gptq_marlin flags (#6745) 2024-07-24 22:34:11 -07:00
Cody Yu
309aaef825
[Bugfix] Fix decode tokens w. CUDA graph (#6757) 2024-07-24 22:33:56 -07:00