Woosuk Kwon
|
ced36cd89b
|
[ROCm] Upgrade PyTorch nightly version (#6845)
|
2024-07-26 20:16:13 -07:00 |
|
Sanger Steel
|
969d032265
|
[Bugfix]: Fix Tensorizer test failures (#6835)
|
2024-07-26 20:02:25 -07:00 |
|
Lucas Wilkinson
|
55712941e5
|
[Bug Fix] Illegal memory access, FP8 Llama 3.1 405b (#6852)
|
2024-07-27 02:27:44 +00:00 |
|
Cyrus Leung
|
981b0d5673
|
[Frontend] Factor out code for running uvicorn (#6828)
|
2024-07-27 09:58:25 +08:00 |
|
Woosuk Kwon
|
d09b94ca58
|
[TPU] Support collective communications in XLA devices (#6813)
|
2024-07-27 01:45:57 +00:00 |
|
chenqianfzh
|
bb5494676f
|
enforce eager mode with bnb quantization temporarily (#6846)
|
2024-07-27 01:32:20 +00:00 |
|
Gurpreet Singh Dhami
|
b5f49ee55b
|
Update README.md (#6847)
|
2024-07-27 00:26:45 +00:00 |
|
Zhanghao Wu
|
150a1ffbfd
|
[Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283)
|
2024-07-26 14:39:10 -07:00 |
|
Michael Goin
|
281977bd6e
|
[Doc] Add Nemotron to supported model docs (#6843)
|
2024-07-26 17:32:44 -04:00 |
|
Li, Jiang
|
3bbb4936dc
|
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)
|
2024-07-26 13:50:10 -07:00 |
|
Woosuk Kwon
|
aa4867791e
|
[Misc][TPU] Support TPU in initialize_ray_cluster (#6812)
|
2024-07-26 19:39:49 +00:00 |
|
Woosuk Kwon
|
71734f1bf2
|
[Build/CI][ROCm] Minor simplification to Dockerfile.rocm (#6811)
|
2024-07-26 12:28:32 -07:00 |
|
Tyler Michael Smith
|
50704f52c4
|
[Bugfix][Kernel] Promote another index to int64_t (#6838)
|
2024-07-26 18:41:04 +00:00 |
|
Michael Goin
|
07278c37dd
|
[Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611)
|
2024-07-26 14:33:42 -04:00 |
|
youkaichao
|
85ad7e2d01
|
[doc][debugging] add known issues for hangs (#6816)
|
2024-07-25 21:48:05 -07:00 |
|
Peng Guanwen
|
89a84b0bb7
|
[Core] Use array to speedup padding (#6779)
|
2024-07-25 21:31:31 -07:00 |
|
Anthony Platanios
|
084a01fd35
|
[Bugfix] [Easy] Fixed a bug in the multiprocessing GPU executor. (#6770)
|
2024-07-25 21:25:35 -07:00 |
|
QQSong
|
062a1d0fab
|
Fix ReplicatedLinear weight loading (#6793)
|
2024-07-25 19:24:58 -07:00 |
|
Kevin H. Luu
|
2eb9f4ff26
|
[ci] Mark tensorizer as soft fail and separate from grouped test (#6810)
[ci] Mark tensorizer test as soft fail and separate it from grouped test in fast check (#6810)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-07-25 18:08:33 -07:00 |
|
youkaichao
|
443c7cf4cf
|
[ci][distributed] fix flaky tests (#6806)
|
2024-07-25 17:44:09 -07:00 |
|
SangBin Cho
|
1adddb14bf
|
[Core] Fix ray forward_dag error mssg (#6792)
|
2024-07-25 16:53:25 -07:00 |
|
Woosuk Kwon
|
b7215de2c5
|
[Docs] Publish 5th meetup slides (#6799)
|
2024-07-25 16:47:55 -07:00 |
|
youkaichao
|
f3ff63c3f4
|
[doc][distributed] improve multinode serving doc (#6804)
|
2024-07-25 15:38:32 -07:00 |
|
Lucas Wilkinson
|
cd7edc4e87
|
[Bugfix] Fix empty (nullptr) channelwise scales when loading wNa16 using compressed tensors (#6798)
|
2024-07-25 15:05:09 -07:00 |
|
Kuntai Du
|
6a1e25b151
|
[Doc] Add documentations for nightly benchmarks (#6412)
|
2024-07-25 11:57:16 -07:00 |
|
Tyler Michael Smith
|
95db75de64
|
[Bugfix] Add synchronize to prevent possible data race (#6788)
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2024-07-25 10:40:01 -07:00 |
|
Michael Goin
|
65b1f121c8
|
[Bugfix] Fix kv_cache_dtype=fp8 without scales for FP8 checkpoints (#6761)
|
2024-07-25 09:46:15 -07:00 |
|
Robert Shaw
|
889da130e7
|
[ Misc ] fp8-marlin channelwise via compressed-tensors (#6524)
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-07-25 09:46:04 -07:00 |
|
Alphi
|
b75e314fff
|
[Bugfix] Add image placeholder for OpenAI Compatible Server of MiniCPM-V (#6787)
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-07-25 09:42:49 -07:00 |
|
Chang Su
|
316a41ac1d
|
[Bugfix] Fix encoding_format in examples/openai_embedding_client.py (#6755)
|
2024-07-24 22:48:07 -07:00 |
|
Alexander Matveev
|
0310029a2f
|
[Bugfix] Fix awq_marlin and gptq_marlin flags (#6745)
|
2024-07-24 22:34:11 -07:00 |
|
Cody Yu
|
309aaef825
|
[Bugfix] Fix decode tokens w. CUDA graph (#6757)
|
2024-07-24 22:33:56 -07:00 |
|
Alphi
|
9e169a4c61
|
[Model] Adding support for MiniCPM-V (#4087)
|
2024-07-24 20:59:30 -07:00 |
|
Evan Z. Liu
|
5689e256ba
|
[Frontend] Represent tokens with identifiable strings (#6626)
|
2024-07-25 09:51:00 +08:00 |
|
youkaichao
|
740374d456
|
[core][distributed] fix zmq hang (#6759)
|
2024-07-24 17:37:12 -07:00 |
|
Hongxia Yang
|
d88c458f44
|
[Doc][AMD][ROCm]Added tips to refer to mi300x tuning guide for mi300x users (#6754)
|
2024-07-24 14:32:57 -07:00 |
|
Michael Goin
|
421e218b37
|
[Bugfix] Bump transformers to 4.43.2 (#6752)
|
2024-07-24 13:22:16 -07:00 |
|
Antoni Baum
|
5448f67635
|
[Core] Tweaks to model runner/input builder developer APIs (#6712)
|
2024-07-24 12:17:12 -07:00 |
|
Antoni Baum
|
0e63494cf3
|
Add fp8 support to reshape_and_cache_flash (#6667)
|
2024-07-24 18:36:52 +00:00 |
|
Daniele
|
ee812580f7
|
[Frontend] split run_server into build_server and run_server (#6740)
|
2024-07-24 10:36:04 -07:00 |
|
Allen.Dou
|
40468b13fa
|
[Bugfix] Miscalculated latency lead to time_to_first_token_seconds inaccurate. (#6686)
|
2024-07-24 08:58:42 -07:00 |
|
Nick Hill
|
2cf0df3381
|
[Bugfix] Fix speculative decode seeded test (#6743)
|
2024-07-24 08:58:31 -07:00 |
|
LF Marques
|
545146349c
|
Adding f-string to validation error which is missing (#6748)
|
2024-07-24 08:55:53 -07:00 |
|
liuyhwangyh
|
f4f8a9d892
|
[Bugfix]fix modelscope compatible issue (#6730)
|
2024-07-24 05:04:46 -07:00 |
|
Alexei-V-Ivanov-AMD
|
b570811706
|
[Build/CI] Update run-amd-test.sh. Enable Docker Hub login. (#6711)
|
2024-07-24 05:01:14 -07:00 |
|
Woosuk Kwon
|
ccc4a73257
|
[Docs][ROCm] Detailed instructions to build from source (#6680)
|
2024-07-24 01:07:23 -07:00 |
|
Roger Wang
|
0a740a11ba
|
[Bugfix] Fix token padding for chameleon (#6724)
|
2024-07-24 01:05:09 -07:00 |
|
Nick Hill
|
c882a7f5b3
|
[SpecDecoding] Update MLPSpeculator CI tests to use smaller model (#6714)
|
2024-07-24 07:34:22 +00:00 |
|
William Lin
|
5e8ca973eb
|
[Bugfix] fix flashinfer cudagraph capture for PP (#6708)
|
2024-07-24 01:49:44 +00:00 |
|
dongmao zhang
|
87525fab92
|
[bitsandbytes]: support read bnb pre-quantized model (#5753)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-07-23 23:45:09 +00:00 |
|