Kuntai Du
|
6a1e25b151
|
[Doc] Add documentations for nightly benchmarks (#6412)
|
2024-07-25 11:57:16 -07:00 |
|
Robert Shaw
|
889da130e7
|
[ Misc ] fp8-marlin channelwise via compressed-tensors (#6524)
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-07-25 09:46:04 -07:00 |
|
Alexei-V-Ivanov-AMD
|
b570811706
|
[Build/CI] Update run-amd-test.sh. Enable Docker Hub login. (#6711)
|
2024-07-24 05:01:14 -07:00 |
|
youkaichao
|
72fc704803
|
[build] relax wheel size limit (#6704)
|
2024-07-23 14:03:49 -07:00 |
|
Kevin H. Luu
|
69d5ae38dc
|
[ci] Use different sccache bucket for CUDA 11.8 wheel build (#6656)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-07-22 14:20:41 -07:00 |
|
Robert Shaw
|
9364f74eee
|
[ Kernel ] Enable fp8-marlin for fbgemm-fp8 models (#6606)
|
2024-07-20 18:50:10 +00:00 |
|
Matt Wong
|
06d6c5fe9f
|
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543)
|
2024-07-20 09:39:07 -07:00 |
|
Robert Shaw
|
683e3cb9c4
|
[ Misc ] fbgemm checkpoints (#6559)
|
2024-07-20 09:36:57 -07:00 |
|
Robert Shaw
|
4cc24f01b1
|
[ Kernel ] Enable Dynamic Per Token fp8 (#6547)
|
2024-07-19 23:08:15 +00:00 |
|
Robert Shaw
|
dbe5588554
|
[ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515)
|
2024-07-18 22:39:18 -04:00 |
|
Nick Hill
|
b5672a112c
|
[Core] Multiprocessing Pipeline Parallel support (#6130)
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-18 19:15:52 -07:00 |
|
Tyler Michael Smith
|
1689219ebf
|
[CI/Build] Build on Ubuntu 20.04 instead of 22.04 (#6517)
|
2024-07-18 17:29:25 -07:00 |
|
youkaichao
|
f53b8f0d05
|
[ci][test] add correctness test for cpu offloading (#6549)
|
2024-07-18 23:41:06 +00:00 |
|
Rui Qiao
|
61e592747c
|
[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
|
2024-07-17 22:27:09 -07:00 |
|
youkaichao
|
1c27d25fb5
|
[core][model] yet another cpu offload implementation (#6496)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-07-17 20:54:35 -07:00 |
|
youkaichao
|
09c2eb85dd
|
[ci][distributed] add pipeline parallel correctness test (#6410)
|
2024-07-16 15:44:22 -07:00 |
|
Cyrus Leung
|
d97011512e
|
[CI/Build] vLLM cache directory for images (#6444)
|
2024-07-15 23:12:25 -07:00 |
|
Woosuk Kwon
|
4552e37b55
|
[CI/Build][TPU] Add TPU CI test (#6277)
Co-authored-by: kevin <kevin@anyscale.com>
|
2024-07-15 14:31:16 -07:00 |
|
youkaichao
|
69672f116c
|
[core][distributed] simplify code to support pipeline parallel (#6406)
|
2024-07-14 21:20:51 -07:00 |
|
Robert Shaw
|
a754dc2cb9
|
[CI/Build] Cross python wheel (#6394)
|
2024-07-14 18:54:46 -07:00 |
|
Robert Shaw
|
73030b7dae
|
[ Misc ] Enable Quantizing All Layers of DeekSeekv2 (#6423)
|
2024-07-14 21:38:42 +00:00 |
|
youkaichao
|
ccd3c04571
|
[ci][build] fix commit id (#6420)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-07-14 22:16:21 +08:00 |
|
Tyler Michael Smith
|
9dad5cc859
|
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace (#6384)
|
2024-07-14 13:37:19 +00:00 |
|
Robert Shaw
|
fb6af8bc08
|
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417)
|
2024-07-13 20:03:58 -07:00 |
|
Robert Shaw
|
babf52dade
|
[ Misc ] More Cleanup of Marlin (#6359)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2024-07-13 10:21:37 +00:00 |
|
youkaichao
|
41708e5034
|
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-12 21:51:48 -07:00 |
|
Simon Mo
|
6bc9710f6e
|
Fix release pipeline's dir permission (#6391)
|
2024-07-12 15:52:43 -07:00 |
|
Simon Mo
|
21b2dcedab
|
Fix release pipeline's -e flag (#6390)
|
2024-07-12 14:08:04 -07:00 |
|
Simon Mo
|
07b35af86d
|
Fix interpolation in release pipeline (#6389)
|
2024-07-12 14:03:39 -07:00 |
|
Simon Mo
|
bb1a784b05
|
Fix release-pipeline.yaml (#6388)
|
2024-07-12 14:00:57 -07:00 |
|
Simon Mo
|
d719ba24c5
|
Build some nightly wheels by default (#6380)
|
2024-07-12 13:56:59 -07:00 |
|
Kevin H. Luu
|
b75bce1008
|
[ci] Add grouped tests & mark tests to run by default for fastcheck pipeline (#6365)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-07-12 09:58:38 -07:00 |
|
Alexei-V-Ivanov-AMD
|
f9d25c2519
|
[Build/CI] Checking/Waiting for the GPU's clean state (#6379)
|
2024-07-12 09:42:24 -07:00 |
|
Robert Shaw
|
aea19f0989
|
[ Misc ] Support Models With Bias in compressed-tensors integration (#6356)
|
2024-07-12 11:11:29 -04:00 |
|
adityagoel14
|
d26a8b3f1f
|
[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub (#6350)
|
2024-07-11 21:26:26 -07:00 |
|
Lily Liu
|
d6ab528997
|
[Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351)
|
2024-07-12 01:32:06 +00:00 |
|
Robert Shaw
|
7ed6a4f0e1
|
[ BugFix ] Prompt Logprobs Detokenization (#6223)
Co-authored-by: Zifei Tong <zifeitong@gmail.com>
|
2024-07-11 22:02:29 +00:00 |
|
Kuntai Du
|
a4feba929b
|
[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (#5362)
|
2024-07-11 13:28:38 -07:00 |
|
Simon Mo
|
52b7fcb35a
|
Benchmark: add H100 suite (#6047)
|
2024-07-11 09:17:07 -07:00 |
|
Kevin H. Luu
|
a0550cbc80
|
Add support for multi-node on CI (#5955)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-07-09 12:56:56 -07:00 |
|
Robert Shaw
|
abfe705a02
|
[ Misc ] Support Fp8 via llm-compressor (#6110)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
|
2024-07-07 20:42:11 +00:00 |
|
Simon Mo
|
bc96d5c330
|
Move release wheel env var to Dockerfile instead (#6163)
|
2024-07-05 17:19:53 -07:00 |
|
Simon Mo
|
f0250620dd
|
Fix release wheel build env var (#6162)
|
2024-07-05 16:24:31 -07:00 |
|
Simon Mo
|
2de490d60f
|
Update wheel builds to strip debug (#6161)
|
2024-07-05 14:51:25 -07:00 |
|
Lily Liu
|
69ec3ca14c
|
[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-07-04 16:35:51 -07:00 |
|
Yuan
|
81d7a50f24
|
[Hardware][Intel CPU] Adding intel openmp tunings in Docker file (#6008)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
|
2024-07-04 15:22:12 -07:00 |
|
youkaichao
|
3de6e6a30e
|
[core][distributed] support n layers % pp size != 0 (#6115)
|
2024-07-03 16:40:31 -07:00 |
|
Mor Zusman
|
9d6a8daa87
|
[Model] Jamba support (#4115)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-02 23:11:29 +00:00 |
|
Robert Shaw
|
7c008c51a9
|
[ Misc ] Refactor MoE to isolate Fp8 From Mixtral (#5970)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-07-02 21:54:35 +00:00 |
|
Murali Andoorveedu
|
c5832d2ae9
|
[Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-02 10:58:08 -07:00 |
|