Alexei-V-Ivanov-AMD
b570811706
[Build/CI] Update run-amd-test.sh. Enable Docker Hub login. (#6711)
2024-07-24 05:01:14 -07:00
youkaichao
72fc704803
[build] relax wheel size limit (#6704)
2024-07-23 14:03:49 -07:00
Kevin H. Luu
69d5ae38dc
[ci] Use different sccache bucket for CUDA 11.8 wheel build (#6656)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-22 14:20:41 -07:00
Robert Shaw
9364f74eee
[ Kernel ] Enable fp8-marlin for fbgemm-fp8 models (#6606)
2024-07-20 18:50:10 +00:00
Matt Wong
06d6c5fe9f
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543)
2024-07-20 09:39:07 -07:00
Robert Shaw
683e3cb9c4
[ Misc ] fbgemm checkpoints (#6559)
2024-07-20 09:36:57 -07:00
Robert Shaw
4cc24f01b1
[ Kernel ] Enable Dynamic Per Token fp8 (#6547)
2024-07-19 23:08:15 +00:00
Robert Shaw
dbe5588554
[ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515)
2024-07-18 22:39:18 -04:00
Nick Hill
b5672a112c
[Core] Multiprocessing Pipeline Parallel support (#6130)
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-18 19:15:52 -07:00
Tyler Michael Smith
1689219ebf
[CI/Build] Build on Ubuntu 20.04 instead of 22.04 (#6517)
2024-07-18 17:29:25 -07:00
youkaichao
f53b8f0d05
[ci][test] add correctness test for cpu offloading (#6549)
2024-07-18 23:41:06 +00:00
Rui Qiao
61e592747c
[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2024-07-17 22:27:09 -07:00
youkaichao
1c27d25fb5
[core][model] yet another cpu offload implementation (#6496)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-17 20:54:35 -07:00
youkaichao
09c2eb85dd
[ci][distributed] add pipeline parallel correctness test (#6410)
2024-07-16 15:44:22 -07:00
Cyrus Leung
d97011512e
[CI/Build] vLLM cache directory for images (#6444)
2024-07-15 23:12:25 -07:00
Woosuk Kwon
4552e37b55
[CI/Build][TPU] Add TPU CI test (#6277)
Co-authored-by: kevin <kevin@anyscale.com>
2024-07-15 14:31:16 -07:00
youkaichao
69672f116c
[core][distributed] simplify code to support pipeline parallel (#6406)
2024-07-14 21:20:51 -07:00
Robert Shaw
a754dc2cb9
[CI/Build] Cross python wheel (#6394)
2024-07-14 18:54:46 -07:00
Robert Shaw
73030b7dae
[ Misc ] Enable Quantizing All Layers of DeekSeekv2 (#6423)
2024-07-14 21:38:42 +00:00
youkaichao
ccd3c04571
[ci][build] fix commit id (#6420)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-07-14 22:16:21 +08:00
Tyler Michael Smith
9dad5cc859
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace (#6384)
2024-07-14 13:37:19 +00:00
Robert Shaw
fb6af8bc08
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417)
2024-07-13 20:03:58 -07:00
Robert Shaw
babf52dade
[ Misc ] More Cleanup of Marlin (#6359)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2024-07-13 10:21:37 +00:00
youkaichao
41708e5034
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-12 21:51:48 -07:00
Simon Mo
6bc9710f6e
Fix release pipeline's dir permission (#6391)
2024-07-12 15:52:43 -07:00
Simon Mo
21b2dcedab
Fix release pipeline's -e flag (#6390)
2024-07-12 14:08:04 -07:00
Simon Mo
07b35af86d
Fix interpolation in release pipeline (#6389)
2024-07-12 14:03:39 -07:00
Simon Mo
bb1a784b05
Fix release-pipeline.yaml (#6388)
2024-07-12 14:00:57 -07:00
Simon Mo
d719ba24c5
Build some nightly wheels by default (#6380)
2024-07-12 13:56:59 -07:00
Kevin H. Luu
b75bce1008
[ci] Add grouped tests & mark tests to run by default for fastcheck pipeline (#6365)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-12 09:58:38 -07:00
Alexei-V-Ivanov-AMD
f9d25c2519
[Build/CI] Checking/Waiting for the GPU's clean state (#6379)
2024-07-12 09:42:24 -07:00
Robert Shaw
aea19f0989
[ Misc ] Support Models With Bias in compressed-tensors integration (#6356)
2024-07-12 11:11:29 -04:00
adityagoel14
d26a8b3f1f
[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub (#6350)
2024-07-11 21:26:26 -07:00
Lily Liu
d6ab528997
[Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351)
2024-07-12 01:32:06 +00:00
Robert Shaw
7ed6a4f0e1
[ BugFix ] Prompt Logprobs Detokenization (#6223)
Co-authored-by: Zifei Tong <zifeitong@gmail.com>
2024-07-11 22:02:29 +00:00
Kuntai Du
a4feba929b
[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (#5362)
2024-07-11 13:28:38 -07:00
Simon Mo
52b7fcb35a
Benchmark: add H100 suite (#6047)
2024-07-11 09:17:07 -07:00
Kevin H. Luu
a0550cbc80
Add support for multi-node on CI (#5955)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-09 12:56:56 -07:00
Robert Shaw
abfe705a02
[ Misc ] Support Fp8 via llm-compressor (#6110)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-07-07 20:42:11 +00:00
Simon Mo
bc96d5c330
Move release wheel env var to Dockerfile instead (#6163)
2024-07-05 17:19:53 -07:00
Simon Mo
f0250620dd
Fix release wheel build env var (#6162)
2024-07-05 16:24:31 -07:00
Simon Mo
2de490d60f
Update wheel builds to strip debug (#6161)
2024-07-05 14:51:25 -07:00
Lily Liu
69ec3ca14c
[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-07-04 16:35:51 -07:00
Yuan
81d7a50f24
[Hardware][Intel CPU] Adding intel openmp tunings in Docker file (#6008)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
2024-07-04 15:22:12 -07:00
youkaichao
3de6e6a30e
[core][distributed] support n layers % pp size != 0 (#6115)
2024-07-03 16:40:31 -07:00
Mor Zusman
9d6a8daa87
[Model] Jamba support (#4115)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 23:11:29 +00:00
Robert Shaw
7c008c51a9
[ Misc ] Refactor MoE to isolate Fp8 From Mixtral (#5970)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-02 21:54:35 +00:00
Murali Andoorveedu
c5832d2ae9
[Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 10:58:08 -07:00
xwjiang2010
98d6682cd1
[VLM] Remove image_input_type from VLM config (#5852)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
Robert Shaw
d76084c12f
[ CI ] Re-enable Large Model LM Eval (#6031)
2024-07-01 12:40:45 -04:00