Robert Shaw
ed812a73fa
[ Frontend ] Multiprocessing for OpenAI Server with zeromq (#6883)
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-02 18:27:28 -07:00
youkaichao
708989341e
[misc] add a flag to enable compile (#7092)
2024-08-02 16:18:45 -07:00
Rui Qiao
05308891e2
[Core] Pipeline parallel with Ray ADAG (#6837)
...
Support pipeline-parallelism with Ray accelerated DAG.
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-02 13:55:40 -07:00
Jee Jee Li
7ecee34321
[Kernel][RFC] Refactor the punica kernel based on Triton (#5036)
2024-07-31 17:12:24 -07:00
Cody Yu
bd70013407
[MISC] Introduce pipeline parallelism partition strategies (#6920)
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-07-31 12:02:17 -07:00
Li, Jiang
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)
2024-07-26 13:50:10 -07:00
Rui Qiao
61e592747c
[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032)
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
2024-07-17 22:27:09 -07:00
Cyrus Leung
d97011512e
[CI/Build] vLLM cache directory for images (#6444)
2024-07-15 23:12:25 -07:00
DefTruth
44874a0bf9
[Doc] add env docs for flashinfer backend (#6437)
2024-07-14 21:16:51 -07:00
Woosuk Kwon
eeceadaecc
[Misc] Add deprecation warning for beam search (#6402)
2024-07-13 11:52:22 -07:00
Avshalom Manevich
12a59959ed
[Bugfix] adding chunking mechanism to fused_moe to handle large inputs (#6029)
2024-07-01 21:08:29 +00:00
Ilya Lavrenov
57f09a419c
[Hardware][Intel] OpenVINO vLLM backend (#5379)
2024-06-28 13:50:16 +00:00
youkaichao
d9a252bc8e
[Core][Distributed] add shm broadcast (#5399)
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-06-21 05:12:35 +00:00
youkaichao
6c5b7af152
[distributed][misc] use fork by default for mp (#5669)
2024-06-20 17:06:34 -07:00
Woosuk Kwon
1a8bfd92d5
[Hardware] Initial TPU integration (#5292)
2024-06-12 11:53:03 -07:00
Woosuk Kwon
8bab4959be
[Misc] Remove VLLM_BUILD_WITH_NEURON env variable (#5389)
2024-06-11 00:37:56 -07:00
Roger Wang
7a9cb294ae
[Frontend] Add OpenAI Vision API Support (#5237)
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-06-07 11:23:32 -07:00
youkaichao
388596c914
[Misc][Utils] allow get_open_port to be called for multiple times (#5333)
2024-06-06 22:15:11 -07:00
youkaichao
325c119961
[Misc] add logging level env var (#5045)
2024-05-24 23:49:49 -07:00
Wenwei Zhang
546a97ef69
[Misc]: allow user to specify port in distributed setting (#4914)
2024-05-20 17:45:06 +00:00
Sanger Steel
8bc68e198c
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (#4208)
2024-05-13 14:57:07 -07:00
youkaichao
344bf7cd2d
[Misc] add installation time env vars (#4574)
2024-05-03 15:55:56 -07:00
youkaichao
2d7bce9cd5
[Doc] add env vars to the doc (#4572)
2024-05-03 05:13:49 +00:00
youkaichao
5b8a7c1cb0
[Misc] centralize all usage of environment variables (#4548)
2024-05-02 11:13:25 -07:00