__init__.py
Change the name to vLLM (#150)
2023-06-17 03:07:40 -07:00
cache_engine.py
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)
2024-06-17 11:01:25 -07:00
cpu_model_runner.py
[Spec Decode] Introduce DraftModelRunner (#5799)
2024-06-28 09:17:51 -07:00
cpu_worker.py
[Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408)
2024-06-25 20:30:03 -07:00
embedding_model_runner.py
[Spec Decode] Introduce DraftModelRunner (#5799)
2024-06-28 09:17:51 -07:00
model_runner_base.py
[Spec Decode] Introduce DraftModelRunner (#5799)
2024-06-28 09:17:51 -07:00
model_runner.py
[VLM] Remove image_input_type from VLM config (#5852)
2024-07-02 07:57:09 +00:00
neuron_model_runner.py
[Spec Decode] Introduce DraftModelRunner (#5799)
2024-06-28 09:17:51 -07:00
neuron_worker.py
[Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408)
2024-06-25 20:30:03 -07:00
openvino_model_runner.py
[Hardware][Intel] OpenVINO vLLM backend (#5379)
2024-06-28 13:50:16 +00:00
openvino_worker.py
[Hardware][Intel] OpenVINO vLLM backend (#5379)
2024-06-28 13:50:16 +00:00
tpu_model_runner.py
[Bugfix][TPU] Fix pad slot id (#5977)
2024-06-28 18:55:17 -07:00
tpu_worker.py
[Bugfix][TPU] Fix TPU sampler output (#5978)
2024-06-28 18:14:16 -07:00
worker_base.py
[Spec Decode] Introduce DraftModelRunner (#5799)
2024-06-28 09:17:51 -07:00
worker.py
[misc][cuda] use nvml to avoid accidentally cuda initialization (#6007)
2024-06-30 20:07:34 -07:00
xpu_model_runner.py
[Spec Decode] Introduce DraftModelRunner (#5799)
2024-06-28 09:17:51 -07:00
xpu_worker.py
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)
2024-06-17 11:01:25 -07:00