vllm/vllm/worker
2024-10-23 08:28:21 +00:00
__init__.py Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
cache_engine.py [Kernel] Support sliding window in flash attention backend (#9403) 2024-10-20 10:57:52 -07:00
cpu_enc_dec_model_runner.py [Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089) 2024-10-07 06:50:35 +00:00
cpu_model_runner.py [Kernel] Support sliding window in flash attention backend (#9403) 2024-10-20 10:57:52 -07:00
cpu_worker.py [Kernel] Support sliding window in flash attention backend (#9403) 2024-10-20 10:57:52 -07:00
embedding_model_runner.py [Model] PP support for embedding models and update docs (#9090) 2024-10-06 16:35:27 +08:00
enc_dec_model_runner.py [Model] Support Mamba (#6484) 2024-10-11 15:40:06 +00:00
model_runner_base.py [MISC] Skip dumping inputs when unpicklable (#8744) 2024-09-24 06:10:03 +00:00
model_runner.py [Bugfix][Misc]: fix graph capture for decoder (#9549) 2024-10-21 17:33:30 +00:00
multi_step_model_runner.py [Doc] Consistent naming of attention backends (#9498) 2024-10-21 22:29:57 +08:00
multi_step_tpu_worker.py [TPU] Implement multi-step scheduling (#8489) 2024-09-14 16:58:31 -07:00
multi_step_worker.py [Core] Multi-Step + Single Step Prefills via Chunked Prefill code path (#8378) 2024-09-27 13:32:07 -07:00
neuron_model_runner.py [Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131) 2024-10-08 14:12:56 +00:00
neuron_worker.py [Bugfix] neuron: enable tensor parallelism (#7562) 2024-08-26 15:13:13 -07:00
openvino_model_runner.py [Kernel] Support sliding window in flash attention backend (#9403) 2024-10-20 10:57:52 -07:00
openvino_worker.py [Kernel] Support sliding window in flash attention backend (#9403) 2024-10-20 10:57:52 -07:00
tpu_model_runner.py [Kernel] Support sliding window in flash attention backend (#9403) 2024-10-20 10:57:52 -07:00
tpu_worker.py [torch.compile] use empty tensor instead of None for profiling (#8875) 2024-09-27 08:11:32 -07:00
utils.py [Doc] Compatibility matrix for mutual exclusive features (#8512) 2024-10-11 11:18:50 -07:00
worker_base.py [Core] Logprobs support in Multi-step (#7652) 2024-08-29 19:19:08 -07:00
worker.py 🐛 fix torch memory profiling (#9516) 2024-10-18 21:25:19 -04:00
xpu_model_runner.py [Kernel] Support sliding window in flash attention backend (#9403) 2024-10-20 10:57:52 -07:00
xpu_worker.py [Hardware][XPU] using current_platform.is_xpu (#9605) 2024-10-23 08:28:21 +00:00