vllm/vllm/executor
Cody Yu 973617ae02
[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840)
Co-authored-by: Cade Daniel <edacih@gmail.com>
Co-authored-by: Cade Daniel <cade@anyscale.com>
2024-05-16 00:53:51 -07:00
File                         Last commit                                                               Date
__init__.py                  Add distributed model executor abstraction (#3191)                        2024-03-11 11:03:45 -07:00
cpu_executor.py              [Misc][Refactor] Introduce ExecuteModelData (#4540)                       2024-05-03 17:47:07 -07:00
distributed_gpu_executor.py  [Core] Implement sharded state loader (#4690)                             2024-05-15 22:11:54 -07:00
executor_base.py             [Misc][Refactor] Introduce ExecuteModelData (#4540)                       2024-05-03 17:47:07 -07:00
gpu_executor.py              [Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840)  2024-05-16 00:53:51 -07:00
multiproc_gpu_executor.py    [Core] Add MultiprocessingGPUExecutor (#4539)                             2024-05-14 10:38:59 -07:00
multiproc_worker_utils.py    [Misc] centralize all usage of environment variables (#4548)              2024-05-02 11:13:25 -07:00
neuron_executor.py           [Misc][Refactor] Introduce ExecuteModelData (#4540)                       2024-05-03 17:47:07 -07:00
ray_gpu_executor.py          [Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840)  2024-05-16 00:53:51 -07:00
ray_utils.py                 [Core] Add MultiprocessingGPUExecutor (#4539)                             2024-05-14 10:38:59 -07:00