vllm/vllm/worker

Latest commit: 22de45235c — Push logprob generation to LLMEngine (#3065)
Author: Antoni Baum
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
Date: 2024-03-04 19:54:06 +00:00

Contents (latest commit per entry):

spec_decode/      Push logprob generation to LLMEngine (#3065)                                      2024-03-04 19:54:06 +00:00
__init__.py       Change the name to vLLM (#150)                                                    2023-06-17 03:07:40 -07:00
cache_engine.py   [Neuron] Support inference with transformers-neuronx (#2569)                      2024-02-28 09:34:34 -08:00
model_runner.py   Add Automatic Prefix Caching (#2762)                                              2024-03-02 00:50:01 -08:00
neuron_worker.py  [Neuron] Support inference with transformers-neuronx (#2569)                      2024-02-28 09:34:34 -08:00
worker.py         [Minor] Small fix to make distributed init logic in worker looks cleaner (#2905)  2024-02-18 14:39:00 -08:00