Contents of `vllm/vllm/` (latest commit: 2024-03-08 17:16:14 -08:00)

| Name | Last commit | Last commit date |
|------|-------------|------------------|
| core | Fix auto prefix bug (#3239) | 2024-03-07 16:37:28 -08:00 |
| engine | [Fix] Avoid pickling entire LLMEngine for Ray workers (#3207) | 2024-03-06 00:17:20 +00:00 |
| entrypoints | Connect engine healthcheck to openai server (#3260) | 2024-03-07 16:38:12 -08:00 |
| lora | [Neuron] Support inference with transformers-neuronx (#2569) | 2024-02-28 09:34:34 -08:00 |
| model_executor | [FIX] Fix prefix test error on main (#3286) | 2024-03-08 17:16:14 -08:00 |
| transformers_utils | Support starcoder2 architecture (#3089) | 2024-02-29 00:51:48 -08:00 |
| worker | Fix auto prefix bug (#3239) | 2024-03-07 16:37:28 -08:00 |
| __init__.py | [FIX] Make flash_attn optional (#3269) | 2024-03-08 10:52:20 -08:00 |
| block.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| config.py | Push logprob generation to LLMEngine (#3065) | 2024-03-04 19:54:06 +00:00 |
| logger.py | Make vLLM logging formatting optional (#2877) | 2024-02-20 14:38:55 -08:00 |
| outputs.py | Store eos_token_id in Sequence for easy access (#3166) | 2024-03-05 15:35:43 -08:00 |
| py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| sampling_params.py | [Fix] Don't deep-copy LogitsProcessors when copying SamplingParams (#3099) | 2024-02-29 19:20:42 +00:00 |
| sequence.py | Possible fix for conflict between Automated Prefix Caching (#2762) and multi-LoRA support (#1804) (#3263) | 2024-03-07 23:03:22 +00:00 |
| test_utils.py | Use CuPy for CUDA graphs (#2811) | 2024-02-13 11:32:06 -08:00 |
| utils.py | Measure model memory usage (#3120) | 2024-03-07 11:42:42 -08:00 |
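
These modules back vLLM's public offline-inference API: `engine/` provides the `LLMEngine`, `sampling_params.py` defines `SamplingParams`, and `outputs.py` defines the `RequestOutput` objects returned to callers. As a rough orientation, here is a minimal usage sketch assuming the standard `LLM`/`SamplingParams` entrypoints at this revision; the model name is illustrative.

```python
# Minimal offline-inference sketch (illustrative model name; requires vllm installed
# and enough GPU memory to load the model).
from vllm import LLM, SamplingParams

# LLM constructs an LLMEngine (engine/) and a worker/model_executor stack under the hood.
llm = LLM(model="facebook/opt-125m")

# SamplingParams comes from sampling_params.py.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() returns a list of RequestOutput objects (outputs.py).
outputs = llm.generate(["Hello, my name is"], params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```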