vllm/vllm (last commit: 2024-03-16 13:35:27 -07:00)
Name | Last commit message | Last commit date
core/ | Fixes the misuse/mixuse of time.time()/time.monotonic() (#3220) | 2024-03-15 18:25:43 +00:00
engine/ | Asynchronous tokenization (#2879) | 2024-03-15 23:37:01 +00:00
entrypoints/ | Support arbitrary json_object in OpenAI and Context Free Grammar (#3211) | 2024-03-16 13:35:27 -07:00
executor/ | [FIX] Simpler fix for async engine running on ray (#3371) | 2024-03-13 14:18:40 -07:00
lora/ | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00
model_executor/ | Support arbitrary json_object in OpenAI and Context Free Grammar (#3211) | 2024-03-16 13:35:27 -07:00
spec_decode/ | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00
transformers_utils/ | Asynchronous tokenization (#2879) | 2024-03-15 23:37:01 +00:00
worker/ | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00
__init__.py | Add distributed model executor abstraction (#3191) | 2024-03-11 11:03:45 -07:00
block.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00
config.py | Asynchronous tokenization (#2879) | 2024-03-15 23:37:01 +00:00
logger.py | Make vLLM logging formatting optional (#2877) | 2024-02-20 14:38:55 -08:00
outputs.py | [Fix] Fix best_of behavior when n=1 (#3298) | 2024-03-10 19:17:46 -07:00
py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00
sampling_params.py | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00
sequence.py | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00
test_utils.py | Use CuPy for CUDA graphs (#2811) | 2024-02-13 11:32:06 -08:00
utils.py | [Misc] add HOST_IP env var (#3419) | 2024-03-14 21:32:52 -07:00