vllm/vllm

Latest commit b522c4476f: [Misc] add HOST_IP env var (#3419)
Author: youkaichao (Co-authored-by: Simon Mo <simon.mo@hey.com>)
Date: 2024-03-14 21:32:52 -07:00
Name                 Last commit message                                                     Date
core/                Fixes #1556 double free (#3347)                                         2024-03-13 00:30:08 +00:00
engine/              Add distributed model executor abstraction (#3191)                      2024-03-11 11:03:45 -07:00
entrypoints/         Add args for mTLS support (#3410)                                       2024-03-14 13:11:45 -07:00
executor/            [FIX] Simpler fix for async engine running on ray (#3371)               2024-03-13 14:18:40 -07:00
lora/                Re-enable the 80 char line width limit (#3305)                          2024-03-10 19:49:14 -07:00
model_executor/      fix marlin config repr (#3414)                                          2024-03-14 16:26:19 -07:00
spec_decode/         Re-enable the 80 char line width limit (#3305)                          2024-03-10 19:49:14 -07:00
transformers_utils/  Re-enable the 80 char line width limit (#3305)                          2024-03-10 19:49:14 -07:00
worker/              Re-enable the 80 char line width limit (#3305)                          2024-03-10 19:49:14 -07:00
__init__.py          Add distributed model executor abstraction (#3191)                      2024-03-11 11:03:45 -07:00
block.py             Add Automatic Prefix Caching (#2762)                                    2024-03-02 00:50:01 -08:00
config.py            Fix assertion failure in Qwen 1.5 with prefix caching enabled (#3373)   2024-03-14 13:56:57 -07:00
logger.py            Make vLLM logging formatting optional (#2877)                           2024-02-20 14:38:55 -08:00
outputs.py           [Fix] Fix best_of behavior when n=1 (#3298)                             2024-03-10 19:17:46 -07:00
py.typed             Add py.typed so consumers of vLLM can get type checking (#1509)         2023-10-30 14:50:47 -07:00
sampling_params.py   Re-enable the 80 char line width limit (#3305)                          2024-03-10 19:49:14 -07:00
sequence.py          Re-enable the 80 char line width limit (#3305)                          2024-03-10 19:49:14 -07:00
test_utils.py        Use CuPy for CUDA graphs (#2811)                                        2024-02-13 11:32:06 -08:00
utils.py             [Misc] add HOST_IP env var (#3419)                                      2024-03-14 21:32:52 -07:00
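
The most recent change in this tree, [Misc] add HOST_IP env var (#3419), lands in utils.py and lets a deployment override the IP address the process detects for itself. Below is a minimal, hedged sketch of that env-var-override pattern; the helper name get_ip and the exact fallback logic are assumptions for illustration, not verbatim vLLM code.

```python
import os
import socket


def get_ip() -> str:
    # Assumption: an explicit HOST_IP env var wins, so multi-node setups
    # can pin the address this process advertises to other workers.
    host_ip = os.environ.get("HOST_IP")
    if host_ip:
        return host_ip

    # Fallback: "connect" a UDP socket toward a public address to learn
    # which local interface the OS would route through; no packet is
    # actually sent, and the address does not need to be reachable.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
```

Under this sketch, launching with HOST_IP=10.0.0.5 set makes get_ip() return that address directly instead of whatever interface the UDP routing trick discovers.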