vllm/vllm
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| `core/` | Fix hanging when prompt exceeds limit (#1029) | 2023-09-17 01:48:56 -07:00 |
| `engine/` | Automatically configure max_num_batched_tokens (#1198) | 2023-09-27 16:34:00 -07:00 |
| `entrypoints/` | Align max_tokens behavior with openai (#852) | 2023-09-23 18:10:13 -07:00 |
| `model_executor/` | fix qwen-14b model (#1173) | 2023-09-27 16:33:16 -07:00 |
| `transformers_utils/` | fix qwen-14b model (#1173) | 2023-09-27 16:33:16 -07:00 |
| `worker/` | Allocate more shared memory to attention kernel (#1154) | 2023-09-26 22:27:13 -07:00 |
| `__init__.py` | Bump up the version to v0.1.7 (#1013) | 2023-09-11 00:54:30 -07:00 |
| `block.py` | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| `config.py` | Automatically configure max_num_batched_tokens (#1198) | 2023-09-27 16:34:00 -07:00 |
| `logger.py` | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| `outputs.py` | Align vLLM's beam search implementation with HF generate (#857) | 2023-09-04 17:29:42 -07:00 |
| `sampling_params.py` | [Sampler] Vectorized sampling (simplified) (#1048) | 2023-09-22 17:48:04 -07:00 |
| `sequence.py` | Fix get_max_num_running_seqs for waiting and swapped seq groups (#1068) | 2023-09-18 11:49:40 -07:00 |
| `utils.py` | Allocate more shared memory to attention kernel (#1154) | 2023-09-26 22:27:13 -07:00 |
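For orientation, below is a minimal sketch of how the modules listed above fit together, assuming the quickstart-style API exposed by `entrypoints/llm.py` and `sampling_params.py` around the v0.1.7 era; the model name is only a placeholder.

```python
from vllm import LLM, SamplingParams

# Sampling configuration is defined and validated in sampling_params.py.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# LLM (entrypoints/llm.py) constructs the engine (engine/), which schedules
# requests via core/ and executes the model through model_executor/ on worker/.
llm = LLM(model="facebook/opt-125m")  # placeholder model

# generate() returns RequestOutput objects defined in outputs.py.
outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.prompt, output.outputs[0].text)
```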