vllm/vllm at ac8d36f3e5776e020eb32a08eb3a2f9c60a49344 - vllm

Antoni Baum 15f5632365 Delay GPU->CPU sync in sampling (#1337)		2023-10-30 09:01:34 -07:00
core	Fix type hints (#1427)	2023-10-20 08:50:47 -07:00
engine	Support SqueezeLLM (#1326)	2023-10-21 23:14:59 -07:00
entrypoints	API server support ipv4 / ipv6 dualstack (#1288)	2023-10-07 15:15:54 -07:00
model_executor	Delay GPU->CPU sync in sampling (#1337)	2023-10-30 09:01:34 -07:00
transformers_utils	fix: don't skip first special token. (#1497)	2023-10-29 04:26:36 -07:00
worker	Delay GPU->CPU sync in sampling (#1337)	2023-10-30 09:01:34 -07:00
__init__.py	Bump up the version to v0.2.1 (#1355)	2023-10-16 12:58:57 -07:00
block.py	[Quality] Add code formatter and linter (#326)	2023-07-03 11:31:55 -07:00
config.py	Support SqueezeLLM (#1326)	2023-10-21 23:14:59 -07:00
logger.py	[Quality] Add code formatter and linter (#326)	2023-07-03 11:31:55 -07:00
outputs.py	Implement prompt logprobs & Batched topk for computing logprobs (#1328)	2023-10-16 10:56:50 -07:00
sampling_params.py	Support repetition_penalty (#1424)	2023-10-29 10:02:41 -07:00
sequence.py	[BugFix] Define `__eq__` in SequenceGroupOutputs (#1389)	2023-10-17 01:09:44 -07:00
utils.py	Allocate more shared memory to attention kernel (#1154)	2023-09-26 22:27:13 -07:00