vllm/entrypoints at d2b1bf55ec0d50f76762b902ca84036ac53e9646 - vllm

Joe Runde de4008e2ab [Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage (#9352) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-17 22:47:27 -04:00
llm	[Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage (#9352)	2024-10-17 22:47:27 -04:00
offline_mode	[Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage (#9352)	2024-10-17 22:47:27 -04:00
openai	[Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034)	2024-10-15 15:40:43 -07:00
__init__.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
conftest.py	Support for guided decoding for offline LLM (#6878)	2024-08-04 03:12:09 +00:00
test_chat_utils.py	[Frontend] Multimodal support in offline chat (#8098)	2024-09-04 05:22:17 +00:00