vllm/worker at 3921a2f29e30df293459d824e20d2e546e4af0c7 - vllm

Joe Runde de4008e2ab [Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage (#9352) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>		2024-10-17 22:47:27 -04:00
__init__.py	[Speculative decoding 2/9] Multi-step worker for draft model (#2424)	2024-01-21 16:31:47 -08:00
test_encoder_decoder_model_runner.py	[Core] Factor out common code in `SequenceData` and `Sequence` (#8675)	2024-09-21 02:30:39 +00:00
test_model_input.py	[Core] Add `AttentionState` abstraction (#7663)	2024-08-20 18:50:45 +00:00
test_model_runner.py	[Core] Factor out common code in `SequenceData` and `Sequence` (#8675)	2024-09-21 02:30:39 +00:00
test_profile.py	[Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage (#9352)	2024-10-17 22:47:27 -04:00
test_swap.py	[Core] Pipeline Parallel Support (#4412)	2024-07-02 10:58:08 -07:00