vllm/vllm/worker

Latest commit: de4008e2ab by Joe Runde, [Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage (#9352)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-17 22:47:27 -04:00
__init__.py Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
cache_engine.py [Model] Support Mamba (#6484) 2024-10-11 15:40:06 +00:00
cpu_enc_dec_model_runner.py [Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089) 2024-10-07 06:50:35 +00:00
cpu_model_runner.py [Misc] Standardize RoPE handling for Qwen2-VL (#9250) 2024-10-16 13:56:17 +08:00
cpu_worker.py [Model] Support Mamba (#6484) 2024-10-11 15:40:06 +00:00
embedding_model_runner.py [Model] PP support for embedding models and update docs (#9090) 2024-10-06 16:35:27 +08:00
enc_dec_model_runner.py [Model] Support Mamba (#6484) 2024-10-11 15:40:06 +00:00
model_runner_base.py [MISC] Skip dumping inputs when unpicklable (#8744) 2024-09-24 06:10:03 +00:00
model_runner.py [Core] Deprecating block manager v1 and make block manager v2 default (#8704) 2024-10-17 11:38:15 -05:00
multi_step_model_runner.py [Doc] Compatibility matrix for mutual exclusive features (#8512) 2024-10-11 11:18:50 -07:00
multi_step_tpu_worker.py [TPU] Implement multi-step scheduling (#8489) 2024-09-14 16:58:31 -07:00
multi_step_worker.py [Core] Multi-Step + Single Step Prefills via Chunked Prefill code path (#8378) 2024-09-27 13:32:07 -07:00
neuron_model_runner.py [Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131) 2024-10-08 14:12:56 +00:00
neuron_worker.py [Bugfix] neuron: enable tensor parallelism (#7562) 2024-08-26 15:13:13 -07:00
openvino_model_runner.py [Model] Support Mamba (#6484) 2024-10-11 15:40:06 +00:00
openvino_worker.py [Model] Support Mamba (#6484) 2024-10-11 15:40:06 +00:00
tpu_model_runner.py [TPU] Fix TPU SMEM OOM by Pallas paged attention kernel (#9350) 2024-10-14 15:02:06 -07:00
tpu_worker.py [torch.compile] use empty tensor instead of None for profiling (#8875) 2024-09-27 08:11:32 -07:00
utils.py [Doc] Compatibility matrix for mutual exclusive features (#8512) 2024-10-11 11:18:50 -07:00
worker_base.py [Core] Logprobs support in Multi-step (#7652) 2024-08-29 19:19:08 -07:00
worker.py [Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage (#9352) 2024-10-17 22:47:27 -04:00
xpu_model_runner.py [Hardware][intel GPU] add async output process for xpu (#8897) 2024-10-14 12:23:33 -06:00
xpu_worker.py [Hardware][Intel GPU] Add intel GPU pipeline parallel support. (#7810) 2024-08-27 10:07:02 -07:00