vllm/model_executor at c5f7740d89737744438e08c26da1d4fbadcb3893 - vllm

ljss e1054247ba [Optimization] Implement fused add rmsnorm (#1667)	2023-11-18 18:18:02 -08:00
layers	[Optimization] Implement fused add rmsnorm (#1667)	2023-11-18 18:18:02 -08:00
models	[Optimization] Implement fused add rmsnorm (#1667)	2023-11-18 18:18:02 -08:00
parallel_utils	TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)	2023-11-15 22:50:41 -08:00
__init__.py	[Quality] Add code formatter and linter (#326)	2023-07-03 11:31:55 -07:00
input_metadata.py	Delay GPU->CPU sync in sampling (#1337)	2023-10-30 09:01:34 -07:00
model_loader.py	Use `quantization_config` in hf config (#1695)	2023-11-17 16:23:49 -08:00
utils.py	TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)	2023-11-15 22:50:41 -08:00
weight_utils.py	use get_tensor in safe_open (#1696)	2023-11-18 16:45:18 -08:00