vllm/vllm/model_executor
2024-01-15 15:43:59 -08:00
layers fix weight loading for GQA with TP (#2379) 2024-01-15 15:43:59 -08:00
models Address Phi modeling update 2 (#2428) 2024-01-12 12:16:49 -08:00
parallel_utils Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
__init__.py Refactor Worker & InputMetadata (#1843) 2023-11-29 22:16:37 -08:00
input_metadata.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
model_loader.py Implement lazy model loader (#2044) 2023-12-12 22:21:45 -08:00
sampling_metadata.py Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
utils.py TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622) 2023-11-15 22:50:41 -08:00
weight_utils.py [Minor] Fix a typo in .pt weight support (#2160) 2023-12-17 10:12:44 -08:00