squall/vllm: vllm/model_executor at commit 8cd5a992bf
Latest commit: f780504d12 by Chenhui Zhang, "fix weight loading for GQA with TP" (#2379), 2024-01-15 15:43:59 -08:00
Name                  Last commit date            Last commit message
layers                2024-01-15 15:43:59 -08:00  fix weight loading for GQA with TP (#2379)
models                2024-01-12 12:16:49 -08:00  Address Phi modeling update 2 (#2428)
parallel_utils        2024-01-03 11:30:22 -08:00  Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)
__init__.py           2023-11-29 22:16:37 -08:00  Refactor Worker & InputMetadata (#1843)
input_metadata.py     2024-01-03 11:30:22 -08:00  Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)
model_loader.py       2023-12-12 22:21:45 -08:00  Implement lazy model loader (#2044)
sampling_metadata.py  2024-01-03 11:30:22 -08:00  Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)
utils.py              2023-11-15 22:50:41 -08:00  TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)
weight_utils.py       2023-12-17 10:12:44 -08:00  [Minor] Fix a typo in .pt weight support (#2160)