squall/vllm · vllm/model_executor (at commit fe6d09ae61)

Latest commit f0d4e14557 by Woosuk Kwon, 2024-02-05 17:38:02 -08:00:
Add fused top-K softmax kernel for MoE (#2769)
Name                  Last commit message                                                             Date
layers/               Add fused top-K softmax kernel for MoE (#2769)                                  2024-02-05 17:38:02 -08:00
models/               Add fused top-K softmax kernel for MoE (#2769)                                  2024-02-05 17:38:02 -08:00
parallel_utils/       [Minor] Fix false warning when TP=1 (#2674)                                     2024-01-30 14:39:40 -08:00
__init__.py           Refactor Worker & InputMetadata (#1843)                                         2023-11-29 22:16:37 -08:00
input_metadata.py     Support FP8-E5M2 KV Cache (#2279)                                               2024-01-28 16:43:54 -08:00
model_loader.py       Remove hardcoded device="cuda" to support more devices (#2503)                  2024-02-01 15:46:39 -08:00
sampling_metadata.py  Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)  2024-01-03 11:30:22 -08:00
utils.py              TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)  2023-11-15 22:50:41 -08:00
weight_utils.py       Use revision when downloading the quantization config file (#2697)              2024-02-01 15:41:58 -08:00