squall/vllm
vllm/model_executor at commit 5253edaacb

Latest commit 5253edaacb: Add Gemma model (#2964), Xiang Xu, 2024-02-21 09:34:30 -08:00
layers/               Prefix Caching- fix t4 triton error (#2517)                         2024-02-16 14:17:55 -08:00
models/               Add Gemma model (#2964)                                             2024-02-21 09:34:30 -08:00
parallel_utils/       Don't use cupy NCCL for AMD backends (#2855)                        2024-02-14 12:30:44 -08:00
__init__.py           Refactor Worker & InputMetadata (#1843)                             2023-11-29 22:16:37 -08:00
input_metadata.py     Support FP8-E5M2 KV Cache (#2279)                                   2024-01-28 16:43:54 -08:00
model_loader.py       Add LoRA support for Mixtral (#2831)                                2024-02-14 00:55:45 +01:00
sampling_metadata.py  Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)  2024-01-03 11:30:22 -08:00
utils.py              TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)  2023-11-15 22:50:41 -08:00
weight_utils.py       Use revision when downloading the quantization config file (#2697)  2024-02-01 15:41:58 -08:00