squall / vllm
vllm / vllm / lora (at commit cfba4def5d)

Latest commit: 57b7be0e1c by William Lin, 2024-08-09 05:42:45 +00:00
[Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971)
ops                       [Kernel][RFC] Refactor the punica kernel based on Triton (#5036)                           2024-07-31 17:12:24 -07:00
__init__.py               [Experimental] Add multi-LoRA support (#1804)                                              2024-01-23 15:26:37 -08:00
fully_sharded_layers.py   [Kernel][RFC] Refactor the punica kernel based on Triton (#5036)                           2024-07-31 17:12:24 -07:00
layers.py                 [Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971)   2024-08-09 05:42:45 +00:00
lora.py                   [Model] Add base class for LoRA-supported models (#5018)                                   2024-06-27 16:03:04 +08:00
models.py                 [Bugfix] Fix LoRA with PP (#7292)                                                          2024-08-08 00:02:27 -07:00
punica.py                 [Kernel][RFC] Refactor the punica kernel based on Triton (#5036)                           2024-07-31 17:12:24 -07:00
request.py                [Core] Support dynamically loading Lora adapter from HuggingFace (#6234)                   2024-07-22 15:42:40 -07:00
utils.py                  [LoRA] ReplicatedLinear support LoRA (#7081)                                               2024-08-02 22:40:19 -07:00
worker_manager.py         [Core] Support dynamically loading Lora adapter from HuggingFace (#6234)                   2024-07-22 15:42:40 -07:00