vllm/vllm/worker

Latest commit 2a543d6efe by Terry: Add LoRA support for Mixtral (#2831)

Commit message:
* add mixtral lora support
* formatting
* fix incorrectly ported logic
* polish tests
* minor fixes and refactoring
* minor fixes
* formatting
* rename and remove redundant logic
* refactoring
* refactoring
* minor fix
* minor refactoring
* fix code smell
Committed 2024-02-14 00:55:45 +01:00
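The headline change here is LoRA support for Mixtral. As a hedged sketch of how a Mixtral LoRA adapter is used through vLLM's offline `LLM` API (the adapter name, integer id, and local path below are placeholder assumptions, not part of this commit):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load Mixtral with LoRA support enabled; the worker-side model runner
# applies the adapter's low-rank weights on top of the base model.
llm = LLM(model="mistralai/Mixtral-8x7B-v0.1", enable_lora=True)

sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# LoRARequest(name, unique int id, local adapter path); the path is hypothetical.
outputs = llm.generate(
    ["Write a haiku about mixture-of-experts models."],
    sampling_params,
    lora_request=LoRARequest("my-mixtral-adapter", 1, "/path/to/adapter"),
)
print(outputs[0].outputs[0].text)
```

The table below confirms model_runner.py is the worker-side file this commit touched.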
Contents:
spec_decode/     [Speculative decoding 2/9] Multi-step worker for draft model (#2424)  2024-01-21 16:31:47 -08:00
__init__.py      Change the name to vLLM (#150)                                        2023-06-17 03:07:40 -07:00
cache_engine.py  Remove hardcoded device="cuda" to support more devices (#2503)        2024-02-01 15:46:39 -08:00
model_runner.py  Add LoRA support for Mixtral (#2831)                                  2024-02-14 00:55:45 +01:00
worker.py        Use CuPy for CUDA graphs (#2811)                                      2024-02-13 11:32:06 -08:00