Woosuk Kwon
7c600440f7
Fix model docstrings ( #1764 )
2023-11-23 23:04:44 -08:00
Simon Mo
5ffc0d13a2
Migrate linter from pylint to ruff ( #1665 )
2023-11-20 11:58:01 -08:00
Woosuk Kwon
8d17774f92
Add AWQ support for all models ( #1714 )
2023-11-18 17:56:47 -08:00
Zhuohan Li
7076fa1c9f
TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models ( #1622 )
...
Refactor the tensor parallelism, quantization, and weight-loading codes.
Summary of the new features enabled by this PR:
- **All models** are able to be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580 ).
- Model loading code became much simpler.
- Support model parallelism for all MQA/GQA models when the number of key/value heads is smaller than the tensor parallel size.
2023-11-15 22:50:41 -08:00
Zhuohan Li
ba0bfd40e2
TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic ( #1181 )
2023-10-02 15:36:09 -07:00
Jasmond L
ab019eea75
Add Model Revision Support ( #1014 )
...
Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-09-13 15:20:02 -07:00
Zhuohan Li
c957c741d9
Enable safetensors loading for all models ( #974 )
2023-09-07 15:49:52 -07:00
Zhuohan Li
002800f081
Align vLLM's beam search implementation with HF generate ( #857 )
2023-09-04 17:29:42 -07:00
JFDuan
0d93f15694
Accelerate LLaMA model loading ( #234 )
2023-08-30 01:00:13 -07:00
Zhuohan Li
d6fa1be3a8
[Quality] Add code formatter and linter ( #326 )
2023-07-03 11:31:55 -07:00
twaka
4026a049d3
expand coverage of gpt2 model loading ( #271 )
2023-06-27 06:27:41 -07:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM ( #150 )
2023-06-17 03:07:40 -07:00