vllm/vllm

Latest commit: e3e79e9e8a — not applicable; see below
Commit:  e3e79e9e8a  Implement AWQ quantization support for LLaMA (#1032)
Author:  Woosuk Kwon
Co-authored-by: Robert Irvine <robert@seamlessml.com>
Co-authored-by: root <rirv938@gmail.com>
Co-authored-by: Casper <casperbh.96@gmail.com>
Co-authored-by: julian-q <julianhquevedo@gmail.com>
Date:    2023-09-16 00:03:37 -07:00
Name                  Last commit                                                       Date
core/                 Make AsyncLLMEngine more robust & fix batched abort (#969)        2023-09-07 13:43:45 -07:00
engine/               Implement AWQ quantization support for LLaMA (#1032)              2023-09-16 00:03:37 -07:00
entrypoints/          Only fail if logit_bias has actual values (#1045)                 2023-09-14 17:33:01 -07:00
model_executor/       Implement AWQ quantization support for LLaMA (#1032)              2023-09-16 00:03:37 -07:00
transformers_utils/   Fix warning message on LLaMA FastTokenizer (#1037)                2023-09-14 17:33:32 -07:00
worker/               Align vLLM's beam search implementation with HF generate (#857)   2023-09-04 17:29:42 -07:00
__init__.py           Bump up the version to v0.1.7 (#1013)                             2023-09-11 00:54:30 -07:00
block.py              [Quality] Add code formatter and linter (#326)                    2023-07-03 11:31:55 -07:00
config.py             Implement AWQ quantization support for LLaMA (#1032)              2023-09-16 00:03:37 -07:00
logger.py             [Quality] Add code formatter and linter (#326)                    2023-07-03 11:31:55 -07:00
outputs.py            Align vLLM's beam search implementation with HF generate (#857)   2023-09-04 17:29:42 -07:00
sampling_params.py    Align vLLM's beam search implementation with HF generate (#857)   2023-09-04 17:29:42 -07:00
sequence.py           [FIX] Minor bug fixes (#1035)                                     2023-09-13 16:38:12 -07:00
utils.py              [Quality] Add code formatter and linter (#326)                    2023-07-03 11:31:55 -07:00