vllm/vllm
Latest commit: e67b4f2c2a by Woosuk Kwon, 2023-09-11 00:26:35 -07:00
  Use FP32 in RoPE initialization (#1004)
  Co-authored-by: One <imone@tuta.io>
Name                Last commit                 Latest commit message
core/               2023-09-07 13:43:45 -07:00  Make AsyncLLMEngine more robust & fix batched abort (#969)
engine/             2023-09-08 17:21:30 -07:00  fix "tansformers_module" ModuleNotFoundError when load model with trust_remote_code=True (#871)
entrypoints/        2023-09-08 00:03:39 -07:00  Start background task in AsyncLLMEngine.generate (#988)
model_executor/     2023-09-11 00:26:35 -07:00  Use FP32 in RoPE initialization (#1004)
transformers_utils/ 2023-09-05 00:50:55 +09:00  Only emit warning about internal tokenizer if it isn't being used (#939)
worker/             2023-09-04 17:29:42 -07:00  Align vLLM's beam search implementation with HF generate (#857)
__init__.py         2023-09-08 00:07:46 -07:00  Bump up the version to v0.1.6 (#989)
block.py            2023-07-03 11:31:55 -07:00  [Quality] Add code formatter and linter (#326)
config.py           2023-09-10 01:39:02 -07:00  fix: CUDA error when inferencing with Falcon-40B base model (#992)
logger.py           2023-07-03 11:31:55 -07:00  [Quality] Add code formatter and linter (#326)
outputs.py          2023-09-04 17:29:42 -07:00  Align vLLM's beam search implementation with HF generate (#857)
sampling_params.py  2023-09-04 17:29:42 -07:00  Align vLLM's beam search implementation with HF generate (#857)
sequence.py         2023-09-04 17:29:42 -07:00  Align vLLM's beam search implementation with HF generate (#857)
utils.py            2023-07-03 11:31:55 -07:00  [Quality] Add code formatter and linter (#326)
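For orientation, the sketch below shows how the modules listed above surface in vLLM's public Python API around v0.1.6: entrypoints provides the LLM class, sampling_params.py defines SamplingParams, outputs.py defines the RequestOutput objects that generate() returns, and engine, worker, and model_executor run underneath. This is a minimal illustrative sketch, not part of the listing; the model name is only an example.

    from vllm import LLM, SamplingParams

    # SamplingParams comes from vllm/sampling_params.py listed above.
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # LLM lives under vllm/entrypoints and drives the engine in vllm/engine,
    # which schedules work onto vllm/worker and vllm/model_executor.
    llm = LLM(model="facebook/opt-125m")  # example model name, an assumption

    # generate() returns RequestOutput objects defined in vllm/outputs.py.
    for output in llm.generate(["Hello, my name is"], params):
        print(output.outputs[0].text)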