vllm/vllm

Latest commit: [ Misc ] Rs/compressed tensors cleanup (#5432) by Robert Shaw (15985680e2), 2024-06-14 10:01:46 -07:00
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
| Name | Last commit | Date |
| --- | --- | --- |
| attention/ | Revert "[Core] Remove unnecessary copies in flash attn backend" (#5478) | 2024-06-13 11:22:50 -07:00 |
| core/ | [Bugfix] Fix typo in scheduler.py (requeset -> request) (#5470) | 2024-06-12 21:59:44 +00:00 |
| distributed/ | Add cuda_device_count_stateless (#5473) | 2024-06-13 16:06:49 -07:00 |
| engine/ | [Misc] Add vLLM version getter to utils (#5098) | 2024-06-13 11:21:39 -07:00 |
| entrypoints/ | [Misc] Add vLLM version getter to utils (#5098) | 2024-06-13 11:21:39 -07:00 |
| executor/ | Add cuda_device_count_stateless (#5473) | 2024-06-13 16:06:49 -07:00 |
| logging/ | [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) | 2024-05-01 17:34:40 -07:00 |
| lora/ | [Misc] Improve error message when LoRA parsing fails (#5194) | 2024-06-10 19:38:49 +08:00 |
| model_executor/ | [ Misc ] Rs/compressed tensors cleanup (#5432) | 2024-06-14 10:01:46 -07:00 |
| multimodal/ | [Bugfix] Fix LLaVA-NeXT (#5380) | 2024-06-10 15:38:47 +00:00 |
| spec_decode/ | [Misc] Various simplifications and typing fixes (#5368) | 2024-06-11 10:29:02 +08:00 |
| transformers_utils/ | [Frontend] Customizable RoPE theta (#5197) | 2024-06-11 10:42:26 -07:00 |
| usage/ | [Misc] Add vLLM version getter to utils (#5098) | 2024-06-13 11:21:39 -07:00 |
| worker/ | [Core][Distributed] code deduplication in tp&pp with coordinator (#5293) | 2024-06-12 17:27:08 -07:00 |
| __init__.py | [Misc] Add vLLM version getter to utils (#5098) | 2024-06-13 11:21:39 -07:00 |
| _custom_ops.py | [Kernel] Factor out epilogues from cutlass kernels (#5391) | 2024-06-13 11:22:19 -07:00 |
| block.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| config.py | Add cuda_device_count_stateless (#5473) | 2024-06-13 16:06:49 -07:00 |
| envs.py | [Hardware] Initial TPU integration (#5292) | 2024-06-12 11:53:03 -07:00 |
| inputs.py | [Bugfix] TYPE_CHECKING for MultiModalData (#5444) | 2024-06-12 14:08:52 -07:00 |
| logger.py | [Misc] add logging level env var (#5045) | 2024-05-24 23:49:49 -07:00 |
| outputs.py | [Core] Consolidate prompt arguments to LLM engines (#4328) | 2024-05-28 13:29:31 -07:00 |
| pooling_params.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| sampling_params.py | [Core]: Option To Use Prompt Token Ids Inside Logits Processor (#4985) | 2024-05-23 22:04:24 +00:00 |
| sequence.py | [Core] Support image processor (#4197) | 2024-06-02 22:56:41 -07:00 |
| utils.py | Add cuda_device_count_stateless (#5473) | 2024-06-13 16:06:49 -07:00 |
| version.py | bump version to v0.5.0.post1 (#5522) | 2024-06-13 19:42:06 -07:00 |