| Name | Last commit message | Last commit date |
| --- | --- | --- |
| attention | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| core | [mypy] Enable type checking for test directory (#5017) | 2024-06-15 04:45:31 +00:00 |
| distributed | [bugfix][distributed] fix shm broadcast when the queue size is full (#5801) | 2024-06-25 21:56:02 -07:00 |
| engine | [Core] Add fault tolerance for RayTokenizerGroupPool (#5748) | 2024-06-25 10:15:10 -07:00 |
| entrypoints | [Misc] Remove #4789 workaround left in vllm/entrypoints/openai/run_batch.py (#5756) | 2024-06-22 03:33:12 +00:00 |
| executor | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| logging | [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) | 2024-05-01 17:34:40 -07:00 |
| lora | [LoRA] Add support for pinning lora adapters in the LRU cache (#5603) | 2024-06-21 15:42:46 -07:00 |
| model_executor | [Misc] Update w4a16 compressed-tensors support to include w8a16 (#5794) | 2024-06-25 19:23:35 +00:00 |
| multimodal | [Misc][Doc] Add Example of using OpenAI Server with VLM (#5832) | 2024-06-25 20:34:25 -07:00 |
| spec_decode | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| transformers_utils | [Core] Add fault tolerance for RayTokenizerGroupPool (#5748) | 2024-06-25 10:15:10 -07:00 |
| usage | [Misc] Add vLLM version getter to utils (#5098) | 2024-06-13 11:21:39 -07:00 |
| worker | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| __init__.py | [Misc] Add vLLM version getter to utils (#5098) | 2024-06-13 11:21:39 -07:00 |
| _custom_ops.py | [Bugfix] Fix the CUDA version check for FP8 support in the CUTLASS kernels (#5715) | 2024-06-20 18:36:10 +00:00 |
| _ipex_ops.py | [Kernel][CPU] Add Quick gelu to CPU (#5717) | 2024-06-21 06:39:40 +00:00 |
| block.py | [misc][typo] fix typo (#5620) | 2024-06-17 20:54:57 -07:00 |
| config.py | [Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) | 2024-06-25 15:56:15 -07:00 |
| envs.py | [Core][Distributed] add shm broadcast (#5399) | 2024-06-21 05:12:35 +00:00 |
| inputs.py | [Bugfix] TYPE_CHECKING for MultiModalData (#5444) | 2024-06-12 14:08:52 -07:00 |
| logger.py | [Misc] add logging level env var (#5045) | 2024-05-24 23:49:49 -07:00 |
| outputs.py | [Core] Consolidate prompt arguments to LLM engines (#4328) | 2024-05-28 13:29:31 -07:00 |
| pooling_params.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| py.typed | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| sampling_params.py | [Core]: Option To Use Prompt Token Ids Inside Logits Processor (#4985) | 2024-05-23 22:04:24 +00:00 |
| sequence.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| tracing.py | [Misc] Add OpenTelemetry support (#4687) | 2024-06-19 01:17:03 +09:00 |
| utils.py | [Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) | 2024-06-25 15:56:15 -07:00 |
| version.py | bump version to v0.5.0.post1 (#5522) | 2024-06-13 19:42:06 -07:00 |