async_engine
[MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py ( #9510 )
2024-10-18 14:30:55 -07:00
basic_correctness
[Hardware][ROCM] using current_platform.is_rocm ( #9642 )
2024-10-28 04:07:00 +00:00
compile
[torch.compile] support moe models ( #9632 )
2024-10-27 21:58:04 -07:00
core
[core] simplify seq group code ( #9569 )
2024-10-24 00:16:44 -07:00
data
[Bugfix] Fix load config when using bools ( #9533 )
2024-10-27 13:46:41 -04:00
distributed
[torch.compile] Adding torch compile annotations to some models ( #9641 )
2024-10-24 09:31:42 -07:00
encoder_decoder
[Hardware][CPU] using current_platform.is_cpu ( #9536 )
2024-10-22 00:50:43 -07:00
engine
[Frontend] [Neuron] Parse literals out of override-neuron-config ( #8959 )
2024-10-03 18:02:07 +00:00
entrypoints
[Bugfix][Frontend] Guard against bad token ids ( #9634 )
2024-10-29 14:13:20 -07:00
fp8_kv
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) ( #3290 )
2024-04-03 14:15:55 -07:00
kernels
[Hardware] using current_platform.seed_everything ( #9785 )
2024-10-29 14:47:44 +00:00
lora
[Hardware] using current_platform.seed_everything ( #9785 )
2024-10-29 14:47:44 +00:00
metrics
[BugFix] Fix metrics error for --num-scheduler-steps > 1 ( #8234 )
2024-10-22 15:43:03 -07:00
model_executor
[torch.compile] Fine-grained CustomOp enabling mechanism ( #9300 )
2024-10-17 18:36:37 +00:00
models
[CI][Bugfix] Skip chameleon for transformers 4.46.1 ( #9808 )
2024-10-29 11:12:43 -07:00
mq_llm_engine
[Frontend] Don't log duplicate error stacktrace for every request in the batch ( #9023 )
2024-10-21 14:49:41 -07:00
multi_step
[Core] Deprecating block manager v1 and make block manager v2 default ( #8704 )
2024-10-17 11:38:15 -05:00
multimodal
[Model] Add user-configurable task for models that support both generation and embedding ( #9424 )
2024-10-18 11:31:58 -07:00
plugins/vllm_add_dummy_model
[Model] VLM2Vec, the first multimodal embedding model in vLLM ( #9303 )
2024-10-16 14:31:00 +08:00
prefix_caching
[MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py ( #9510 )
2024-10-18 14:30:55 -07:00
prompt_adapter
[CORE] Adding support for insertion of soft-tuned prompts ( #4645 )
2024-07-09 13:26:36 -07:00
prompts
[BugFix] Fix input positions for long context with sliding window ( #2088 )
2023-12-13 12:28:13 -08:00
quantization
🐛 fix torch memory profiling ( #9516 )
2024-10-18 21:25:19 -04:00
samplers
[Frontend] Bad words sampling parameter ( #9717 )
2024-10-26 16:29:38 +00:00
spec_decode
[Hardware][ROCM] using current_platform.is_rocm ( #9642 )
2024-10-28 04:07:00 +00:00
tensorizer_loader
[MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py ( #9510 )
2024-10-18 14:30:55 -07:00
tokenization
[Core] Allow specifying custom Executor ( #6557 )
2024-07-20 01:25:06 +00:00
tool_use
[Model] tool calling support for ibm-granite/granite-20b-functioncalling ( #8339 )
2024-10-29 15:07:37 -07:00
tpu
[torch.compile] integration with compilation control ( #9058 )
2024-10-10 12:39:36 -07:00
tracing
[BugFix] Prevent exporting duplicate OpenTelemetry spans ( #9017 )
2024-10-22 11:11:53 -07:00
weight_loading
[Bugfix] Fix Weight Loading Multiple GPU Test - Large Models ( #9213 )
2024-10-10 14:15:40 +08:00
worker
[Hardware][CPU] using current_platform.is_cpu ( #9536 )
2024-10-22 00:50:43 -07:00
__init__.py
[Small] Formatter only checks lints in changed files ( #1528 )
2023-10-31 15:39:38 -07:00
conftest.py
[Model] Add classification Task with Qwen2ForSequenceClassification ( #9704 )
2024-10-26 17:53:35 +00:00
test_cache_block_hashing.py
[CI/Build] Update Ruff version ( #8469 )
2024-09-18 11:00:56 +00:00
test_config.py
[Model] Add user-configurable task for models that support both generation and embedding ( #9424 )
2024-10-18 11:31:58 -07:00
test_embedded_commit.py
[CI/Build] use setuptools-scm to set __version__ ( #4738 )
2024-09-23 09:44:26 -07:00
test_inputs.py
[Core][Frontend] Add Support for Inference Time mm_processor_kwargs ( #9131 )
2024-10-08 14:12:56 +00:00
test_logger.py
[CI/Build] Update Ruff version ( #8469 )
2024-09-18 11:00:56 +00:00
test_logits_processor.py
[Core] Factor out common code in SequenceData and Sequence ( #8675 )
2024-09-21 02:30:39 +00:00
test_regression.py
Bugfix: fix broken of download models from modelscope ( #5233 )
2024-06-06 09:28:10 -07:00
test_sampling_params.py
[Bugfix] fix crash if max_tokens=None ( #2570 )
2024-01-23 22:38:55 -08:00
test_scalartype.py
[Bugfix] Fix support for dimension like integers and ScalarType ( #9299 )
2024-10-17 19:08:34 +00:00
test_sequence.py
[Core] Factor out common code in SequenceData and Sequence ( #8675 )
2024-09-21 02:30:39 +00:00
test_sharded_state_loader.py
[CI/Build] Replaced some models on tests for smaller ones ( #9570 )
2024-10-22 04:52:14 +00:00
test_utils.py
[Bugfix] Fix load config when using bools ( #9533 )
2024-10-27 13:46:41 -04:00
utils.py
[Hardware][ROCM] using current_platform.is_rocm ( #9642 )
2024-10-28 04:07:00 +00:00