Name                         | Last commit                                                                                           | Date
-----------------------------|-------------------------------------------------------------------------------------------------------|---------------------------
async_engine                 | [Bugfix][Frontend] Cleanup "fix chat logprobs" (#5026)                                                | 2024-06-10 22:36:46 -07:00
basic_correctness            | [CI/Test] improve robustness of test (vllm_runner) (#5357)                                            | 2024-06-08 08:59:20 +00:00
core                         | [CI] Upgrade codespell version. (#5381)                                                               | 2024-06-12 10:06:14 -07:00
distributed                  | [Core][Distributed] add same-node detection (#5369)                                                   | 2024-06-11 10:53:59 -07:00
engine                       | [CI/Test] improve robustness of test (vllm_runner) (#5357)                                            | 2024-06-08 08:59:20 +00:00
entrypoints                  | [Bugfix][Frontend] Cleanup "fix chat logprobs" (#5026)                                                | 2024-06-10 22:36:46 -07:00
fp8_kv                       | Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)                                         | 2024-04-03 14:15:55 -07:00
kernels                      | [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)                    | 2024-06-09 16:23:30 -04:00
lora                         | [Misc] Improve error message when LoRA parsing fails (#5194)                                          | 2024-06-10 19:38:49 +08:00
metrics                      | [CI/Test] improve robustness of test (vllm_runner) (#5357)                                            | 2024-06-08 08:59:20 +00:00
model_executor               | [CI/Build] Move test_utils.py to tests/utils.py (#4425)                                               | 2024-05-13 23:50:09 +09:00
models                       | Revert "[CI/Build] Add is_quant_method_supported to control quantization test configurations" (#5463) | 2024-06-12 10:03:24 -07:00
multimodal                   | [Model] Initial support for LLaVA-NeXT (#4199)                                                        | 2024-06-10 12:47:15 +00:00
prefix_caching               | [Bugfix / Core] Prefix Caching Guards (merged with main) (#4846)                                      | 2024-05-27 15:18:17 -07:00
prompts                      | [BugFix] Fix input positions for long context with sliding window (#2088)                             | 2023-12-13 12:28:13 -08:00
quantization                 | [Kernel] Vectorized FP8 quantize kernel (#5396)                                                       | 2024-06-12 14:07:26 -07:00
samplers                     | [CI/Test] improve robustness of test (vllm_runner) (#5357)                                            | 2024-06-08 08:59:20 +00:00
spec_decode                  | [Core][Doc] Default to multiprocessing for single-node distributed case (#5230)                       | 2024-06-11 11:10:41 -07:00
tensorizer_loader            | [Bugfix][Frontend] Cleanup "fix chat logprobs" (#5026)                                                | 2024-06-10 22:36:46 -07:00
tokenization                 | [Core] Support image processor (#4197)                                                                | 2024-06-02 22:56:41 -07:00
worker                       | [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) | 2024-05-15 14:00:10 +09:00
__init__.py                  | [Small] Formatter only checks lints in changed files (#1528)                                          | 2023-10-31 15:39:38 -07:00
conftest.py                  | [CI/Test] improve robustness of test (vllm_runner) (#5357)                                            | 2024-06-08 08:59:20 +00:00
test_cache_block_hashing.py  | [Core] Avoid the need to pass None values to Sequence.inputs (#5099)                                  | 2024-05-29 16:05:01 -07:00
test_config.py               | [Frontend] Customizable RoPE theta (#5197)                                                            | 2024-06-11 10:42:26 -07:00
test_inputs.py               | [Core] Consolidate prompt arguments to LLM engines (#4328)                                            | 2024-05-28 13:29:31 -07:00
test_logger.py               | [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273)           | 2024-05-01 17:34:40 -07:00
test_logits_processor.py     | [Misc] Remove unnecessary ModelRunner imports (#4703)                                                 | 2024-05-09 00:17:17 -07:00
test_regression.py           | Bugfix: fix broken of download models from modelscope (#5233)                                         | 2024-06-06 09:28:10 -07:00
test_sampling_params.py      | [Bugfix] fix crash if max_tokens=None (#2570)                                                         | 2024-01-23 22:38:55 -08:00
test_sequence.py             | [CI/Build] Move test_utils.py to tests/utils.py (#4425)                                               | 2024-05-13 23:50:09 +09:00
test_sharded_state_loader.py | [CI] Upgrade codespell version. (#5381)                                                               | 2024-06-12 10:06:14 -07:00
test_utils.py                | [Misc][Utils] allow get_open_port to be called for multiple times (#5333)                             | 2024-06-06 22:15:11 -07:00
utils.py                     | [FRONTEND] OpenAI tools support named functions (#5032)                                               | 2024-06-03 18:25:29 -05:00