async_engine
[Frontend] Move async logic outside of constructor (#4674)
2024-05-08 22:48:33 -07:00
basic_correctness
[Core] Fix circular reference which leaked llm instance in local dev env (#4737)
2024-05-10 23:54:32 +09:00
core
[Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659)
2024-05-08 12:07:05 -07:00
distributed
[Core][Distributed] refactor pynccl (#4591)
2024-05-09 19:48:43 -07:00
engine
[Core] Add multiproc_worker_utils for multiprocessing-based workers (#4357)
2024-05-01 18:41:59 +00:00
entrypoints
[Frontend] Move async logic outside of constructor (#4674)
2024-05-08 22:48:33 -07:00
fp8_kv
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
2024-04-03 14:15:55 -07:00
kernels
[Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support (#4535)
2024-05-09 18:04:17 -06:00
lora
[Kernel] Full Tensor Parallelism for LoRA Layers (#3524)
2024-04-27 00:03:48 -07:00
metrics
[Bugfix] Fix inappropriate content of model_name tag in Prometheus metrics (#3937)
2024-05-04 15:39:34 -07:00
model_executor
[Core] Support offline use of local cache for models (#4374)
2024-04-27 09:59:55 -07:00
models
[CI] Make mistral tests pass (#4596)
2024-05-08 08:44:35 -07:00
prefix_caching
[Core][Bugfix]Refactor block manager for better testability (#3492)
2024-03-27 23:59:28 -07:00
prompts
[BugFix] Fix input positions for long context with sliding window (#2088)
2023-12-13 12:28:13 -08:00
quantization
[Kernel] Marlin Expansion: Support AutoGPTQ Models with Marlin (#3922)
2024-04-29 09:35:34 -07:00
samplers
[Misc] Remove unnecessary ModelRunner imports (#4703)
2024-05-09 00:17:17 -07:00
spec_decode
[Dynamic Spec Decoding] Auto-disable by the running queue size (#4592)
2024-05-08 21:44:00 +00:00
tensorizer_loader
[Core][Distributed] use cpu group to broadcast metadata in cpu (#4444)
2024-04-29 13:52:22 -07:00
tokenization
[Bugfix] Fix parameter name in get_tokenizer (#4107)
2024-04-25 19:10:48 -07:00
worker
[Misc] Set block size at initialization & Fix test_model_runner (#4705)
2024-05-09 09:04:59 -07:00
__init__.py
[Small] Formatter only checks lints in changed files (#1528)
2023-10-31 15:39:38 -07:00
conftest.py
[CI] Make mistral tests pass (#4596)
2024-05-08 08:44:35 -07:00
test_cache_block_hashing.py
[CI] Try introducing isort. (#3495)
2024-03-25 07:59:47 -07:00
test_config.py
[Core] Refactor model loading code (#4097)
2024-04-16 11:34:39 -07:00
test_logger.py
[MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273)
2024-05-01 17:34:40 -07:00
test_logits_processor.py
[Misc] Remove unnecessary ModelRunner imports (#4703)
2024-05-09 00:17:17 -07:00
test_regression.py
[BugFix] Fix GC bug for LLM class (#2882)
2024-02-14 22:17:44 -08:00
test_sampling_params.py
[Bugfix] fix crash if max_tokens=None (#2570)
2024-01-23 22:38:55 -08:00
test_sequence.py
[Misc] Keep only one implementation of the create_dummy_prompt function. (#4716)
2024-05-09 21:42:38 -07:00