vllm/tests
SangBin Cho 6a0f617210
[Core] Fix circular reference which leaked llm instance in local dev env (#4737)
Storing exception frame is extremely prone to circular refernece because it contains the reference to objects.

When tensorizer is not installed, it leaks llm instance because error frame has references to various modules which cause circular reference problem.

I also found spec decoding has a circular reference issue, and I solved it using weakref.proxy.
2024-05-10 23:54:32 +09:00
..
async_engine [Frontend] Move async logic outside of constructor (#4674) 2024-05-08 22:48:33 -07:00
basic_correctness [Core] Fix circular reference which leaked llm instance in local dev env (#4737) 2024-05-10 23:54:32 +09:00
core [Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659) 2024-05-08 12:07:05 -07:00
distributed [Core][Distributed] refactor pynccl (#4591) 2024-05-09 19:48:43 -07:00
engine [Core] Add multiproc_worker_utils for multiprocessing-based workers (#4357) 2024-05-01 18:41:59 +00:00
entrypoints [Frontend] Move async logic outside of constructor (#4674) 2024-05-08 22:48:33 -07:00
fp8_kv Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290) 2024-04-03 14:15:55 -07:00
kernels [Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support (#4535) 2024-05-09 18:04:17 -06:00
lora [Kernel] Full Tensor Parallelism for LoRA Layers (#3524) 2024-04-27 00:03:48 -07:00
metrics [Bugfix] Fix inappropriate content of model_name tag in Prometheus metrics (#3937) 2024-05-04 15:39:34 -07:00
model_executor [Core] Support offline use of local cache for models (#4374) 2024-04-27 09:59:55 -07:00
models [CI] Make mistral tests pass (#4596) 2024-05-08 08:44:35 -07:00
prefix_caching [Core][Bugfix]Refactor block manager for better testability (#3492) 2024-03-27 23:59:28 -07:00
prompts [BugFix] Fix input positions for long context with sliding window (#2088) 2023-12-13 12:28:13 -08:00
quantization [Kernel] Marlin Expansion: Support AutoGPTQ Models with Marlin (#3922) 2024-04-29 09:35:34 -07:00
samplers [Misc] Remove unnecessary ModelRunner imports (#4703) 2024-05-09 00:17:17 -07:00
spec_decode [Dynamic Spec Decoding] Auto-disable by the running queue size (#4592) 2024-05-08 21:44:00 +00:00
tensorizer_loader [Core][Distributed] use cpu group to broadcast metadata in cpu (#4444) 2024-04-29 13:52:22 -07:00
tokenization [Bugfix] Fix parameter name in get_tokenizer (#4107) 2024-04-25 19:10:48 -07:00
worker [Misc] Set block size at initialization & Fix test_model_runner (#4705) 2024-05-09 09:04:59 -07:00
__init__.py [Small] Formatter only checks lints in changed files (#1528) 2023-10-31 15:39:38 -07:00
conftest.py [CI] Make mistral tests pass (#4596) 2024-05-08 08:44:35 -07:00
test_cache_block_hashing.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_config.py [Core] Refactor model loading code (#4097) 2024-04-16 11:34:39 -07:00
test_logger.py [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) 2024-05-01 17:34:40 -07:00
test_logits_processor.py [Misc] Remove unnecessary ModelRunner imports (#4703) 2024-05-09 00:17:17 -07:00
test_regression.py [BugFix] Fix GC bug for LLM class (#2882) 2024-02-14 22:17:44 -08:00
test_sampling_params.py [Bugfix] fix crash if max_tokens=None (#2570) 2024-01-23 22:38:55 -08:00
test_sequence.py [Misc] Keep only one implementation of the create_dummy_prompt function. (#4716) 2024-05-09 21:42:38 -07:00