| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | [Bugfix] Add __init__.py files for vllm/core/block/ and vllm/spec_decode/ (#3798) | 2024-04-02 12:35:31 -07:00 |
| batch_expansion.py | [BugFix] Fix use of per-request seed with pipeline parallel (#6698) | 2024-07-30 10:40:08 -07:00 |
| draft_model_runner.py | [Core] Add span metrics for model_forward, scheduler and sampler time (#7089) | 2024-08-09 13:55:13 -07:00 |
| interfaces.py | [BUGFIX] Raise an error for no draft token case when draft_tp>1 (#6369) | 2024-07-19 06:01:09 -07:00 |
| medusa_worker.py | [Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971) | 2024-08-09 05:42:45 +00:00 |
| metrics.py | [Bugfix] [SpecDecode] AsyncMetricsCollector: update time since last collection (#6578) | 2024-07-19 14:01:03 -07:00 |
| mlp_speculator_worker.py | [BugFix] Fix use of per-request seed with pipeline parallel (#6698) | 2024-07-30 10:40:08 -07:00 |
| multi_step_worker.py | [Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971) | 2024-08-09 05:42:45 +00:00 |
| ngram_worker.py | [BugFix] Fix use of per-request seed with pipeline parallel (#6698) | 2024-07-30 10:40:08 -07:00 |
| proposer_worker_base.py | [Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971) | 2024-08-09 05:42:45 +00:00 |
| smaller_tp_proposer_worker.py | [Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971) | 2024-08-09 05:42:45 +00:00 |
| spec_decode_worker.py | [Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971) | 2024-08-09 05:42:45 +00:00 |
| target_model_runner.py | [Core] Add span metrics for model_forward, scheduler and sampler time (#7089) | 2024-08-09 13:55:13 -07:00 |
| top1_proposer.py | [BUGFIX] Raise an error for no draft token case when draft_tp>1 (#6369) | 2024-07-19 06:01:09 -07:00 |
| util.py | [Speculative decoding] Add periodic log with time spent in proposal/scoring/verification (#6963) | 2024-08-05 08:46:44 +00:00 |