vllm/spec_decode at f3ff63c3f45974986f13f60647a258b09913c420 - vllm

Allen.Dou 40468b13fa [Bugfix] Miscalculated latency lead to time_to_first_token_seconds inaccurate. (#6686)		2024-07-24 08:58:42 -07:00
__init__.py	[Bugfix] Add `__init__.py` files for `vllm/core/block/` and `vllm/spec_decode/` (#3798)	2024-04-02 12:35:31 -07:00
batch_expansion.py	[Bugfix] Make spec. decode respect per-request seed. (#6034)	2024-07-18 19:22:08 -07:00
draft_model_runner.py	[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543)	2024-07-20 09:39:07 -07:00
interfaces.py	[BUGFIX] Raise an error for no draft token case when draft_tp>1 (#6369)	2024-07-19 06:01:09 -07:00
medusa_worker.py	[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765)	2024-07-10 16:02:47 -07:00
metrics.py	[Bugfix] [SpecDecode] AsyncMetricsCollector: update time since last collection (#6578)	2024-07-19 14:01:03 -07:00
mlp_speculator_worker.py	[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765)	2024-07-10 16:02:47 -07:00
multi_step_worker.py	[Core] draft_model_runner: Implement prepare_inputs on GPU for advance_step (#6338)	2024-07-17 14:30:28 -07:00
ngram_worker.py	[Misc][Speculative decoding] Typos and typing fixes (#6467)	2024-07-17 07:17:07 +00:00
proposer_worker_base.py	[Misc][Speculative decoding] Typos and typing fixes (#6467)	2024-07-17 07:17:07 +00:00
smaller_tp_proposer_worker.py	[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765)	2024-07-10 16:02:47 -07:00
spec_decode_worker.py	[Bugfix] Miscalculated latency lead to time_to_first_token_seconds inaccurate. (#6686)	2024-07-24 08:58:42 -07:00
target_model_runner.py	[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (#6485)	2024-07-20 23:58:58 -07:00
top1_proposer.py	[BUGFIX] Raise an error for no draft token case when draft_tp>1 (#6369)	2024-07-19 06:01:09 -07:00
util.py	[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (#6485)	2024-07-20 23:58:58 -07:00