vllm/vllm/spec_decode
| File | Last commit message | Last commit date |
|---|---|---|
| __init__.py | [Bugfix] Add __init__.py files for vllm/core/block/ and vllm/spec_decode/ (#3798) | 2024-04-02 12:35:31 -07:00 |
| batch_expansion.py | [Model] MLPSpeculator speculative decoding support (#4947) | 2024-06-20 20:23:12 -04:00 |
| draft_model_runner.py | [CORE] Adding support for insertion of soft-tuned prompts (#4645) | 2024-07-09 13:26:36 -07:00 |
| interfaces.py | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00 |
| medusa_worker.py | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00 |
| metrics.py | [Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker (#5348) | 2024-07-01 00:33:05 -07:00 |
| mlp_speculator_worker.py | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00 |
| multi_step_worker.py | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00 |
| ngram_worker.py | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00 |
| proposer_worker_base.py | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00 |
| smaller_tp_proposer_worker.py | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00 |
| spec_decode_worker.py | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00 |
| top1_proposer.py | [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) | 2024-07-10 16:02:47 -07:00 |
| util.py | [Model] MLPSpeculator speculative decoding support (#4947) | 2024-06-20 20:23:12 -04:00 |
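The files above implement vLLM's speculative decoding path: proposer workers for a draft model (multi_step_worker.py, draft_model_runner.py), n-gram lookup (ngram_worker.py), MLP speculator (mlp_speculator_worker.py), and Medusa (medusa_worker.py), coordinated by spec_decode_worker.py with batch-expansion scoring and acceptance metrics. As a point of reference, the sketch below shows how draft-model speculation was typically enabled through the offline `LLM` entrypoint around the time of this snapshot (vLLM ~0.5.x); the model names and parameter values are illustrative, not taken from this listing.

```python
from vllm import LLM, SamplingParams

# Minimal sketch, assuming a vLLM build from roughly this snapshot (~0.5.x).
# The draft model proposes up to `num_speculative_tokens` tokens per step;
# the target model then scores and accepts or rejects them.
llm = LLM(
    model="facebook/opt-6.7b",              # target model (illustrative choice)
    speculative_model="facebook/opt-125m",  # smaller draft model (illustrative choice)
    num_speculative_tokens=5,               # proposal length per step
    use_v2_block_manager=True,              # spec decode required the v2 block manager at this point
)

outputs = llm.generate(
    ["The future of AI is"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

For the prompt-lookup path backed by ngram_worker.py, the same era's configuration passed `speculative_model="[ngram]"` together with an `ngram_prompt_lookup_max` window instead of a separate draft model.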