vllm/vllm/spec_decode
2024-06-28 09:17:51 -07:00
__init__.py [Bugfix] Add __init__.py files for vllm/core/block/ and vllm/spec_decode/ (#3798) 2024-04-02 12:35:31 -07:00
batch_expansion.py [Model] MLPSpeculator speculative decoding support (#4947) 2024-06-20 20:23:12 -04:00
draft_model_runner.py [Spec Decode] Introduce DraftModelRunner (#5799) 2024-06-28 09:17:51 -07:00
interfaces.py [Model] MLPSpeculator speculative decoding support (#4947) 2024-06-20 20:23:12 -04:00
metrics.py [Speculative decoding 7/9] Speculative decoding end-to-end correctness tests. (#3951) 2024-04-23 08:02:36 +00:00
mlp_speculator_worker.py [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) 2024-06-25 20:30:03 -07:00
multi_step_worker.py [Spec Decode] Introduce DraftModelRunner (#5799) 2024-06-28 09:17:51 -07:00
ngram_worker.py [mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
proposer_worker_base.py [Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414) 2024-06-25 09:56:06 +00:00
smaller_tp_proposer_worker.py [Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414) 2024-06-25 09:56:06 +00:00
spec_decode_worker.py [Spec Decode] Introduce DraftModelRunner (#5799) 2024-06-28 09:17:51 -07:00
top1_proposer.py [Model] MLPSpeculator speculative decoding support (#4947) 2024-06-20 20:23:12 -04:00
util.py [Model] MLPSpeculator speculative decoding support (#4947) 2024-06-20 20:23:12 -04:00