vllm/tests/spec_decode/e2e
2024-08-19 17:58:14 -07:00
__init__.py [Speculative decoding 7/9] Speculative decoding end-to-end correctness tests. (#3951) 2024-04-23 08:02:36 +00:00
conftest.py [Speculative Decoding] Fixing hidden states handling in batch expansion (#7508) 2024-08-19 17:58:14 -07:00
test_compatibility.py [Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840) 2024-05-16 00:53:51 -07:00
test_integration_dist_tp2.py [Model] RowParallelLinear: pass bias to quant_method.apply (#6327) 2024-07-19 07:15:22 -06:00
test_integration_dist_tp4.py [BUGFIX] Raise an error for no draft token case when draft_tp>1 (#6369) 2024-07-19 06:01:09 -07:00
test_integration.py [Misc] Add quantization config support for speculative model. (#7343) 2024-08-15 19:34:28 -07:00
test_logprobs.py [Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (#6485) 2024-07-20 23:58:58 -07:00
test_medusa_correctness.py [Speculative Decoding] Medusa Implementation with Top-1 proposer (#4978) 2024-07-09 18:34:02 -07:00
test_mlp_correctness.py [Speculative Decoding] Fixing hidden states handling in batch expansion (#7508) 2024-08-19 17:58:14 -07:00
test_multistep_correctness.py [Misc] Log spec decode metrics (#6454) 2024-07-16 20:37:10 +00:00
test_ngram_correctness.py [Dynamic Spec Decoding] Minor fix for disabling speculative decoding (#5000) 2024-05-25 10:00:14 -07:00
test_seed.py [BugFix] Fix use of per-request seed with pipeline parallel (#6698) 2024-07-30 10:40:08 -07:00