vllm/e2e at a5314e8698b7c0f20cf3facf921c54917c89a9ba - vllm

Thomas Parnell a5314e8698 [Model] RowParallelLinear: pass bias to quant_method.apply (#6327) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-07-19 07:15:22 -06:00
__init__.py	[Speculative decoding 7/9] Speculative decoding end-to-end correctness tests. (#3951)	2024-04-23 08:02:36 +00:00
conftest.py	[Bugfix] Make spec. decode respect per-request seed. (#6034)	2024-07-18 19:22:08 -07:00
test_compatibility.py	[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840)	2024-05-16 00:53:51 -07:00
test_integration_dist_tp2.py	[Model] RowParallelLinear: pass bias to quant_method.apply (#6327)	2024-07-19 07:15:22 -06:00
test_integration_dist_tp4.py	[BUGFIX] Raise an error for no draft token case when draft_tp>1 (#6369)	2024-07-19 06:01:09 -07:00
test_integration.py	[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840)	2024-05-16 00:53:51 -07:00
test_logprobs.py	[Speculative decoding] Support target-model logprobs (#4378)	2024-05-03 15:52:01 -07:00
test_medusa_correctness.py	[Speculative Decoding] Medusa Implementation with Top-1 proposer (#4978)	2024-07-09 18:34:02 -07:00
test_mlp_correctness.py	[CORE] Quantized lm-head Framework (#4442)	2024-07-02 22:25:17 +00:00
test_multistep_correctness.py	[Misc] Log spec decode metrics (#6454)	2024-07-16 20:37:10 +00:00
test_ngram_correctness.py	[Dynamic Spec Decoding] Minor fix for disabling speculative decoding (#5000)	2024-05-25 10:00:14 -07:00
test_seed.py	[Bugfix] Make spec. decode respect per-request seed. (#6034)	2024-07-18 19:22:08 -07:00