The 2nd PR for #4532. This PR supports loading FP8 kv-cache scaling factors from an FP8 checkpoint (with a .kv_scale parameter).

| Name | | |
|---|---|---|
| __init__.py | ||
| test_aqlm.py | ||
| test_big_models.py | ||
| test_embedding.py | ||
| test_fp8.py | ||
| test_gptq_marlin_24.py | ||
| test_gptq_marlin.py | ||
| test_llava.py | ||
| test_marlin.py | ||
| test_mistral.py | ||
| test_models.py | ||
| test_oot_registration.py | ||
| test_registry.py | ||
| utils.py | ||
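
To illustrate what the PR description above means by loading kv-cache scaling factors from a checkpoint, here is a minimal sketch of how entries with the `.kv_scale` suffix could be picked out of a checkpoint's name-to-value mapping. This is not the PR's implementation: the function `collect_kv_scales`, the plain-dict checkpoint representation, and the example parameter names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Mapping

# Suffix mentioned in the PR description; everything else here is an assumption.
KV_SCALE_SUFFIX = ".kv_scale"


def collect_kv_scales(state_dict: Mapping[str, float]) -> Dict[str, float]:
    """Return {owning module name: scale} for every '*.kv_scale' entry.

    `state_dict` is a hypothetical stand-in for a loaded checkpoint,
    mapping parameter names to scalar values.
    """
    scales: Dict[str, float] = {}
    for name, value in state_dict.items():
        if name.endswith(KV_SCALE_SUFFIX):
            # Strip the suffix so the scale is keyed by the module that owns it.
            owner = name[: -len(KV_SCALE_SUFFIX)]
            scales[owner] = float(value)
    return scales


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    checkpoint = {
        "model.layers.0.self_attn.kv_scale": 0.5,
        "model.layers.0.self_attn.q_proj.weight": 1.0,
    }
    print(collect_kv_scales(checkpoint))
```

Entries without the suffix are ignored, so a checkpoint that carries no scaling factors simply yields an empty mapping; how the caller falls back in that case is left open here.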