Commit Graph

23 Commits

Author SHA1 Message Date
Joe Runde
ef7faad1b8
🐛 Fixup more test failures from memory profiling (#9563)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-21 17:10:56 -07:00
Michael Goin
9ba0bd6aa6
Add lm-eval directly to requirements-test.txt (#9161)
2024-10-08 18:22:31 -07:00
Luka Govedič
172d1cd276
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271)
2024-09-27 14:25:10 -04:00
Michael Goin
af59df0a10
Remove faulty Meta-Llama-3-8B-Instruct-FP8.yaml lm-eval test (#7961)
2024-08-28 19:19:17 -04:00
Luka Govedič
7937009a7e
[Kernel] Replaced blockReduce[...] functions with cub::BlockReduce (#7233)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-21 20:18:00 -04:00
Michael Goin
44f26a9466
[Model] Align nemotron config with final HF state and fix lm-eval-small (#7611)
2024-08-16 15:56:34 -07:00
Dipika Sikka
181abbc27d
[Misc] Update LM Eval Tolerance (#7473)
2024-08-13 14:28:14 -04:00
Dipika Sikka
a3bbbfa1d8
[BugFix] Fix DeepSeek remote code (#7178)
2024-08-06 08:16:53 -07:00
HandH1998
6512937de1
Support W4A8 quantization for vllm (#5218)
2024-07-31 07:55:21 -06:00
Michael Goin
07278c37dd
[Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611)
2024-07-26 14:33:42 -04:00
Robert Shaw
889da130e7
[ Misc ] fp8-marlin channelwise via compressed-tensors (#6524)
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-07-25 09:46:04 -07:00
Robert Shaw
9364f74eee
[ Kernel ] Enable fp8-marlin for fbgemm-fp8 models (#6606)
2024-07-20 18:50:10 +00:00
Robert Shaw
683e3cb9c4
[ Misc ] fbgemm checkpoints (#6559)
2024-07-20 09:36:57 -07:00
Robert Shaw
4cc24f01b1
[ Kernel ] Enable Dynamic Per Token fp8 (#6547)
2024-07-19 23:08:15 +00:00
Robert Shaw
dbe5588554
[ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515)
2024-07-18 22:39:18 -04:00
Robert Shaw
73030b7dae
[ Misc ] Enable Quantizing All Layers of DeekSeekv2 (#6423)
2024-07-14 21:38:42 +00:00
Tyler Michael Smith
9dad5cc859
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace (#6384)
2024-07-14 13:37:19 +00:00
Robert Shaw
fb6af8bc08
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417)
2024-07-13 20:03:58 -07:00
Robert Shaw
babf52dade
[ Misc ] More Cleanup of Marlin (#6359)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
2024-07-13 10:21:37 +00:00
Robert Shaw
aea19f0989
[ Misc ] Support Models With Bias in compressed-tensors integration (#6356)
2024-07-12 11:11:29 -04:00
Robert Shaw
abfe705a02
[ Misc ] Support Fp8 via llm-compressor (#6110)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-07-07 20:42:11 +00:00
Robert Shaw
7c008c51a9
[ Misc ] Refactor MoE to isolate Fp8 From Mixtral (#5970)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-02 21:54:35 +00:00
Robert Shaw
75aa1442db
[ CI/Build ] LM Eval Harness Based CI Testing (#5838)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-06-29 13:04:30 -04:00