Author | Commit | Message | Date
Qubitium-ModelCloud | ee93f4f92a | [CORE] Quantized lm-head Framework (#4442) | 2024-07-02 22:25:17 +00:00
    Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
    Co-authored-by: ZX <zx@lbx.dev>
Cody Yu | a62aaf1df5 | [Misc][Refactor] Generalize linear_method to be quant_method (#4373) | 2024-04-26 16:41:14 -04:00
James Fleming | 2b7949c1c2 | AQLM CUDA support (#3287) | 2024-04-23 13:59:33 -04:00
    Co-authored-by: mgoin <michael@neuralmagic.com>
Antoni Baum | a10d3056da | [Core] Set linear_weights directly on the layer (#3977) | 2024-04-11 16:35:51 -04:00
Kunshang Ji | e9da5a40c6 | [Misc] Add indirection layer for custom ops (#3913) | 2024-04-10 20:26:07 -07:00
SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00
Zhuohan Li | 2f8844ba08 | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00
CHU Tianxiang | 01a5d18a53 | Add Support for 2/3/8-bit GPTQ Quantization Models (#2330) | 2024-02-28 21:52:23 -08:00
Kunshang Ji | 96b6f475dd | Remove hardcoded device="cuda" to support more devices (#2503) | 2024-02-01 15:46:39 -08:00
    Co-authored-by: Jiang Li <jiang1.li@intel.com>
    Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
CHU Tianxiang | 0fbfc4b81b | Add GPTQ support (#916) | 2023-12-15 03:04:22 -08:00