Mengqing Cao
9d827170a3
[Platforms] Add device_type in Platform (#10508)
Signed-off-by: MengqingCao <cmq0113@163.com>
2024-11-21 04:44:20 +00:00

Pavani Majety
6c1208d083
[Core] Add Sliding Window Support with Flashinfer (#10462)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2024-11-20 19:56:47 -08:00

youkaichao
388ee3de66
[torch.compile] limit inductor threads and lazy import quant (#10482)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-20 18:36:33 -08:00

Woosuk Kwon
2f77b6cfec
[TPU] Implement prefix caching for TPUs (#10307)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-20 13:54:15 -08:00

Guillaume Calmettes
c68f7ede6a
[Bugfix]: allow extra fields in requests to openai compatible server (#10463)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2024-11-20 16:42:21 -05:00

youkaichao
0cd3d9717e
[7/N] torch.compile, reduce compilation time (#10460)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-20 11:20:38 -08:00

Simon Mo
5f1d6af2b6
[perf bench] H200 development (#9768)
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-11-20 11:06:56 -08:00

youkaichao
772a66732d
[platforms] restore xpu check for parallel config (#10479)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-20 17:13:28 +00:00

Li, Jiang
63f1fde277
[Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-20 10:57:39 +00:00

Mengqing Cao
d5b28447e0
[Platforms] Refactor xpu code (#10468)
Signed-off-by: MengqingCao <cmq0113@163.com>
2024-11-19 22:52:13 -08:00

Cyrus Leung
09dbf9ff16
[Bugfix] Handle conflicts between modern and legacy fields (#10471)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-20 14:45:08 +08:00

Sky Lee
343041c4c4
[model] Reduce medusa weight (#10454)
Signed-off-by: skylee-01 <497627264@qq.com>
2024-11-20 06:05:55 +00:00

Kevin H. Luu
ed701ca963
[ci/build] Combine nightly and optional (#10465)
2024-11-19 21:36:03 -08:00

wchen61
7629a9c6e5
[CI/Build] Support compilation with local cutlass path (#10423) (#10424)
2024-11-19 21:35:50 -08:00

Rafael Vasquez
709c9f1f25
[CI/Build] Add sphinx/rst linter for docs (#10366)
2024-11-19 21:35:31 -08:00

Cyrus Leung
b4be5a8adb
[Bugfix] Enforce no chunked prefill for embedding models (#10470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-20 05:12:51 +00:00

Isotr0py
ad44437ba3
[Bugfix] Fix Mamba model initialization and MLP Speculator weights loading (#10456)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-20 05:04:05 +00:00

Yanyi Liu
9e05252b46
[Misc] Add __setitem__ for LazyDict (#10469)
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
2024-11-20 04:44:57 +00:00

Lucas Wilkinson
d200972e7f
[Bugfix] Marlin 2:4 temp fix for large M dim (>256) (#10464)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2024-11-19 19:40:33 -08:00

Alexei-V-Ivanov-AMD
d5b68aba2f
[CI/Build] Update Dockerfile.rocm (#10434)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2024-11-19 17:19:59 -08:00

Maximilien de Bayser
a324d3a1a7
Change granite chat template to keep json list formatting for tool calls (#10452)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
2024-11-19 18:16:54 -07:00

ElizaWszola
b00b33d77e
[Model][Quantization] HQQ support through Marlin kernel expansion (#9766)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
2024-11-19 13:31:12 -08:00

Russell Bryant
efa9084628
[Core] Avoid metrics log noise when idle (#8868)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-19 21:05:25 +00:00

youkaichao
803f37eaaa
[6/N] torch.compile rollout to users (#10437)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-19 10:09:03 -08:00

Russell Bryant
fd9f124971
[Doc] fix link for page that was renamed (#10455)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-19 09:48:30 -08:00

Manjul Mohan
1ea291a417
Fix: Build error seen on Power Architecture (#10421)
Signed-off-by: Manjul Mohan <manjul.mohan@ibm.com>
Signed-off-by: B-201 <Joy25810@foxmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: ismael-dm <ismaeldm99@gmail.com>
Signed-off-by: Andrew Nesbitt <andrewnez@gmail.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: yan ma <yan.ma@intel.com>
Signed-off-by: Angus Wang <wangjadehao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: rickyx <rickyx@anyscale.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Manjul Mohan manjul.mohan@ibm.com <manjulmohan@ltcd97-lp2.aus.stglabs.ibm.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: ismael-dm <ismaeldm99@gmail.com>
Co-authored-by: Andrew Nesbitt <andrewnez@gmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Angus Wang <wangjadehao@gmail.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Ricky Xu <rickyx@anyscale.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2024-11-19 09:34:57 -08:00

Patrick von Platen
11fd7ea639
[Pixtral-Large] Pixtral actually has no bias in vision-lang adapter (#10449)
2024-11-19 17:33:06 +00:00

COSMOPlat
f028dff33d
[BugFix] Fix hermes tool parser output error stream arguments in some cases (#10395) (#10398)
Signed-off-by: xiyuan lee <lixiyuan@haier.com>
2024-11-19 13:42:50 +00:00

Yuan
b4614656b8
[CI][CPU] adding numa node number as container name suffix (#10441)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
2024-11-19 13:16:43 +00:00

youkaichao
25f9c78961
[misc][plugin] improve plugin loading (#10443)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-19 10:43:21 +00:00

Russell Bryant
5390d6664f
[Doc] Add the start of an arch overview page (#10368)
2024-11-19 09:52:11 +00:00

Jee Jee Li
382b6a4852
[Misc] Avoid misleading warning messages (#10438)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-19 08:54:58 +00:00

Travis Johnson
272e31c0bd
[Bugfix] Guard for negative counter metrics to prevent crash (#10430)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-11-19 04:57:10 +00:00

Michael Goin
74f8c2cf5f
Add openai.beta.chat.completions.parse example to structured_outputs.rst (#10433)
2024-11-19 04:37:46 +00:00

Mengqing Cao
8c1fb50705
[Platform][Refactor] Extract func get_default_attn_backend to Platform (#10358)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2024-11-19 11:22:26 +08:00

Jee Jee Li
7eb719df13
[Bugfix]Fix Phi-3 BNB online quantization (#10417)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-19 03:21:42 +00:00

Kevin H. Luu
284203f171
[ci/build] Have dependabot ignore all patch update (#10436)
We have too many dependencies and all patch updates can be a little noisy. This is to have dependabot ignore all patch version updates.
2024-11-19 01:04:25 +00:00

Ricky Xu
90a6c759ca
[misc] partial prefix & random input generation benchmark (#9929)
Signed-off-by: rickyx <rickyx@anyscale.com>
2024-11-18 15:39:14 -08:00

youkaichao
2298e69b5f
[ci][bugfix] fix kernel tests (#10431)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-18 15:29:37 -08:00

youkaichao
a03ea40792
[3/N][torch.compile] consolidate custom op logging (#10399)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-18 15:14:59 -08:00

Lucas Wilkinson
96d999fbe8
[Kernel] Initial Machete W4A8 support + Refactors (#9855)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2024-11-18 12:59:29 -07:00

Angus Wang
c2170a5b39
[Kernel] Explicitly specify other value in tl.load calls (#9014)
Signed-off-by: Angus Wang <wangjadehao@gmail.com>
2024-11-18 11:39:40 -08:00

Yan Ma
6b2d25efc7
[Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107)
Signed-off-by: yan ma <yan.ma@intel.com>
2024-11-18 11:18:05 -07:00

Michael Goin
281cc4b3cd
[Model][Bugfix] Support TP for PixtralHF ViT (#10405)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-18 10:04:14 -08:00

Andrew Nesbitt
4f686d139f
Fix open_collective value in FUNDING.yml (#10426)
Signed-off-by: Andrew Nesbitt <andrewnez@gmail.com>
2024-11-18 09:52:42 -08:00

ismael-dm
31894a2155
[Doc] Add documentation for Structured Outputs (#9943)
Signed-off-by: ismael-dm <ismaeldm99@gmail.com>
2024-11-18 09:52:12 -08:00

youkaichao
7851b45196
[5/N][torch.compile] torch.jit.script --> torch.compile (#10406)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-18 23:20:06 +08:00

B-201
4186be8111
[Doc] Update doc for LoRA support in GLM-4V (#10425)
Signed-off-by: B-201 <Joy25810@foxmail.com>
2024-11-18 15:08:30 +00:00

Isotr0py
e7ebb662d7
[Model] Remove transformers attention porting in VITs (#10414)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-18 21:45:21 +08:00

B-201
5be4e52b65
[Model][LoRA]LoRA support added for glm-4v (#10418)
Signed-off-by: B-201 <Joy25810@foxmail.com>
2024-11-18 12:57:10 +00:00