youkaichao
|
d18f4e73f3
|
[Bugfix] [Hotfix] fix nccl library name (#3661)
|
2024-03-27 17:23:54 +00:00 |
|
Woosuk Kwon
|
82c540bebf
|
[Bugfix] More faithful implementation of Gemma (#3653)
|
2024-03-27 09:37:18 -07:00 |
|
youkaichao
|
8f44facddd
|
[Core] remove cupy dependency (#3625)
|
2024-03-27 00:33:26 -07:00 |
|
Woosuk Kwon
|
e66b629c04
|
[Misc] Minor fix in KVCache type (#3652)
|
2024-03-26 23:14:06 -07:00 |
|
Jee Li
|
76879342a3
|
[Doc]add lora support (#3649)
|
2024-03-27 02:06:46 +00:00 |
|
Jee Li
|
566b57c5c4
|
[Kernel] support non-zero cuda devices in punica kernels (#3636)
|
2024-03-27 00:37:42 +00:00 |
|
Nick Hill
|
0dc72273b8
|
[BugFix] Fix ipv4 address parsing regression (#3645)
|
2024-03-26 14:39:44 -07:00 |
|
liiliiliil
|
a979d9771e
|
[Bugfix] Fix ipv6 address parsing bug (#3641)
|
2024-03-26 11:58:20 -07:00 |
|
Jee Li
|
8af890a865
|
Enable more models to inference based on LoRA (#3382)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
|
2024-03-25 18:09:31 -07:00 |
|
Nick Hill
|
dfeb2ecc3a
|
[Misc] Include matched stop string/token in responses (#2976)
Co-authored-by: Sahil Suneja <sahilsuneja@gmail.com>
|
2024-03-25 17:31:32 -07:00 |
|
Antoni Baum
|
3a243095e5
|
Optimize _get_ranks in Sampler (#3623)
|
2024-03-25 16:03:02 -07:00 |
|
xwjiang2010
|
64172a976c
|
[Feature] Add vision language model support. (#3042)
|
2024-03-25 14:16:30 -07:00 |
|
Simon Mo
|
f408d05c52
|
hotfix isort on logprobs ranks pr (#3622)
|
2024-03-25 11:55:46 -07:00 |
|
Dylan Hawk
|
0b4997e05c
|
[Bugfix] API stream returning two stops (#3450)
Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>
|
2024-03-25 10:14:34 -07:00 |
|
Travis Johnson
|
c13ad1b7bd
|
feat: implement the min_tokens sampling parameter (#3124)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-03-25 10:14:26 -07:00 |
|
Swapnil Parekh
|
819924e749
|
[Core] Adding token ranks along with logprobs (#3516)
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
|
2024-03-25 10:13:10 -07:00 |
|
SangBin Cho
|
01bfb22b41
|
[CI] Try introducing isort. (#3495)
|
2024-03-25 07:59:47 -07:00 |
|
TianYu GUO
|
e67c295b0c
|
[Bugfix] fix automatic prefix args and add log info (#3608)
|
2024-03-25 05:35:22 -07:00 |
|
Woosuk Kwon
|
925f3332ca
|
[Core] Refactor Attention Take 2 (#3462)
|
2024-03-25 04:39:33 +00:00 |
|
少年
|
b0dfa91dd7
|
[Model] Add starcoder2 awq support (#3569)
|
2024-03-24 21:07:36 -07:00 |
|
Woosuk Kwon
|
56a8652f33
|
Revert "[Bugfix] store lock file in tmp directory (#3578)" (#3599)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-03-24 20:06:50 -07:00 |
|
Kunshang Ji
|
6d93d35308
|
[BugFix] tensor.get_device() -> tensor.device (#3604)
|
2024-03-24 19:01:13 -07:00 |
|
youkaichao
|
837e185142
|
[CI/Build] fix flaky test (#3602)
|
2024-03-24 17:43:05 -07:00 |
|
youkaichao
|
42bc386129
|
[CI/Build] respect the common environment variable MAX_JOBS (#3600)
|
2024-03-24 17:04:00 -07:00 |
|
youkaichao
|
8b268a46a7
|
[CI] typo fix: is_hip --> is_hip() (#3595)
|
2024-03-24 16:03:06 -07:00 |
|
Nick Hill
|
41deac4a3d
|
[BugFix] 1D query fix for MoE models (#3597)
|
2024-03-24 16:00:16 -07:00 |
|
Woosuk Kwon
|
af9e53496f
|
[BugFix] Fix Falcon tied embeddings (#3590)
Co-authored-by: 44670 <44670@users.noreply.github.com>
|
2024-03-24 06:34:01 -07:00 |
|
Roger Wang
|
f8a12ecc7f
|
[Misc] Bump transformers version (#3592)
|
2024-03-24 06:32:45 -07:00 |
|
Woosuk Kwon
|
3c5ab9b811
|
[Misc] Fix BLOOM copyright notice (#3591)
|
2024-03-23 23:30:56 -07:00 |
|
kota-iizuka
|
743a0b7402
|
[Bugfix] use SoftLockFile instead of LockFile (#3578)
|
2024-03-23 11:43:11 -07:00 |
|
Antoni Baum
|
bfdb1ba5c3
|
[Core] Improve detokenization performance for prefill (#3469)
Co-authored-by: MeloYang <meloyang05@gmail.com>
|
2024-03-22 13:44:12 -07:00 |
|
Thomas Parnell
|
cf2f084d56
|
Dynamic scheduler delay to improve ITL performance (#3279)
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2024-03-22 12:28:14 -07:00 |
|
Hanzhi Zhou
|
f721096d48
|
[BugFix] Some fixes for custom allreduce kernels (#2760)
|
2024-03-21 23:02:58 -07:00 |
|
Zhuohan Li
|
e90fc21f2e
|
[Hardware][Neuron] Refactor neuron support (#3471)
|
2024-03-22 01:22:17 +00:00 |
|
Roy
|
ea5f14e6ff
|
[Bugfix][Model] Fix Qwen2 (#3554)
|
2024-03-22 00:18:58 +00:00 |
|
Taemin Lee
|
b7050ca7df
|
[BugFix] gemma loading after quantization or LoRA. (#3553)
|
2024-03-21 13:16:57 -07:00 |
|
Woosuk Kwon
|
c188ecb080
|
[Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config (#3551)
Co-authored-by: Roy <jasonailu87@gmail.com>
Co-authored-by: Roger Meier <r.meier@siemens.com>
|
2024-03-21 07:58:12 -07:00 |
|
Roy
|
865732342b
|
[Misc][Log] Add log for tokenizer length not equal to vocabulary size (#3500)
|
2024-03-21 18:07:48 +08:00 |
|
Lalit Pradhan
|
4c07dd28c0
|
[🚀 Ready to be merged] Added support for Jais models (#3183)
|
2024-03-21 09:45:24 +00:00 |
|
SangBin Cho
|
3bbff9e5ab
|
Fix 1D query issue from _prune_hidden_states (#3539)
|
2024-03-21 08:49:06 +00:00 |
|
ElizaWszola
|
6ebd02bdef
|
[PREFIX CACHING FOLLOW UP] OrderedDict-based evictor (#3431)
Co-authored-by: rsnm2 <rshaw@neuralmagic.com>
Co-authored-by: Luka <luka@paperspace>
|
2024-03-20 23:20:04 -07:00 |
|
Zhuohan Li
|
523e30ea0c
|
[BugFix] Hot fix in setup.py for neuron build (#3537)
|
2024-03-20 17:59:52 -07:00 |
|
Roy
|
f1c0fc3919
|
Migrate logits computation and gather to model_runner (#3233)
|
2024-03-20 23:25:01 +00:00 |
|
SangBin Cho
|
6e435de766
|
[1/n][Chunked Prefill] Refactor input query shapes (#3236)
|
2024-03-20 14:46:05 -07:00 |
|
Antoni Baum
|
426ec4ec67
|
[1/n] Triton sampling kernel (#3186)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-03-20 14:45:08 -07:00 |
|
James Whedbee
|
80e254834d
|
[Bugfix] Fix ROCm support in CMakeLists.txt (#3534)
|
2024-03-20 21:05:03 +00:00 |
|
bnellnm
|
ba8ae1d84f
|
Check for _is_cuda() in compute_num_jobs (#3481)
|
2024-03-20 10:06:56 -07:00 |
|
Allen.Dou
|
84eaa68425
|
Abort when nvcc command is not found in the PATH (#3527)
|
2024-03-20 09:28:29 -07:00 |
|
Woosuk Kwon
|
5ee14494e4
|
[Misc] Remove cache stream and cache events (#3461)
|
2024-03-20 00:38:53 -07:00 |
|
Nick Hill
|
4ad521d8b5
|
[Core] Add generic typing to LRUCache (#3511)
|
2024-03-20 00:36:09 -07:00 |
|