Commit Graph

932 Commits

Author SHA1 Message Date
Hanzhi Zhou
f721096d48
[BugFix] Some fixes for custom allreduce kernels (#2760) 2024-03-21 23:02:58 -07:00
Zhuohan Li
e90fc21f2e
[Hardware][Neuron] Refactor neuron support (#3471) 2024-03-22 01:22:17 +00:00
Roy
ea5f14e6ff
[Bugfix][Model] Fix Qwen2 (#3554) 2024-03-22 00:18:58 +00:00
Taemin Lee
b7050ca7df
[BugFix] gemma loading after quantization or LoRA. (#3553) 2024-03-21 13:16:57 -07:00
Woosuk Kwon
c188ecb080
[Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config (#3551)
Co-authored-by: Roy <jasonailu87@gmail.com>
Co-authored-by: Roger Meier <r.meier@siemens.com>
2024-03-21 07:58:12 -07:00
Roy
865732342b
[Misc][Log] Add log for tokenizer length not equal to vocabulary size (#3500) 2024-03-21 18:07:48 +08:00
Lalit Pradhan
4c07dd28c0
[🚀 Ready to be merged] Added support for Jais models (#3183) 2024-03-21 09:45:24 +00:00
SangBin Cho
3bbff9e5ab
Fix 1D query issue from _prune_hidden_states (#3539) 2024-03-21 08:49:06 +00:00
ElizaWszola
6ebd02bdef
[PREFIX CACHING FOLLOW UP] OrderedDict-based evictor (#3431)
Co-authored-by: rsnm2 <rshaw@neuralmagic.com>
Co-authored-by: Luka <luka@paperspace>
2024-03-20 23:20:04 -07:00
Zhuohan Li
523e30ea0c
[BugFix] Hot fix in setup.py for neuron build (#3537) 2024-03-20 17:59:52 -07:00
Roy
f1c0fc3919
Migrate logits computation and gather to model_runner (#3233) 2024-03-20 23:25:01 +00:00
SangBin Cho
6e435de766
[1/n][Chunked Prefill] Refactor input query shapes (#3236) 2024-03-20 14:46:05 -07:00
Antoni Baum
426ec4ec67
[1/n] Triton sampling kernel (#3186)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-03-20 14:45:08 -07:00
James Whedbee
80e254834d
[Bugfix] Fix ROCm support in CMakeLists.txt (#3534) 2024-03-20 21:05:03 +00:00
bnellnm
ba8ae1d84f
Check for _is_cuda() in compute_num_jobs (#3481) 2024-03-20 10:06:56 -07:00
Allen.Dou
84eaa68425
Abort when nvcc command is not found in the PATH (#3527) 2024-03-20 09:28:29 -07:00
Woosuk Kwon
5ee14494e4
[Misc] Remove cache stream and cache events (#3461) 2024-03-20 00:38:53 -07:00
Nick Hill
4ad521d8b5
[Core] Add generic typing to LRUCache (#3511) 2024-03-20 00:36:09 -07:00
ElizaWszola
9474e89ba4
[PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled (#3357)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-03-20 00:11:11 -07:00
Simon Mo
20478c4d3a
Use lru_cache for some environment detection utils (#3508) 2024-03-19 21:34:15 +00:00
Jim Burtoft
63e8b28a99
[Doc] minor fix of spelling in amd-installation.rst (#3506) 2024-03-19 20:32:30 +00:00
Simon Mo
cc63d03fbb
Revert "[Core] Cache some utils" (#3507) 2024-03-19 13:22:58 -07:00
Jim Burtoft
2a60c9bd17
[Doc] minor fix to neuron-installation.rst (#3505) 2024-03-19 13:21:35 -07:00
ifsheldon
c614cfee58
Update dockerfile with ModelScope support (#3429) 2024-03-19 10:54:59 -07:00
Nick Hill
7341c77d69
[BugFix] Avoid initializing CUDA too early (#3487) 2024-03-18 23:05:20 -07:00
Simon Mo
ef65dcfa6f
[Doc] Add docs about OpenAI compatible server (#3288) 2024-03-18 22:05:34 -07:00
youkaichao
6a9c583e73
[Core] print error before deadlock (#3459) 2024-03-19 04:06:23 +00:00
Antoni Baum
b37cdce2b1
[Core] Cache some utils (#3474) 2024-03-18 17:14:26 -07:00
Zhuohan Li
b30880a762
[Misc] Update README for the Third vLLM Meetup (#3479) 2024-03-18 15:58:38 -07:00
Antoni Baum
49eedea373
[Core] Zero-copy asdict for InputMetadata (#3475) 2024-03-18 22:56:40 +00:00
bnellnm
9fdf3de346
Cmake based build system (#2830) 2024-03-18 15:38:33 -07:00
Zhuohan Li
c0c17d4896
[Misc] Fix PR Template (#3478) 2024-03-18 15:00:31 -07:00
Robert Shaw
097aa0ea22
[CI/Build] Fix Bad Import In Test (#3473) 2024-03-18 20:28:00 +00:00
Cade Daniel
482b0adf1b
[Testing] Add test_config.py to CI (#3437) 2024-03-18 12:48:45 -07:00
Simon Mo
8c654c045f
CI: Add ROCm Docker Build (#2886) 2024-03-18 19:33:47 +00:00
Woosuk Kwon
9101d832e6
[Bugfix] Make moe_align_block_size AMD-compatible (#3470) 2024-03-18 11:26:24 -07:00
Simon Mo
93348d9458
[CI] Shard tests for LoRA and Kernels to speed up (#3445) 2024-03-17 14:56:30 -07:00
Woosuk Kwon
abfc4f3387
[Misc] Use dataclass for InputMetadata (#3452)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-03-17 10:02:46 +00:00
Simon Mo
6b78837b29
Fix setup.py neuron-ls issue (#2671) 2024-03-16 16:00:25 -07:00
Simon Mo
120157fd2a
Support arbitrary json_object in OpenAI and Context Free Grammar (#3211) 2024-03-16 13:35:27 -07:00
Simon Mo
8e67598aa6
[Misc] fix line length for entire codebase (#3444) 2024-03-16 00:36:29 -07:00
simon-mo
ad50bf4b25
fix lint 2024-03-15 22:23:38 -07:00
Dinghow Yang
cf6ff18246
Fix Baichuan chat template (#3340) 2024-03-15 21:02:12 -07:00
Ronen Schaffer
14e3f9a1b2
Replace lstrip() with removeprefix() to fix Ruff linter warning (#2958) 2024-03-15 21:01:30 -07:00
Tao He
3123f15138
Fixes the incorrect argument in the prefix-prefill test cases (#3246) 2024-03-15 20:58:10 -07:00
youkaichao
413366e9a2
[Misc] PR templates (#3413)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-03-15 18:25:51 -07:00
Robert Shaw
10585e035e
Removed Extraneous Print Message From OAI Server (#3440) 2024-03-16 00:35:36 +00:00
Antoni Baum
fb96c1e98c
Asynchronous tokenization (#2879) 2024-03-15 23:37:01 +00:00
laneeee
8fa7357f2d
fix document error for value and v_vec illustration (#3421) 2024-03-15 16:06:09 -07:00
Harry Mellor
a7af4538ca
Fix issue templates (#3436) 2024-03-15 21:26:00 +00:00