squall/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Hanzhi Zhou	f721096d48	[BugFix] Some fixes for custom allreduce kernels (#2760)	2024-03-21 23:02:58 -07:00
Zhuohan Li	e90fc21f2e	[Hardware][Neuron] Refactor neuron support (#3471)	2024-03-22 01:22:17 +00:00
Roy	ea5f14e6ff	[Bugfix][Model] Fix Qwen2 (#3554)	2024-03-22 00:18:58 +00:00
Taemin Lee	b7050ca7df	[BugFix] gemma loading after quantization or LoRA. (#3553)	2024-03-21 13:16:57 -07:00
Woosuk Kwon	c188ecb080	[Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config (#3551) Co-authored-by: Roy <jasonailu87@gmail.com> Co-authored-by: Roger Meier <r.meier@siemens.com>	2024-03-21 07:58:12 -07:00
Roy	865732342b	[Misc][Log] Add log for tokenizer length not equal to vocabulary size (#3500)	2024-03-21 18:07:48 +08:00
Lalit Pradhan	4c07dd28c0	[🚀 Ready to be merged] Added support for Jais models (#3183)	2024-03-21 09:45:24 +00:00
SangBin Cho	3bbff9e5ab	Fix 1D query issue from `_prune_hidden_states` (#3539)	2024-03-21 08:49:06 +00:00
ElizaWszola	6ebd02bdef	[PREFIX CACHING FOLLOW UP] OrderedDict-based evictor (#3431) Co-authored-by: rsnm2 <rshaw@neuralmagic.com> Co-authored-by: Luka <luka@paperspace>	2024-03-20 23:20:04 -07:00
Zhuohan Li	523e30ea0c	[BugFix] Hot fix in setup.py for neuron build (#3537)	2024-03-20 17:59:52 -07:00
Roy	f1c0fc3919	Migrate `logits` computation and gather to `model_runner` (#3233)	2024-03-20 23:25:01 +00:00
SangBin Cho	6e435de766	[1/n][Chunked Prefill] Refactor input query shapes (#3236)	2024-03-20 14:46:05 -07:00
Antoni Baum	426ec4ec67	[1/n] Triton sampling kernel (#3186) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-03-20 14:45:08 -07:00
James Whedbee	80e254834d	[Bugfix] Fix ROCm support in CMakeLists.txt (#3534)	2024-03-20 21:05:03 +00:00
bnellnm	ba8ae1d84f	Check for _is_cuda() in compute_num_jobs (#3481)	2024-03-20 10:06:56 -07:00
Allen.Dou	84eaa68425	Abort when nvcc command is not found in the PATH (#3527)	2024-03-20 09:28:29 -07:00
Woosuk Kwon	5ee14494e4	[Misc] Remove cache stream and cache events (#3461)	2024-03-20 00:38:53 -07:00
Nick Hill	4ad521d8b5	[Core] Add generic typing to `LRUCache` (#3511)	2024-03-20 00:36:09 -07:00
ElizaWszola	9474e89ba4	[PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled (#3357) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-03-20 00:11:11 -07:00
Simon Mo	20478c4d3a	Use lru_cache for some environment detection utils (#3508)	2024-03-19 21:34:15 +00:00
Jim Burtoft	63e8b28a99	[Doc] minor fix of spelling in amd-installation.rst (#3506)	2024-03-19 20:32:30 +00:00
Simon Mo	cc63d03fbb	Revert "[Core] Cache some utils" (#3507)	2024-03-19 13:22:58 -07:00
Jim Burtoft	2a60c9bd17	[Doc] minor fix to neuron-installation.rst (#3505)	2024-03-19 13:21:35 -07:00
ifsheldon	c614cfee58	Update dockerfile with ModelScope support (#3429)	2024-03-19 10:54:59 -07:00
Nick Hill	7341c77d69	[BugFix] Avoid initializing CUDA too early (#3487)	2024-03-18 23:05:20 -07:00
Simon Mo	ef65dcfa6f	[Doc] Add docs about OpenAI compatible server (#3288)	2024-03-18 22:05:34 -07:00
youkaichao	6a9c583e73	[Core] print error before deadlock (#3459)	2024-03-19 04:06:23 +00:00
Antoni Baum	b37cdce2b1	[Core] Cache some utils (#3474)	2024-03-18 17:14:26 -07:00
Zhuohan Li	b30880a762	[Misc] Update README for the Third vLLM Meetup (#3479)	2024-03-18 15:58:38 -07:00
Antoni Baum	49eedea373	[Core] Zero-copy asdict for InputMetadata (#3475)	2024-03-18 22:56:40 +00:00
bnellnm	9fdf3de346	Cmake based build system (#2830)	2024-03-18 15:38:33 -07:00
Zhuohan Li	c0c17d4896	[Misc] Fix PR Template (#3478)	2024-03-18 15:00:31 -07:00
Robert Shaw	097aa0ea22	[CI/Build] Fix Bad Import In Test (#3473)	2024-03-18 20:28:00 +00:00
Cade Daniel	482b0adf1b	[Testing] Add test_config.py to CI (#3437)	2024-03-18 12:48:45 -07:00
Simon Mo	8c654c045f	CI: Add ROCm Docker Build (#2886)	2024-03-18 19:33:47 +00:00
Woosuk Kwon	9101d832e6	[Bugfix] Make moe_align_block_size AMD-compatible (#3470)	2024-03-18 11:26:24 -07:00
Simon Mo	93348d9458	[CI] Shard tests for LoRA and Kernels to speed up (#3445)	2024-03-17 14:56:30 -07:00
Woosuk Kwon	abfc4f3387	[Misc] Use dataclass for InputMetadata (#3452) Co-authored-by: youkaichao <youkaichao@126.com>	2024-03-17 10:02:46 +00:00
Simon Mo	6b78837b29	Fix setup.py neuron-ls issue (#2671)	2024-03-16 16:00:25 -07:00
Simon Mo	120157fd2a	Support arbitrary json_object in OpenAI and Context Free Grammar (#3211)	2024-03-16 13:35:27 -07:00
Simon Mo	8e67598aa6	[Misc] fix line length for entire codebase (#3444)	2024-03-16 00:36:29 -07:00
simon-mo	ad50bf4b25	fix lint	2024-03-15 22:23:38 -07:00
Dinghow Yang	cf6ff18246	Fix Baichuan chat template (#3340)	2024-03-15 21:02:12 -07:00
Ronen Schaffer	14e3f9a1b2	Replace `lstrip()` with `removeprefix()` to fix Ruff linter warning (#2958)	2024-03-15 21:01:30 -07:00
Tao He	3123f15138	Fixes the incorrect argument in the prefix-prefill test cases (#3246)	2024-03-15 20:58:10 -07:00
youkaichao	413366e9a2	[Misc] PR templates (#3413) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-03-15 18:25:51 -07:00
Robert Shaw	10585e035e	Removed Extraneous Print Message From OAI Server (#3440)	2024-03-16 00:35:36 +00:00
Antoni Baum	fb96c1e98c	Asynchronous tokenization (#2879)	2024-03-15 23:37:01 +00:00
laneeee	8fa7357f2d	fix document error for value and v_vec illustration (#3421)	2024-03-15 16:06:09 -07:00
Harry Mellor	a7af4538ca	Fix issue templates (#3436)	2024-03-15 21:26:00 +00:00

932 Commits