squall/vllm - vllm

Author	SHA1	Message	Date
Antoni Baum	49eedea373	[Core] Zero-copy asdict for InputMetadata (#3475)	2024-03-18 22:56:40 +00:00
bnellnm	9fdf3de346	Cmake based build system (#2830)	2024-03-18 15:38:33 -07:00
Zhuohan Li	c0c17d4896	[Misc] Fix PR Template (#3478)	2024-03-18 15:00:31 -07:00
Robert Shaw	097aa0ea22	[CI/Build] Fix Bad Import In Test (#3473)	2024-03-18 20:28:00 +00:00
Cade Daniel	482b0adf1b	[Testing] Add test_config.py to CI (#3437)	2024-03-18 12:48:45 -07:00
Simon Mo	8c654c045f	CI: Add ROCm Docker Build (#2886)	2024-03-18 19:33:47 +00:00
Woosuk Kwon	9101d832e6	[Bugfix] Make moe_align_block_size AMD-compatible (#3470)	2024-03-18 11:26:24 -07:00
Simon Mo	93348d9458	[CI] Shard tests for LoRA and Kernels to speed up (#3445)	2024-03-17 14:56:30 -07:00
Woosuk Kwon	abfc4f3387	[Misc] Use dataclass for InputMetadata (#3452) Co-authored-by: youkaichao <youkaichao@126.com>	2024-03-17 10:02:46 +00:00
Simon Mo	6b78837b29	Fix setup.py neuron-ls issue (#2671)	2024-03-16 16:00:25 -07:00
Simon Mo	120157fd2a	Support arbitrary json_object in OpenAI and Context Free Grammar (#3211)	2024-03-16 13:35:27 -07:00
Simon Mo	8e67598aa6	[Misc] fix line length for entire codebase (#3444)	2024-03-16 00:36:29 -07:00
simon-mo	ad50bf4b25	fix lint	2024-03-15 22:23:38 -07:00
Dinghow Yang	cf6ff18246	Fix Baichuan chat template (#3340)	2024-03-15 21:02:12 -07:00
Ronen Schaffer	14e3f9a1b2	Replace `lstrip()` with `removeprefix()` to fix Ruff linter warning (#2958)	2024-03-15 21:01:30 -07:00
Tao He	3123f15138	Fixes the incorrect argument in the prefix-prefill test cases (#3246)	2024-03-15 20:58:10 -07:00
youkaichao	413366e9a2	[Misc] PR templates (#3413) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-03-15 18:25:51 -07:00
Robert Shaw	10585e035e	Removed Extraneous Print Message From OAI Server (#3440)	2024-03-16 00:35:36 +00:00
Antoni Baum	fb96c1e98c	Asynchronous tokenization (#2879)	2024-03-15 23:37:01 +00:00
laneeee	8fa7357f2d	fix document error for value and v_vec illustration (#3421)	2024-03-15 16:06:09 -07:00
Harry Mellor	a7af4538ca	Fix issue templates (#3436)	2024-03-15 21:26:00 +00:00
youkaichao	604f235937	[Misc] add error message in non linux platform (#3438)	2024-03-15 21:21:37 +00:00
Tao He	14b8ae02e7	Fixes the misuse/mixuse of time.time()/time.monotonic() (#3220) Signed-off-by: Tao He <sighingnow@gmail.com> Co-authored-by: simon-mo <simon.mo@hey.com>	2024-03-15 18:25:43 +00:00
Dan Clark	03d37f2441	[Fix] Add args for mTLS support (#3430) Co-authored-by: declark1 <daniel.clark@ibm.com>	2024-03-15 09:56:13 -07:00
Yang Fan	a7c871680e	Fix tie_word_embeddings for Qwen2. (#3344)	2024-03-15 09:36:53 -07:00
Junda Chen	429284dc37	Fix `dist.broadcast` stall without group argument (#3408)	2024-03-14 23:25:05 -07:00
Dinghow Yang	253a98078a	Add chat templates for ChatGLM (#3418)	2024-03-14 23:19:22 -07:00
Dinghow Yang	21539e6856	Add chat templates for Falcon (#3420)	2024-03-14 23:19:02 -07:00
youkaichao	b522c4476f	[Misc] add HOST_IP env var (#3419) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-03-14 21:32:52 -07:00
akhoroshev	78b6c4845a	Dynamically configure shared memory size for moe_align_block_size_kernel (#3376)	2024-03-14 18:18:07 -07:00
Enrique Shockwave	b983ba35bd	fix marlin config repr (#3414)	2024-03-14 16:26:19 -07:00
陈序	54be8a0be2	Fix assertion failure in Qwen 1.5 with prefix caching enabled (#3373) Co-authored-by: Cade Daniel <edacih@gmail.com>	2024-03-14 13:56:57 -07:00
youkaichao	dfc77408bd	[issue templates] add some issue templates (#3412)	2024-03-14 13:16:00 -07:00
Dan Clark	c17ca8ef18	Add args for mTLS support (#3410) Co-authored-by: Daniel Clark <daniel.clark@ibm.com>	2024-03-14 13:11:45 -07:00
Thomas Parnell	06ec486794	Install `flash_attn` in Docker image (#3396)	2024-03-14 10:55:54 -07:00
youkaichao	8fe8386591	[Kernel] change benchmark script so that result can be directly used; tune moe kernel in A100/H100 with tp=2,4,8 (#3389)	2024-03-14 08:11:48 +00:00
Allen.Dou	a37415c31b	allow user to chose which vllm's merics to display in grafana (#3393)	2024-03-14 06:35:13 +00:00
Simon Mo	81653d9688	[Hotfix] [Debug] test_openai_server.py::test_guided_regex_completion (#3383)	2024-03-13 17:02:21 -07:00
Zhuohan Li	eeab52a4ff	[FIX] Simpler fix for async engine running on ray (#3371)	2024-03-13 14:18:40 -07:00
Antoni Baum	c33afd89f5	Fix lint (#3388)	2024-03-13 13:56:49 -07:00
Terry	7e9bd08f60	Add batched RoPE kernel (#3095)	2024-03-13 13:45:26 -07:00
Or Sharir	ae0ccb4017	Add missing kernel for CodeLlama-34B on A/H100 (no tensor parallelism) when using Multi-LoRA. (#3350)	2024-03-13 12:18:25 -07:00
陈序	739c350c19	[Minor Fix] Use cupy-cuda11x in CUDA 11.8 build (#3256)	2024-03-13 09:43:24 -07:00
Hui Liu	ba8dc958a3	[Minor] Fix bias in if to remove ambiguity (#3259)	2024-03-13 09:16:55 -07:00
Ronan McGovern	e221910e77	add hf_transfer to requirements.txt (#3031)	2024-03-12 23:33:43 -07:00
Bo-Wen Wang	b167109ba1	[Fix] Fix quantization="gptq" when using Marlin (#3319) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-03-12 22:51:42 -07:00
Woosuk Kwon	602358f8a8	Add kernel for GeGLU with approximate GELU (#3337)	2024-03-12 22:06:17 -07:00
Breno Faria	49a3c8662b	Fixes #1556 double free (#3347)	2024-03-13 00:30:08 +00:00
Sherlock Xu	b0925b3878	docs: Add BentoML deployment doc (#3336) Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>	2024-03-12 10:34:30 -07:00
DAIZHENWEI	654865e21d	Support Mistral Model Inference with transformers-neuronx (#3153)	2024-03-11 13:19:51 -07:00

903 Commits