Yihuan Bu
654bc5ca49
Support for guided decoding for offline LLM (#6878)
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 03:12:09 +00:00
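A minimal sketch of what #6878 above enables: guided decoding on the offline LLM entrypoint, not just the OpenAI server. The `guided_options_request` keyword and its dict form are assumptions inferred from the PR title, not a verified signature.

```python
# Hedged sketch: guided decoding with the offline LLM entrypoint.
# `guided_options_request` and its dict schema are assumed, not verified.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # any supported model
params = SamplingParams(temperature=0.0, max_tokens=10)

outputs = llm.generate(
    ["Classify the sentiment of 'vLLM is fast': "],
    params,
    # Constrain the model to one of a fixed set of strings.
    guided_options_request={"guided_choice": ["positive", "negative"]},
)
print(outputs[0].outputs[0].text)
```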
Cyrus Leung
8c025fa703
[Frontend] Factor out chat message parsing (#7055)
2024-08-02 21:31:27 -07:00
Robert Shaw
ed812a73fa
[Frontend] Multiprocessing for OpenAI server with ZeroMQ (#6883)
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-02 18:27:28 -07:00
zifeitong
3c10591ef2
[Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (#6954)
2024-07-31 21:13:34 -07:00
Simon Mo
7eb0cb4a14
Revert "[Frontend] Factor out code for running uvicorn" ( #7012 )
...
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-07-31 16:34:26 -07:00
Fei
c0644cf9ce
[Bugfix] Fix logit processor exceeding vocab size issue (#6927)
2024-07-31 16:16:01 +08:00
Cyrus Leung
da1f7cc12a
[mypy] Enable following imports for some directories (#6681)
2024-07-31 10:38:03 +08:00
Nick Hill
9f69d8245a
[Frontend] New allowed_token_ids decoding request parameter (#6753)
2024-07-29 23:37:27 +00:00
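A sketch of the `allowed_token_ids` parameter from #6753, passed through the OpenAI client's `extra_body` (the usual route for vLLM-specific request fields). The token ids below are illustrative placeholders.

```python
# Hedged sketch: restrict decoding to an explicit set of token ids.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    prompt="Answer yes or no: is the sky blue?",
    max_tokens=1,
    # Illustrative ids; in practice use the tokenizer's ids for " yes"/" no".
    extra_body={"allowed_token_ids": [9891, 2201]},
)
print(completion.choices[0].text)
```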
Isotr0py
7cbd9ec7a9
[Model] Initialize support for InternVL2 series models (#6514)
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00
Cyrus Leung
981b0d5673
[Frontend] Factor out code for running uvicorn (#6828)
2024-07-27 09:58:25 +08:00
Alphi
b75e314fff
[Bugfix] Add image placeholder for MiniCPM-V in the OpenAI-compatible server (#6787)
...
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-25 09:42:49 -07:00
Evan Z. Liu
5689e256ba
[Frontend] Represent tokens with identifiable strings (#6626)
2024-07-25 09:51:00 +08:00
Daniele
ee812580f7
[Frontend] Split run_server into build_server and run_server (#6740)
2024-07-24 10:36:04 -07:00
LF Marques
545146349c
Add missing f-string to validation error (#6748)
2024-07-24 08:55:53 -07:00
Yehoshua Cohen
58f53034ad
[Frontend] Add usage data in each chunk for chat serving (#6540) (#6652)
2024-07-23 11:41:55 -07:00
Roger Wang
22fa2e35cb
[VLM][Model] Support image input for Chameleon (#6633)
2024-07-22 23:50:48 -07:00
Jiaxin Shan
42c7f66a38
[Core] Support dynamically loading LoRA adapter from HuggingFace (#6234)
...
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-07-22 15:42:40 -07:00
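A sketch of dynamic LoRA loading per #6234; previously the adapter had to exist on local disk. Placing a Hugging Face repo id in the path slot of `LoRARequest` is an assumption about how the new behavior is invoked, and the repo id shown is illustrative.

```python
# Hedged sketch: serve a LoRA adapter referenced by a HF Hub repo id.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)

outputs = llm.generate(
    "Write a SQL query listing all users:",
    SamplingParams(max_tokens=64),
    # Assumed: a repo id may now stand in for a local adapter path.
    lora_request=LoRARequest("sql-adapter", 1, "yard1/llama-2-7b-sql-lora-test"),
)
print(outputs[0].outputs[0].text)
```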
Cyrus Leung
739b61a348
[Frontend] Refactor prompt processing (#4028)
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
Cyrus Leung
d7f4178dd9
[Frontend] Move chat utils (#6602)
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-21 08:38:17 +08:00
Daniele
51f8aa90ad
[Bugfix][Frontend] Remove duplicate logger initialization (#6581)
2024-07-19 10:16:27 -07:00
Cyrus Leung
6366efc67b
[Bugfix][Frontend] Fix missing /metrics endpoint (#6463)
2024-07-19 03:55:13 +00:00
Nick Hill
e2fbaee725
[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (#6227)
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-18 15:13:30 +08:00
youkaichao
1c27d25fb5
[Core][Model] Yet another CPU offload implementation (#6496)
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-17 20:54:35 -07:00
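The offload change in #6496 trades throughput for capacity by parking part of the weights in CPU RAM. A one-line sketch, assuming the engine argument is named `cpu_offload_gb` as in the engine arguments of this era:

```python
# Hedged sketch: offload ~4 GiB of weights to CPU RAM so a larger model
# fits in GPU memory; expect lower throughput in exchange.
from vllm import LLM

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct", cpu_offload_gb=4)
```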
sasha0552
7a3d2a5b95
[Frontend] Support for chat completions input in the tokenize endpoint (#5923)
2024-07-16 20:18:09 +08:00
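A sketch of #5923's chat-style input to the tokenize endpoint, which applies the model's chat template before tokenizing. The request and response field names are assumptions about the schema of this era.

```python
# Hedged sketch: tokenize a chat conversation instead of a raw prompt.
import requests

resp = requests.post(
    "http://localhost:8000/tokenize",
    json={
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        # Chat template is applied server-side before tokenization.
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json())  # expected shape: {"tokens": [...], "count": ...}
```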
Joe
d92b3c5cde
[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests (#6419)
2024-07-15 18:54:15 -07:00
zifeitong
b47008b4d2
[BugFix] BatchResponseData body should be optional (#6345)
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-15 04:06:09 +00:00
Ethan Xu
dbfe254eda
[Feature] vLLM CLI (#5090)
...
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-07-14 15:36:43 -07:00
Swapnil Parekh
4d6ada947c
[Core] Add support for insertion of soft-tuned prompts (#4645)
...
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-07-09 13:26:36 -07:00
kczimm
16620f439d
Do not exclude object field in CompletionStreamResponse (#6196)
2024-07-08 10:32:57 +08:00
youkaichao
3b08fe2b13
[Misc][Frontend] Log all available endpoints (#6195)
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-07-07 15:11:12 -07:00
jvlunteren
f1e15da6fe
[Frontend] Continuous usage stats in OpenAI completion API (#5742)
2024-07-05 10:37:09 -07:00
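A sketch of #5742's continuous usage stats: with plain `include_usage`, only the final chunk carries usage, while this change lets every chunk carry running token counts. The `continuous_usage_stats` field name is an assumption taken from the PR.

```python
# Hedged sketch: ask for usage stats on every streamed chunk.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    prompt="Count to five:",
    max_tokens=16,
    stream=True,
    extra_body={"stream_options": {"include_usage": True,
                                   "continuous_usage_stats": True}},
)
for chunk in stream:
    print(chunk.usage)  # populated on every chunk, not just the last
```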
xwjiang2010
d9e98f42e4
[VLM] Remove vision language config (#6089)
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
Roger Wang
3a86b54fb0
[VLM][Frontend] Proper Image Prompt Formatting from OpenAI API (#6091)
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-02 23:41:23 -07:00
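The image-prompt fix in #6091 concerns requests in the standard OpenAI vision format, sketched below; the model name and image URL are placeholders.

```python
# Hedged sketch: a multimodal chat request in OpenAI vision format.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",  # placeholder VLM
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```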
Cyrus Leung
9831aec49f
[Core] Dynamic image size support for VLMs (#5276)
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-07-02 20:34:00 -07:00
xwjiang2010
98d6682cd1
[VLM] Remove image_input_type from VLM config (#5852)
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
danieljannai21
2c37540aa6
[Frontend] Add template-related params to request (#5709)
2024-07-01 23:01:57 -07:00
Alexander Matveev
3476ed0809
[Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) (#5602)
2024-07-01 20:10:37 -07:00
llmpros
c6c240aa0a
[Frontend] Support base64 embedding (#5935)
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-06-30 23:53:00 +08:00
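Base64 embeddings (#5935) mirror the OpenAI `encoding_format` parameter: the server returns one base64 string per input instead of a float list, which is cheaper to ship over the wire. A sketch, with the embedding model name as a placeholder:

```python
# Hedged sketch: request base64-encoded embeddings and decode locally.
import base64
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",  # placeholder embedding model
    input="vLLM is fast",
    encoding_format="base64",  # embedding arrives as a base64 string
)
vec = np.frombuffer(base64.b64decode(resp.data[0].embedding), dtype=np.float32)
print(vec.shape)
```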
Robert Shaw
6a62cb82cc
[Bugfix] Fix engine failing after invalid request (AsyncEngineDeadError) (#5963)
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-06-28 17:46:30 -04:00
sasha0552
c54269d967
[Frontend] Add tokenize/detokenize endpoints (#5054)
2024-06-26 16:54:22 +00:00
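A sketch of the plain round trip through the /tokenize and /detokenize endpoints added in #5054. The field names (`prompt` in, `tokens` out, and the reverse) are assumptions about the schema.

```python
# Hedged sketch: tokenize text, then detokenize it back.
import requests

base = "http://localhost:8000"
model = "meta-llama/Meta-Llama-3-8B-Instruct"

toks = requests.post(f"{base}/tokenize",
                     json={"model": model, "prompt": "Hello, world!"}).json()
text = requests.post(f"{base}/detokenize",
                     json={"model": model, "tokens": toks["tokens"]}).json()
print(toks["tokens"], "->", text["prompt"])
```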
zifeitong
ff9ddbceee
[Misc] Remove #4789 workaround left in vllm/entrypoints/openai/run_batch.py (#5756)
2024-06-22 03:33:12 +00:00
Michael Goin
8065a7e220
[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718)
2024-06-20 17:00:13 -06:00
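What #5718's FlexibleArgumentParser buys: underscore and dash spellings of the same flag both parse to one destination. A sketch, assuming the class lives in `vllm.utils` as it did in this era:

```python
# Hedged sketch: both flag spellings resolve to the same argument.
from vllm.utils import FlexibleArgumentParser

parser = FlexibleArgumentParser()
parser.add_argument("--max-model-len", type=int)

assert parser.parse_args(["--max-model-len", "4096"]).max_model_len == 4096
assert parser.parse_args(["--max_model_len", "4096"]).max_model_len == 4096
```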
Ronen Schaffer
7879f24dcc
[Misc] Add OpenTelemetry support (#4687)
...
This PR adds basic support for OpenTelemetry distributed tracing.
It includes changes to enable tracing functionality and improve monitoring capabilities.
A markdown guide with screenshots shows users how to use this feature.
2024-06-19 01:17:03 +09:00
zifeitong
26e1188e51
[Fix] Use UTF-8 encoding in entrypoints/openai/run_batch.py (#5606)
2024-06-17 23:16:10 +00:00
zifeitong
3ce2c050dd
[Fix] Correct OpenAI batch response format (#5554)
2024-06-15 16:57:54 -07:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory (#5017)
2024-06-15 04:45:31 +00:00
Cyrus Leung
03dccc886e
[Misc] Add vLLM version getter to utils (#5098)
2024-06-13 11:21:39 -07:00
Michael Goin
7d19de2e9c
[Frontend] Add "input speed" to tqdm postfix alongside output speed ( #5425 )
2024-06-12 18:42:12 -04:00
Cyrus Leung
640052b069
[Bugfix][Frontend] Clean up "fix chat logprobs" (#5026)
2024-06-10 22:36:46 -07:00
maor-ps
351d5e7b82
[Bugfix] OpenAI entrypoint limits logprobs while ignoring server-defined --max-logprobs (#5312)
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-06-11 10:30:31 +08:00