vllm/examples

Latest commit: jvmncs 8f36444c4f: multi-LoRA as extra models in OpenAI server (#2775)
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
 --model meta-llama/Llama-2-7b-hf \
 --enable-lora \
 --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server will list three separate entries when a user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. The LoRA config options take the same values they do in `EngineArgs`.
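As a rough sketch of what a client interaction might look like, assuming the server above exposes the OpenAI-compatible API on the default `localhost:8000`: list the served models (base plus the two LoRA names), then target a LoRA by passing its name as the `model` field of a completion request. The helper names and the base URL here are illustrative assumptions, not part of the PR.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default vLLM server address


def list_models(base_url: str = BASE_URL) -> list[str]:
    """Return the model ids reported by the OpenAI-compatible /v1/models endpoint."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        data = json.load(resp)
    return [m["id"] for m in data["data"]]


def completion_payload(prompt: str, model: str = "sql-lora") -> bytes:
    """Build a /v1/completions request body that selects a served LoRA by name."""
    return json.dumps({
        "model": model,  # "sql-lora" or "sql-lora2" from --lora-modules, or the base model
        "prompt": prompt,
        "max_tokens": 64,
    }).encode()


if __name__ == "__main__":
    # Expect the base model id plus one id per --lora-modules entry.
    print(list_models())
    req = urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=completion_payload("SELECT count(*) FROM"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["text"])
```

Because the adapters are addressed purely by the `model` name, swapping between LoRAs is just a change to the request payload; no client-side reconfiguration is needed.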

No work has been done here to scope client permissions to specific models.
2024-02-17 12:00:48 -08:00
| Name | Last commit | Date |
| --- | --- | --- |
| production_monitoring | Refactor Prometheus and Add Request Level Metrics (#2316) | 2024-01-31 14:58:07 -08:00 |
| api_client.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| gradio_openai_chatbot_webserver.py | Add gradio chatbot for openai webserver (#2307) | 2024-01-11 19:45:56 -08:00 |
| gradio_webserver.py | Remove deprecated parameter: concurrency_count (#2315) | 2024-01-03 09:56:21 -08:00 |
| llm_engine_example.py | Refactor LLMEngine demo script for clarity and modularity (#1413) | 2023-10-30 09:14:37 -07:00 |
| multilora_inference.py | multi-LoRA as extra models in OpenAI server (#2775) | 2024-02-17 12:00:48 -08:00 |
| offline_inference_distributed.py | Add one example to run batch inference distributed on Ray (#2696) | 2024-02-02 15:41:42 -08:00 |
| offline_inference_with_prefix.py | Minor fix in prefill cache example (#2494) | 2024-01-18 09:40:34 -08:00 |
| offline_inference.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| openai_chatcompletion_client.py | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00 |
| openai_completion_client.py | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00 |
| template_alpaca.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_baichuan.jinja | Add baichuan chat template jinjia file (#2390) | 2024-01-09 09:13:02 -08:00 |
| template_chatml.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_inkbot.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |