vllm/examples
production_monitoring Allow user to choose which vLLM metrics to display in Grafana (#3393) 2024-03-14 06:35:13 +00:00
api_client.py [Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
gradio_openai_chatbot_webserver.py Add gradio chatbot for openai webserver (#2307) 2024-01-11 19:45:56 -08:00
gradio_webserver.py Remove deprecated parameter: concurrency_count (#2315) 2024-01-03 09:56:21 -08:00
llm_engine_example.py Refactor LLMEngine demo script for clarity and modularity (#1413) 2023-10-30 09:14:37 -07:00
multilora_inference.py multi-LoRA as extra models in OpenAI server (#2775) 2024-02-17 12:00:48 -08:00
offline_inference_distributed.py Add one example to run batch inference distributed on Ray (#2696) 2024-02-02 15:41:42 -08:00
offline_inference_neuron.py Support Mistral Model Inference with transformers-neuronx (#3153) 2024-03-11 13:19:51 -07:00
offline_inference_with_prefix.py Add Automatic Prefix Caching (#2762) 2024-03-02 00:50:01 -08:00
offline_inference.py [Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
openai_chatcompletion_client.py lint: format all python file instead of just source code (#2567) 2024-01-23 15:53:06 -08:00
openai_completion_client.py lint: format all python file instead of just source code (#2567) 2024-01-23 15:53:06 -08:00
template_alpaca.jinja Support chat template and echo for chat API (#1756) 2023-11-30 16:43:13 -08:00
template_baichuan.jinja Add baichuan chat template jinjia file (#2390) 2024-01-09 09:13:02 -08:00
template_chatml.jinja Support chat template and echo for chat API (#1756) 2023-11-30 16:43:13 -08:00
template_falcon_180b.jinja Add chat templates for Falcon (#3420) 2024-03-14 23:19:02 -07:00
template_falcon.jinja Add chat templates for Falcon (#3420) 2024-03-14 23:19:02 -07:00
template_inkbot.jinja Support chat template and echo for chat API (#1756) 2023-11-30 16:43:13 -08:00