| Name | Last commit message | Last commit date |
| --- | --- | --- |
| production_monitoring | Refactor Prometheus and Add Request Level Metrics (#2316) | 2024-01-31 14:58:07 -08:00 |
| api_client.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| gradio_openai_chatbot_webserver.py | Add gradio chatbot for openai webserver (#2307) | 2024-01-11 19:45:56 -08:00 |
| gradio_webserver.py | Remove deprecated parameter: concurrency_count (#2315) | 2024-01-03 09:56:21 -08:00 |
| llm_engine_example.py | Refactor LLMEngine demo script for clarity and modularity (#1413) | 2023-10-30 09:14:37 -07:00 |
| multilora_inference.py | multi-LoRA as extra models in OpenAI server (#2775) | 2024-02-17 12:00:48 -08:00 |
| offline_inference_distributed.py | Add one example to run batch inference distributed on Ray (#2696) | 2024-02-02 15:41:42 -08:00 |
| offline_inference_neuron.py | [Neuron] Support inference with transformers-neuronx (#2569) | 2024-02-28 09:34:34 -08:00 |
| offline_inference_with_prefix.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| offline_inference.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| openai_chatcompletion_client.py | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00 |
| openai_completion_client.py | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00 |
| template_alpaca.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_baichuan.jinja | Add baichuan chat template jinjia file (#2390) | 2024-01-09 09:13:02 -08:00 |
| template_chatml.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_inkbot.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |