vllm/examples
James Fleming 2b7949c1c2
AQLM CUDA support (#3287)
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-04-23 13:59:33 -04:00
fp8 [Core] Refactor model loading code (#4097) 2024-04-16 11:34:39 -07:00
production_monitoring allow user to choose which vllm's metrics to display in grafana (#3393) 2024-03-14 06:35:13 +00:00
api_client.py [Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
aqlm_example.py AQLM CUDA support (#3287) 2024-04-23 13:59:33 -04:00
gradio_openai_chatbot_webserver.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
gradio_webserver.py Remove deprecated parameter: concurrency_count (#2315) 2024-01-03 09:56:21 -08:00
llava_example.py [CI] Add test case to run examples scripts (#3638) 2024-03-28 14:36:10 -07:00
llm_engine_example.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
multilora_inference.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
offline_inference_distributed.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
offline_inference_neuron.py [Hardware][Neuron] Refactor neuron support (#3471) 2024-03-22 01:22:17 +00:00
offline_inference_with_prefix.py [Bugfix] Set enable_prefix_caching=True in prefix caching example (#3703) 2024-03-28 16:26:30 -07:00
offline_inference.py [Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
openai_chat_completion_client.py Add example scripts to documentation (#4225) 2024-04-22 16:36:54 +00:00
openai_completion_client.py lint: format all python file instead of just source code (#2567) 2024-01-23 15:53:06 -08:00
template_alpaca.jinja Support chat template and echo for chat API (#1756) 2023-11-30 16:43:13 -08:00
template_baichuan.jinja Fix Baichuan chat template (#3340) 2024-03-15 21:02:12 -07:00
template_chatglm2.jinja Add chat templates for ChatGLM (#3418) 2024-03-14 23:19:22 -07:00
template_chatglm.jinja Add chat templates for ChatGLM (#3418) 2024-03-14 23:19:22 -07:00
template_chatml.jinja Support chat template and echo for chat API (#1756) 2023-11-30 16:43:13 -08:00
template_falcon_180b.jinja Add chat templates for Falcon (#3420) 2024-03-14 23:19:02 -07:00
template_falcon.jinja Add chat templates for Falcon (#3420) 2024-03-14 23:19:02 -07:00
template_inkbot.jinja Support chat template and echo for chat API (#1756) 2023-11-30 16:43:13 -08:00
tensorize_vllm_model.py [Core] Refactor model loading code (#4097) 2024-04-16 11:34:39 -07:00