vllm/examples
Latest commit: 2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Author: Adrian Abeyta, 2024-04-03 14:15:55 -07:00
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
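The headline commit above enables a scaled FP8 (e4m3fn) KV cache on ROCm, which the `fp8` example directory demonstrates. As a hedged sketch only: with vLLM builds of this era, an FP8 KV cache is typically switched on when launching the OpenAI-compatible server via the `--kv-cache-dtype` flag, with per-tensor scaling factors supplied separately. The model name and scales path below are placeholders, not values taken from this listing; consult `python -m vllm.entrypoints.openai.api_server --help` for the flags your build actually accepts.

```shell
# Illustrative invocation (flag names assumed from vLLM of this period):
# serve a model with the KV cache stored in FP8 instead of FP16/BF16.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-chat-hf \
    --kv-cache-dtype fp8 \
    --quantization-param-path ./kv_cache_scales.json
```

Storing the KV cache in FP8 roughly halves its memory footprint versus FP16, allowing longer contexts or more concurrent sequences on the same GPU.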
Name                               | Last commit                                                                | Date
fp8                                | Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)              | 2024-04-03 14:15:55 -07:00
production_monitoring              | Allow user to choose which vLLM metrics to display in Grafana (#3393)      | 2024-03-14 06:35:13 +00:00
api_client.py                      | [Quality] Add code formatter and linter (#326)                             | 2023-07-03 11:31:55 -07:00
gradio_openai_chatbot_webserver.py | [CI] Try introducing isort. (#3495)                                        | 2024-03-25 07:59:47 -07:00
gradio_webserver.py                | Remove deprecated parameter: concurrency_count (#2315)                     | 2024-01-03 09:56:21 -08:00
llava_example.py                   | [CI] Add test case to run examples scripts (#3638)                         | 2024-03-28 14:36:10 -07:00
llm_engine_example.py              | [CI] Try introducing isort. (#3495)                                        | 2024-03-25 07:59:47 -07:00
multilora_inference.py             | [CI] Try introducing isort. (#3495)                                        | 2024-03-25 07:59:47 -07:00
offline_inference_distributed.py   | [CI] Try introducing isort. (#3495)                                        | 2024-03-25 07:59:47 -07:00
offline_inference_neuron.py        | [Hardware][Neuron] Refactor neuron support (#3471)                         | 2024-03-22 01:22:17 +00:00
offline_inference_with_prefix.py   | [Bugfix] Set enable_prefix_caching=True in prefix caching example (#3703)  | 2024-03-28 16:26:30 -07:00
offline_inference.py               | [Quality] Add code formatter and linter (#326)                             | 2023-07-03 11:31:55 -07:00
openai_chatcompletion_client.py    | lint: format all Python files instead of just source code (#2567)          | 2024-01-23 15:53:06 -08:00
openai_completion_client.py        | lint: format all Python files instead of just source code (#2567)          | 2024-01-23 15:53:06 -08:00
template_alpaca.jinja              | Support chat template and echo for chat API (#1756)                        | 2023-11-30 16:43:13 -08:00
template_baichuan.jinja            | Fix Baichuan chat template (#3340)                                         | 2024-03-15 21:02:12 -07:00
template_chatglm2.jinja            | Add chat templates for ChatGLM (#3418)                                     | 2024-03-14 23:19:22 -07:00
template_chatglm.jinja             | Add chat templates for ChatGLM (#3418)                                     | 2024-03-14 23:19:22 -07:00
template_chatml.jinja              | Support chat template and echo for chat API (#1756)                        | 2023-11-30 16:43:13 -08:00
template_falcon_180b.jinja         | Add chat templates for Falcon (#3420)                                      | 2024-03-14 23:19:02 -07:00
template_falcon.jinja              | Add chat templates for Falcon (#3420)                                      | 2024-03-14 23:19:02 -07:00
template_inkbot.jinja              | Support chat template and echo for chat API (#1756)                        | 2023-11-30 16:43:13 -08:00
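The `template_*.jinja` files above are chat templates that the OpenAI-compatible chat API uses to flatten a list of role-tagged messages into a single prompt string. As an illustrative sketch only: the template string below is a simplified ChatML-style stand-in, not the actual contents of `template_chatml.jinja`, but it shows how such a template is rendered with Jinja2.

```python
from jinja2 import Template

# Simplified ChatML-style chat template (illustrative stand-in, not the
# repository's template_chatml.jinja).
CHATML_TEMPLATE = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# Render the conversation into one prompt string; add_generation_prompt
# leaves an open assistant turn for the model to continue from.
prompt = Template(CHATML_TEMPLATE).render(
    messages=messages, add_generation_prompt=True
)
print(prompt)
```

Servers of this era accepted such a file via a `--chat-template` argument, which is how model families without a built-in template (Alpaca, Falcon, ChatGLM, Inkbot) get correct prompt formatting.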