vllm/examples
| Name | Last commit message | Last commit date |
|------|---------------------|------------------|
| fp8 | [Core] Refactor model loading code (#4097) | 2024-04-16 11:34:39 -07:00 |
| production_monitoring | [Doc] Replace deprecated flag in readme (#4526) | 2024-05-29 22:26:33 +00:00 |
| api_client.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| aqlm_example.py | AQLM CUDA support (#3287) | 2024-04-23 13:59:33 -04:00 |
| gradio_openai_chatbot_webserver.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| gradio_webserver.py | Remove deprecated parameter: concurrency_count (#2315) | 2024-01-03 09:56:21 -08:00 |
| llava_example.py | [Core] Support image processor (#4197) | 2024-06-02 22:56:41 -07:00 |
| llm_engine_example.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| logging_configuration.md | [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) | 2024-05-01 17:34:40 -07:00 |
| lora_with_quantization_inference.py | [Feature][Kernel] Support bitsandbytes quantization and QLoRA (#4776) | 2024-06-01 14:51:10 -06:00 |
| multilora_inference.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| offline_inference_arctic.py | [Model] Snowflake arctic model implementation (#4652) | 2024-05-09 22:37:14 +00:00 |
| offline_inference_distributed.py | [Doc] Update Ray Data distributed offline inference example (#4871) | 2024-05-17 10:52:11 -07:00 |
| offline_inference_embedding.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| offline_inference_neuron.py | [Hardware][Neuron] Refactor neuron support (#3471) | 2024-03-22 01:22:17 +00:00 |
| offline_inference_openai.md | [Frontend] Support OpenAI batch file format (#4794) | 2024-05-15 19:13:36 -04:00 |
| offline_inference_with_prefix.py | [Bugfix]: Fix issues related to prefix caching example (#5177) (#5180) | 2024-06-01 15:53:52 -07:00 |
| offline_inference.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| openai_chat_completion_client.py | Add example scripts to documentation (#4225) | 2024-04-22 16:36:54 +00:00 |
| openai_completion_client.py | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00 |
| openai_embedding_client.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| openai_example_batch.jsonl | [docs] Fix typo in examples filename openi -> openai (#4864) | 2024-05-17 00:42:17 +09:00 |
| save_sharded_state.py | [Core] Implement sharded state loader (#4690) | 2024-05-15 22:11:54 -07:00 |
| template_alpaca.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_baichuan.jinja | Fix Baichuan chat template (#3340) | 2024-03-15 21:02:12 -07:00 |
| template_chatglm2.jinja | Add chat templates for ChatGLM (#3418) | 2024-03-14 23:19:22 -07:00 |
| template_chatglm.jinja | Add chat templates for ChatGLM (#3418) | 2024-03-14 23:19:22 -07:00 |
| template_chatml.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_falcon_180b.jinja | Add chat templates for Falcon (#3420) | 2024-03-14 23:19:02 -07:00 |
| template_falcon.jinja | Add chat templates for Falcon (#3420) | 2024-03-14 23:19:02 -07:00 |
| template_inkbot.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| tensorize_vllm_model.py | [Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (#4208) | 2024-05-13 14:57:07 -07:00 |
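
For orientation, the sketch below shows the kind of usage the offline inference examples in this directory cover. It is illustrative only, not the exact contents of offline_inference.py, and the model name `facebook/opt-125m` is just a small stand-in; it assumes vLLM is installed locally.

```python
# Minimal offline-inference sketch using vLLM's Python API.
# Illustrative only; the model name is an arbitrary small stand-in.
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load the model and run batched generation over all prompts.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r} -> Generated: {output.outputs[0].text!r}")
```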