| File | Last commit message | Last commit date |
| --- | --- | --- |
| fp8 | [Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112) | 2024-09-11 00:38:40 -04:00 |
| production_monitoring | [CI/Build] Pin OpenTelemetry versions and make errors clearer (#7266) | 2024-08-20 10:02:21 -07:00 |
| api_client.py | [bugfix] make args.stream work (#6831) | 2024-07-27 09:07:02 +00:00 |
| aqlm_example.py | [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) | 2024-06-20 17:00:13 -06:00 |
| cpu_offload.py | [core][model] yet another cpu offload implementation (#6496) | 2024-07-17 20:54:35 -07:00 |
| gguf_inference.py | [Core] Support loading GGUF model (#5191) | 2024-08-05 17:54:23 -06:00 |
| gradio_openai_chatbot_webserver.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| gradio_webserver.py | Remove deprecated parameter: concurrency_count (#2315) | 2024-01-03 09:56:21 -08:00 |
| llm_engine_example.py | [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) | 2024-06-20 17:00:13 -06:00 |
| logging_configuration.md | [Doc][CI/Build] Update docs and tests to use vllm serve (#6431) | 2024-07-17 07:43:21 +00:00 |
| lora_with_quantization_inference.py | [Misc] Upgrade bitsandbytes to the latest version 0.44.0 (#8768) | 2024-09-24 17:08:55 -07:00 |
| multilora_inference.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| offline_chat_with_tools.py | [Model] Add mistral function calling format to all models loaded with "mistral" format (#8515) | 2024-09-17 17:50:37 +00:00 |
| offline_inference_arctic.py | [Model] Snowflake arctic model implementation (#4652) | 2024-05-09 22:37:14 +00:00 |
| offline_inference_audio_language.py | [Model] Add Ultravox support for multiple audio chunks (#7963) | 2024-09-04 04:38:21 +00:00 |
| offline_inference_chat.py | [Frontend] Batch inference for llm.chat() API (#8648) | 2024-09-24 09:44:11 -07:00 |
| offline_inference_distributed.py | [mypy] Enable type checking for test directory (#5017) | 2024-06-15 04:45:31 +00:00 |
| offline_inference_embedding.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| offline_inference_encoder_decoder.py | [Core] Support serving encoder/decoder models (#7258) | 2024-08-09 10:39:41 +08:00 |
| offline_inference_mlpspeculator.py | [BugFix] Fix cuda graph for MLPSpeculator (#5875) | 2024-06-27 04:12:10 +00:00 |
| offline_inference_neuron_int8_quantization.py | [Neuron] Adding support for adding/overriding neuron configuration a… (#8062) | 2024-09-04 16:33:43 -07:00 |
| offline_inference_neuron.py | [Neuron] Adding support for context-length, token-gen buckets. (#7885) | 2024-08-29 13:58:14 -07:00 |
| offline_inference_openai.md | [Frontend] Support embeddings in the run_batch API (#7132) | 2024-08-09 09:48:21 -07:00 |
| offline_inference_pixtral.py | [Misc] Update Pixtral example (#8431) | 2024-09-12 17:31:18 -07:00 |
| offline_inference_tpu.py | [CI/Build][TPU] Add TPU CI test (#6277) | 2024-07-15 14:31:16 -07:00 |
| offline_inference_vision_language_multi_image.py | [Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658) | 2024-09-24 07:36:46 +00:00 |
| offline_inference_vision_language.py | [Model] Add support for the multi-modal Llama 3.2 model (#8811) | 2024-09-25 13:29:32 -07:00 |
| offline_inference_with_prefix.py | [Bugfix] Add warmup for prefix caching example (#5235) | 2024-06-03 19:36:41 -07:00 |
| offline_inference_with_profiler.py | [bugfix] torch profiler bug for single gpu with GPUExecutor (#8354) | 2024-09-12 21:30:00 -07:00 |
| offline_inference.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| openai_audio_api_client.py | [Model] Add UltravoxModel and UltravoxConfig (#7615) | 2024-08-21 22:49:39 +00:00 |
| openai_chat_completion_client_with_tools.py | [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649) | 2024-09-04 13:18:13 -07:00 |
| openai_chat_completion_client.py | Add example scripts to documentation (#4225) | 2024-04-22 16:36:54 +00:00 |
| openai_completion_client.py | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00 |
| openai_embedding_client.py | [Bugfix]: Use float32 for base64 embedding (#7855) | 2024-08-26 03:16:38 +00:00 |
| openai_example_batch.jsonl | [docs] Fix typo in examples filename openi -> openai (#4864) | 2024-05-17 00:42:17 +09:00 |
| openai_vision_api_client.py | [Model] Add support for the multi-modal Llama 3.2 model (#8811) | 2024-09-25 13:29:32 -07:00 |
| run_cluster.sh | [doc][distributed] doc for setting up multi-node environment (#6529) | 2024-07-22 21:22:09 -07:00 |
| save_sharded_state.py | [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) | 2024-06-20 17:00:13 -06:00 |
| template_alpaca.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_baichuan.jinja | Fix Baichuan chat template (#3340) | 2024-03-15 21:02:12 -07:00 |
| template_blip2.jinja | [Model] Initial support for BLIP-2 (#5920) | 2024-07-27 11:53:07 +00:00 |
| template_chatglm2.jinja | Add chat templates for ChatGLM (#3418) | 2024-03-14 23:19:22 -07:00 |
| template_chatglm.jinja | Add chat templates for ChatGLM (#3418) | 2024-03-14 23:19:22 -07:00 |
| template_chatml.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_falcon_180b.jinja | Add chat templates for Falcon (#3420) | 2024-03-14 23:19:02 -07:00 |
| template_falcon.jinja | Add chat templates for Falcon (#3420) | 2024-03-14 23:19:02 -07:00 |
| template_inkbot.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_llava.jinja | [Frontend] Add OpenAI Vision API Support (#5237) | 2024-06-07 11:23:32 -07:00 |
| tensorize_vllm_model.py | [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) | 2024-06-20 17:00:13 -06:00 |
| tool_chat_template_hermes.jinja | [Bugfix] Fix Hermes tool call chat template bug (#8256) | 2024-09-07 10:49:01 +08:00 |
| tool_chat_template_mistral_parallel.jinja | [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649) | 2024-09-04 13:18:13 -07:00 |
| tool_chat_template_mistral.jinja | [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649) | 2024-09-04 13:18:13 -07:00 |