| Name | Last commit | Date |
| --- | --- | --- |
| fp8 | [Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112) | 2024-09-11 00:38:40 -04:00 |
| production_monitoring | [CI/Build] Pin OpenTelemetry versions and make errors clearer (#7266) | 2024-08-20 10:02:21 -07:00 |
| api_client.py | [bugfix] make args.stream work (#6831) | 2024-07-27 09:07:02 +00:00 |
| aqlm_example.py | [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) | 2024-06-20 17:00:13 -06:00 |
| cpu_offload.py | [core][model] yet another cpu offload implementation (#6496) | 2024-07-17 20:54:35 -07:00 |
| florence2_inference.py | [Model] Initialize Florence-2 language backbone support (#9555) | 2024-10-23 10:42:47 +00:00 |
| gguf_inference.py | [Core] Support loading GGUF model (#5191) | 2024-08-05 17:54:23 -06:00 |
| gradio_openai_chatbot_webserver.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| gradio_webserver.py | Remove deprecated parameter: concurrency_count (#2315) | 2024-01-03 09:56:21 -08:00 |
| llm_engine_example.py | [core] remove beam search from the core (#9105) | 2024-10-07 05:47:04 +00:00 |
| logging_configuration.md | [Doc][CI/Build] Update docs and tests to use vllm serve (#6431) | 2024-07-17 07:43:21 +00:00 |
| lora_with_quantization_inference.py | [Misc] Upgrade bitsandbytes to the latest version 0.44.0 (#8768) | 2024-09-24 17:08:55 -07:00 |
| multilora_inference.py | [core] remove beam search from the core (#9105) | 2024-10-07 05:47:04 +00:00 |
| offline_chat_with_tools.py | [Model] Add mistral function calling format to all models loaded with "mistral" format (#8515) | 2024-09-17 17:50:37 +00:00 |
| offline_inference_arctic.py | [Model] Snowflake arctic model implementation (#4652) | 2024-05-09 22:37:14 +00:00 |
| offline_inference_audio_language.py | [Model] Add Qwen2-Audio model support (#9248) | 2024-10-23 17:54:22 +00:00 |
| offline_inference_chat.py | [Frontend] Batch inference for llm.chat() API (#8648) | 2024-09-24 09:44:11 -07:00 |
| offline_inference_distributed.py | [mypy] Enable type checking for test directory (#5017) | 2024-06-15 04:45:31 +00:00 |
| offline_inference_embedding.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| offline_inference_encoder_decoder.py | [Core] Support serving encoder/decoder models (#7258) | 2024-08-09 10:39:41 +08:00 |
| offline_inference_mlpspeculator.py | [Core] Deprecating block manager v1 and make block manager v2 default (#8704) | 2024-10-17 11:38:15 -05:00 |
| offline_inference_neuron_int8_quantization.py | [Neuron] Adding support for adding/ overriding neuron configuration a… (#8062) | 2024-09-04 16:33:43 -07:00 |
| offline_inference_neuron.py | [Neuron] Adding support for context-lenght, token-gen buckets. (#7885) | 2024-08-29 13:58:14 -07:00 |
| offline_inference_openai.md | [Frontend] Support embeddings in the run_batch API (#7132) | 2024-08-09 09:48:21 -07:00 |
| offline_inference_pixtral.py | [Misc] Update Pixtral example (#8431) | 2024-09-12 17:31:18 -07:00 |
| offline_inference_tpu.py | [CI/Build][TPU] Add TPU CI test (#6277) | 2024-07-15 14:31:16 -07:00 |
| offline_inference_vision_language_embedding.py | [Model] Support E5-V (#9576) | 2024-10-23 11:35:29 +08:00 |
| offline_inference_vision_language_multi_image.py | [Model] Support E5-V (#9576) | 2024-10-23 11:35:29 +08:00 |
| offline_inference_vision_language.py | [Model] Add min_pixels / max_pixels to Qwen2VL as mm_processor_kwargs (#9612) | 2024-10-23 14:05:18 +00:00 |
| offline_inference_with_prefix.py | [MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py (#9510) | 2024-10-18 14:30:55 -07:00 |
| offline_inference_with_profiler.py | [bugfix] torch profiler bug for single gpu with GPUExecutor (#8354) | 2024-09-12 21:30:00 -07:00 |
| offline_inference.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| offline_profile.py | [misc] CUDA Time Layerwise Profiler (#8337) | 2024-10-17 10:36:09 -04:00 |
| openai_api_client_for_multimodal.py | [Model] Add user-configurable task for models that support both generation and embedding (#9424) | 2024-10-18 11:31:58 -07:00 |
| openai_chat_completion_client_with_tools.py | [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649) | 2024-09-04 13:18:13 -07:00 |
| openai_chat_completion_client.py | Add example scripts to documentation (#4225) | 2024-04-22 16:36:54 +00:00 |
| openai_completion_client.py | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00 |
| openai_embedding_client.py | [Bugfix]: Use float32 for base64 embedding (#7855) | 2024-08-26 03:16:38 +00:00 |
| openai_example_batch.jsonl | [docs] Fix typo in examples filename openi -> openai (#4864) | 2024-05-17 00:42:17 +09:00 |
| run_cluster.sh | [doc][distributed] doc for setting up multi-node environment (#6529) | 2024-07-22 21:22:09 -07:00 |
| save_sharded_state.py | [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) | 2024-06-20 17:00:13 -06:00 |
| template_alpaca.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_baichuan.jinja | Fix Baichuan chat template (#3340) | 2024-03-15 21:02:12 -07:00 |
| template_blip2.jinja | [Model] Initial support for BLIP-2 (#5920) | 2024-07-27 11:53:07 +00:00 |
| template_chatglm2.jinja | Add chat templates for ChatGLM (#3418) | 2024-03-14 23:19:22 -07:00 |
| template_chatglm.jinja | Add chat templates for ChatGLM (#3418) | 2024-03-14 23:19:22 -07:00 |
| template_chatml.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_falcon_180b.jinja | Add chat templates for Falcon (#3420) | 2024-03-14 23:19:02 -07:00 |
| template_falcon.jinja | Add chat templates for Falcon (#3420) | 2024-03-14 23:19:02 -07:00 |
| template_inkbot.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_llava.jinja | [Frontend] Add OpenAI Vision API Support (#5237) | 2024-06-07 11:23:32 -07:00 |
| tensorize_vllm_model.py | [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) | 2024-06-20 17:00:13 -06:00 |
| tool_chat_template_hermes.jinja | [Bugfix] Fix Hermes tool call chat template bug (#8256) | 2024-09-07 10:49:01 +08:00 |
| tool_chat_template_internlm2_tool.jinja | [Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405) | 2024-10-04 10:36:39 +08:00 |
| tool_chat_template_llama3.1_json.jinja | [Feature] Add support for Llama 3.1 and 3.2 tool use (#8343) | 2024-09-26 17:01:42 -07:00 |
| tool_chat_template_llama3.2_json.jinja | [Feature] Add support for Llama 3.1 and 3.2 tool use (#8343) | 2024-09-26 17:01:42 -07:00 |
| tool_chat_template_mistral_parallel.jinja | [Bugfix] example template should not add parallel_tool_prompt if tools is none (#9007) | 2024-10-03 03:04:17 +00:00 |
| tool_chat_template_mistral.jinja | [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649) | 2024-09-04 13:18:13 -07:00 |