[Doc] Section for Multimodal Language Models (#7719)

Roger Wang 2024-08-20 23:24:01 -07:00 committed by GitHub
parent 12e1c65bc9
commit 4506641212

@@ -177,51 +177,61 @@ Decoder-only Language Models
 
 .. _supported_vlms:
 
-Vision Language Models
-^^^^^^^^^^^^^^^^^^^^^^^
+Multimodal Language Models
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. list-table::
-  :widths: 25 25 50 5
+  :widths: 25 25 25 25 5
   :header-rows: 1
 
   * - Architecture
     - Models
+    - Supported Modality(ies)
     - Example HuggingFace Models
     - :ref:`LoRA <lora>`
   * - :code:`Blip2ForConditionalGeneration`
     - BLIP-2
+    - Image
     - :code:`Salesforce/blip2-opt-2.7b`, :code:`Salesforce/blip2-opt-6.7b`, etc.
     -
   * - :code:`ChameleonForConditionalGeneration`
     - Chameleon
+    - Image
     - :code:`facebook/chameleon-7b` etc.
     -
   * - :code:`FuyuForCausalLM`
     - Fuyu
+    - Image
     - :code:`adept/fuyu-8b` etc.
     -
   * - :code:`InternVLChatModel`
     - InternVL2
+    - Image
     - :code:`OpenGVLab/InternVL2-4B`, :code:`OpenGVLab/InternVL2-8B`, etc.
     -
   * - :code:`LlavaForConditionalGeneration`
     - LLaVA-1.5
+    - Image
     - :code:`llava-hf/llava-1.5-7b-hf`, :code:`llava-hf/llava-1.5-13b-hf`, etc.
     -
   * - :code:`LlavaNextForConditionalGeneration`
     - LLaVA-NeXT
+    - Image
     - :code:`llava-hf/llava-v1.6-mistral-7b-hf`, :code:`llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
     -
   * - :code:`PaliGemmaForConditionalGeneration`
     - PaliGemma
+    - Image
     - :code:`google/paligemma-3b-pt-224`, :code:`google/paligemma-3b-mix-224`, etc.
     -
   * - :code:`Phi3VForCausalLM`
     - Phi-3-Vision
+    - Image
     - :code:`microsoft/Phi-3-vision-128k-instruct`, etc.
     -
   * - :code:`MiniCPMV`
     - MiniCPM-V
+    - Image
     - :code:`openbmb/MiniCPM-V-2` (see note), :code:`openbmb/MiniCPM-Llama3-V-2_5`, :code:`openbmb/MiniCPM-V-2_6`, etc.
     -
 