[Doc] Section for Multimodal Language Models (#7719)

This commit is contained in:
Roger Wang 2024-08-20 23:24:01 -07:00 committed by GitHub
parent 12e1c65bc9
commit 4506641212
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -177,51 +177,61 @@ Decoder-only Language Models
.. _supported_vlms:
Vision Language Models
^^^^^^^^^^^^^^^^^^^^^^^
Multimodal Language Models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. list-table::
:widths: 25 25 50 5
:widths: 25 25 25 25 5
:header-rows: 1
* - Architecture
- Models
- Supported Modality(ies)
- Example HuggingFace Models
- :ref:`LoRA <lora>`
* - :code:`Blip2ForConditionalGeneration`
- BLIP-2
- Image
- :code:`Salesforce/blip2-opt-2.7b`, :code:`Salesforce/blip2-opt-6.7b`, etc.
-
* - :code:`ChameleonForConditionalGeneration`
- Chameleon
- Image
- :code:`facebook/chameleon-7b` etc.
-
* - :code:`FuyuForCausalLM`
- Fuyu
- Image
- :code:`adept/fuyu-8b` etc.
-
* - :code:`InternVLChatModel`
- InternVL2
- Image
- :code:`OpenGVLab/InternVL2-4B`, :code:`OpenGVLab/InternVL2-8B`, etc.
-
* - :code:`LlavaForConditionalGeneration`
- LLaVA-1.5
- Image
- :code:`llava-hf/llava-1.5-7b-hf`, :code:`llava-hf/llava-1.5-13b-hf`, etc.
-
* - :code:`LlavaNextForConditionalGeneration`
- LLaVA-NeXT
- Image
- :code:`llava-hf/llava-v1.6-mistral-7b-hf`, :code:`llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
-
* - :code:`PaliGemmaForConditionalGeneration`
- PaliGemma
- Image
- :code:`google/paligemma-3b-pt-224`, :code:`google/paligemma-3b-mix-224`, etc.
-
* - :code:`Phi3VForCausalLM`
- Phi-3-Vision
- Image
- :code:`microsoft/Phi-3-vision-128k-instruct`, etc.
-
* - :code:`MiniCPMV`
- MiniCPM-V
- Image
- :code:`openbmb/MiniCPM-V-2` (see note), :code:`openbmb/MiniCPM-Llama3-V-2_5`, :code:`openbmb/MiniCPM-V-2_6`, etc.
-