Blaizzy/mlx-vlm

Models to port to MLX-VLM

Blaizzy opened this issue · 20 comments

  • MiniCPM-Llama3-V-2_5
  • Florence 2
  • Phi-3-vision
  • Bunny
  • Dolphin-vision-72b
  • Llava Next
  • Qwen2-VL
  • Pixtral
  • Idefics 3
  • Llava Interleave
  • Llava OneVision
  • internlm-xcomposer2d5-7b
  • InternVL
  • CogVLM2
  • ColPali
  • MoonDream2
  • Yi-VL
  • CuMo
  • Kosmos-2.5
  • Molmo
  • Llama-3.2
  • Ovis Gemma
  • Aria
  • NVIDIA NVLM
  • GOT

Instructions:

  1. Select the model and comment below with your selection
  2. Create a Draft PR titled: "Add support for X"
  3. Read the Contribution guide
  4. Check the existing models
  5. Tag @Blaizzy for code reviews and questions.

If the model you want is not listed, please suggest it and I will add it.

For the next release of Llava-Next:

TODO: update the text config defaults to avoid errors with Llava-v1.6-vicuna:

from dataclasses import dataclass
from typing import Dict, Optional, Union

@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    intermediate_size: int = 11008
    num_attention_heads: int = 32
    rms_norm_eps: float = 1e-05
    vocab_size: int = 32064
    num_key_value_heads: int = 32
    rope_theta: float = 1000000
    rope_traditional: bool = False
    rope_scaling: Optional[Dict[str, Union[float, str]]] = None
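
To illustrate why the defaults matter: a config.json that omits most of these keys can still be loaded. The helper below is a hypothetical sketch, not the repo's code; it drops keys the dataclass doesn't declare and lets the defaults fill the gaps:

import inspect

def text_config_from_dict(params: dict) -> TextConfig:
    # Hypothetical helper: keep only keys TextConfig declares, so a
    # HF config.json with extra fields doesn't raise a TypeError.
    allowed = inspect.signature(TextConfig).parameters
    return TextConfig(**{k: v for k, v in params.items() if k in allowed})

# A Llava-v1.6-vicuna text config may omit fields; the defaults cover them.
cfg = text_config_from_dict({"model_type": "llama", "vocab_size": 32000})
print(cfg.rms_norm_eps)  # 1e-05, from the default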

Thanks for the great repo. This should also be on the list: https://github.com/THUDM/CogVLM2
I am just reading the code now and trying to free up some time for the conversion routine.

Hey @BoltzmannEntropy and @jrp2014,

Thanks for the suggestions!

I have added them to the backlog

MiniCPM-V v2.6

Do you have a link to Florence-2?

Is the above list the ultimate and up-to-date list of supported models @Blaizzy? Thanks for your hard work!

Hey @ChristianWeyer
It's mostly up-to-date, just missing Qwen2-VL.

[x] Phi-3-vision

Thanks!
I guess Phi-3-vision includes 3.5?

Yes, they share the same architecture, so no changes are needed :)
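
For background, model loading in this kind of codebase typically dispatches on the config's model_type, so two checkpoints that report the same model_type automatically share one implementation. A minimal sketch of that dispatch (illustrative only; the module path is an assumption, not the repo's exact loader code):

import importlib

def get_model_module(model_type: str):
    # e.g. a model_type of "phi3_v" resolves to one implementation module
    # for both the Phi-3-vision and Phi-3.5-vision checkpoints.
    return importlib.import_module(f"mlx_vlm.models.{model_type}")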

Hey @Blaizzy, thanks for this great framework. Is there any priority for InternVL? I can see it is on your list; I just wanted to know if it is planned for the near term. I want to run the model on my MacBook, and mlx-vlm looks to be the best way to do that.

Qwen2-VL-72B would be amazing!

This recipe seems to work for Qwen2-VL-2B-Instruct:

python -m mlx_vlm.generate \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --max-tokens 100 \
  --temp 0.0 \
  --image django-roadmap.png \
  --prompt "Describe image in detail, include all text"

My results here: https://gist.github.com/simonw/9e02d425cacb902260ec1307e0671e17
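
For comparison, the same run can be scripted. This is a minimal sketch assuming mlx_vlm's Python load/generate helpers; the argument names vary between releases, so treat them as assumptions and check the repo's README:

# Sketch: Python equivalent of the CLI recipe above.
# Assumes mlx_vlm exports load() and generate(); keyword names
# may differ across versions.
from mlx_vlm import load, generate

model, processor = load("Qwen/Qwen2-VL-2B-Instruct")
output = generate(
    model,
    processor,
    image="django-roadmap.png",
    prompt="Describe image in detail, include all text",
    max_tokens=100,
    temp=0.0,
)
print(output)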

Yep, they just merged Qwen2-VL support this weekend.

Molmo please

NVIDIA just dropped the multimodal NVLM-D-72B. The benchmarks look pretty good.

https://huggingface.co/nvidia/NVLM-D-72B

Yep, that's a pretty awesome model!
It's on my radar because we can run it in 4-bit quantization.
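
For anyone who wants to try that, a 4-bit conversion might look like the sketch below. It assumes mlx_vlm ships a convert helper mirroring mlx_lm's; the import path and argument names here are assumptions, so check the repo's convert module before use:

# Sketch only: assumes mlx_vlm.convert mirrors mlx_lm's convert API.
from mlx_vlm.convert import convert  # assumed import path

convert(
    hf_path="nvidia/NVLM-D-72B",  # source weights on the Hub
    mlx_path="NVLM-D-72B-4bit",   # hypothetical local output directory
    quantize=True,                # 4 bits by default in the API this mirrors
)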