
LMM-Engines

Installation

  • Install the requirements
pip install -e .
pip install flash-attn --no-build-isolation # optional, for faster inference
  • For connecting to WildVision Arena, you need to install bore
bash install_bore.sh
  • Some models may require additional dependencies; see the top of setup.py for details. To install the extra dependencies for a specific model, run
pip install -e .[cogvlm2-video] # for cogvlm2-video

(Note: the extra dependencies for different models might conflict with each other, so it is best to create a separate virtual environment for each model.)

Usage

Local testing

python -m lmm_engines.huggingface.model.dummy_image_model
python -m lmm_engines.huggingface.model.dummy_video_model
# python -m lmm_engines.huggingface.model.model_tinyllava # example

Connect to WildVision Arena and become an arena competitor

First run bash install_bore.sh once to install bore.

bash start_worker_on_arena.sh ${model_name} ${model_port} ${num_gpu}
# Example
bash start_worker_on_arena.sh dummy_image_model 41411 1

Your worker should then be registered with the arena. You can verify this by visiting 🤗 WildVision/vision-arena

See the Contribute a model section below for how to contribute your own model.

Start a new worker for local inference

CUDA_VISIBLE_DEVICES=0 python -m lmm_engines.huggingface.model_worker --model-path dummy_image_model --port 31004 --worker http://127.0.0.1:31004 --host=127.0.0.1 --no-register
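
To check that the worker is up before calling it, you can query its status endpoint. A minimal sketch, assuming the worker keeps the FastChat-style /worker_get_status route that this kind of worker implementation typically derives from (the route name is an assumption, not confirmed by this README):

import requests

# Assumed FastChat-style status route; adjust if lmm_engines differs
worker_addr = "http://127.0.0.1:31004"
resp = requests.post(f"{worker_addr}/worker_get_status", timeout=10)
print(resp.json())  # typically reports the model name, speed, and queue length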

Then call the worker

from lmm_engines import get_call_worker_func
call_worker_func = get_call_worker_func(
    worker_addrs=["http://127.0.0.1:31004"],
    use_cache=False
)
test_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is unusual about this image?",
            },
            {
                "type": "image_url",
                "image_url": "https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg"
            }
        ]
    }
]
generation_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_new_tokens": 200,
}
call_worker_func(test_messages, **generation_kwargs)

Alternatively, you can start a new worker automatically, fusing the two steps above into one. The model worker will shut down automatically when the Python script ends.

from lmm_engines import get_call_worker_func
# start a new worker
call_worker_func = get_call_worker_func(
    model_name="dummy_image_model", # 
    engine="huggingface",
    num_workers=1,
    num_gpu_per_worker=1,
    dtype="float16",
    use_cache=False
)
test_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is unusual about this image?",
            },
            {
                "type": "image_url",
                "image_url": "https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg"
            }
        ]
    }
]
generation_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_new_tokens": 200,
}
# call the worker
print(call_worker_func(test_messages, **generation_kwargs))
  • Output cache: set use_cache=True to enable the output cache. The cache is stored in ~/lmm_engines/generation_cache/{model_name}.jsonl by default.
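
With caching enabled, repeated identical requests can be answered from the cache file instead of rerunning the model. A minimal sketch, reusing test_messages and generation_kwargs from the example above; only use_cache differs:

from lmm_engines import get_call_worker_func
# Same setup as above, but with the output cache enabled
call_worker_func = get_call_worker_func(
    model_name="dummy_image_model",
    engine="huggingface",
    num_workers=1,
    num_gpu_per_worker=1,
    dtype="float16",
    use_cache=True  # cached in ~/lmm_engines/generation_cache/dummy_image_model.jsonl
)
print(call_worker_func(test_messages, **generation_kwargs))  # runs the model
print(call_worker_func(test_messages, **generation_kwargs))  # should hit the cache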

Contribute a model

(Note: we do not care about the internal details of these five functions, as long as each one receives the parameters and returns the results specified in its function signature.)

See lmm_engines/huggingface/README.md for more details.
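
For a rough sense of what a contributed model file looks like, here is a hypothetical sketch; the class name, method names, and signatures below are assumptions for illustration only, and the authoritative five-function interface is specified in lmm_engines/huggingface/README.md:

# lmm_engines/huggingface/model/model_mymodel.py (hypothetical)
import torch

class MyModelAdapter:
    """Hypothetical adapter sketch; see lmm_engines/huggingface/README.md
    for the real required functions and their exact signatures."""

    def load_model(self, model_path: str, device: str, dtype=torch.float16):
        # Load the model/processor here and keep references on the adapter
        ...

    def generate(self, params: dict) -> dict:
        # Receive the conversation plus generation kwargs (temperature,
        # top_p, max_new_tokens, ...) and return the generated text in
        # the expected result format
        ...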

TODO

Transferring models from the old arena code into lmm-engines

Citation

If you find this repository useful, please consider citing our paper and resources:

@article{lu2024wildvision,
  title={WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences},
  author={Lu, Yujie and Jiang, Dongfu and Chen, Wenhu and Wang, William Yang and Choi, Yejin and Lin, Bill Yuchen},
  publisher={NeurIPS},
  year={2024}
}
@misc{yujie2024wildvisionarena,
  title={WildVision Arena: Benchmarking Multimodal LLMs in the Wild},
  url={https://huggingface.co/spaces/WildVision/vision-arena/},
  author={Lu, Yujie and Jiang, Dongfu and Chen, Hui and Ma, Yingzi and Gu, Jing and Xiao, Chaowei and Chen, Wenhu and Wang, William and Choi, Yejin and Lin, Bill Yuchen},
  year={2024}
}
@misc{yujie2024wildvisionv2,
  title={WildVision Data and Model},
  url={https://huggingface.co/WildVision},
  author={Lu, Yujie* and Jiang, Dongfu* and Chen, Hui* and Fu, Xingyu and Ma, Yingzi and Gu, Jing and Saxon, Michael and Xiao, Chaowei and Chen, Wenhu and Choi, Yejin and Lin, Bill Yuchen and Eckstein, Miguel and Wang, William},
  year={2024}
}