
LMM-Engines

Installation

  • Install the requirements
pip install -e .
pip install flash-attn --no-build-isolation # optional, for faster inference
  • For connecting to WildVision Arena, you need to install bore
bash install_bore.sh
  • Some models may require additional dependencies; see the top of setup.py for details. To install the extra dependencies for a specific model, run
pip install -e .[cogvlm2-video] # for cogvlm2-video

(Note: the extra dependencies for different models might conflict with each other, so it is best to create a separate virtual environment for each model.)

Usage

Local testing

python -m lmm_engines.huggingface.model.dummy_image_model
python -m lmm_engines.huggingface.model.dummy_video_model
# python -m lmm_engines.huggingface.model.model_tinyllava # example

Connect to WildVision Arena and become an arena competitor

First run bash install_bore.sh once to install bore.

bash start_worker_on_arena.sh ${model_name} ${model_port} ${num_gpu}
# Example
bash start_worker_on_arena.sh dummy_image_model 41411 1

Your worker should then be registered with the arena. You can verify this by visiting 🤗 WildVision/vision-arena

See the Contribute a model section below for how to contribute your own model.

Start a new worker for local inference

CUDA_VISIBLE_DEVICES=0 python -m lmm_engines.huggingface.model_worker --model-path dummy_image_model --port 31004 --worker http://127.0.0.1:31004 --host=127.0.0.1 --no-register
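
To check that the worker is up before calling it, you can query its status endpoint. A minimal sketch, assuming the worker keeps the FastChat-style /worker_get_status route that this kind of worker implementation typically derives from (the route name is an assumption, not confirmed by this README):

import requests

# Assumed FastChat-style status route; adjust if lmm_engines differs
worker_addr = "http://127.0.0.1:31004"
resp = requests.post(f"{worker_addr}/worker_get_status", timeout=10)
print(resp.json())  # typically reports the model name, speed, and queue length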

Then call the worker

from lmm_engines import get_call_worker_func
call_worker_func = get_call_worker_func(
    worker_addrs=["http://127.0.0.1:31004"],
    use_cache=False
)
test_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is unusual about this image?",
            },
            {
                "type": "image_url",
                "image_url": "https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg"
            }
        ]
    }
]
generation_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_new_tokens": 200,
}
call_worker_func(test_messages, **generation_kwargs)

Alternatively, you can start a new worker automatically, fusing the two steps above into one. The model worker will shut down automatically when the Python script ends.

from lmm_engines import get_call_worker_func
# start a new worker
call_worker_func = get_call_worker_func(
    model_name="dummy_image_model", # 
    engine="huggingface",
    num_workers=1,
    num_gpu_per_worker=1,
    dtype="float16",
    use_cache=False
)
test_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is unusual about this image?",
            },
            {
                "type": "image_url",
                "image_url": "https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg"
            }
        ]
    }
]
generation_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_new_tokens": 200,
}
# call the worker
print(call_worker_func(test_messages, **generation_kwargs))
  • Output cache: set use_cache=True to enable the output cache. The cache is stored in ~/lmm_engines/generation_cache/{model_name}.jsonl by default.
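
With caching enabled, repeated identical requests can be answered from the cache file instead of rerunning the model. A minimal sketch, reusing test_messages and generation_kwargs from the example above; only use_cache differs:

from lmm_engines import get_call_worker_func
# Same setup as above, but with the output cache enabled
call_worker_func = get_call_worker_func(
    model_name="dummy_image_model",
    engine="huggingface",
    num_workers=1,
    num_gpu_per_worker=1,
    dtype="float16",
    use_cache=True  # cached in ~/lmm_engines/generation_cache/dummy_image_model.jsonl
)
print(call_worker_func(test_messages, **generation_kwargs))  # runs the model
print(call_worker_func(test_messages, **generation_kwargs))  # should hit the cache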

Contribute a model

(Note: we do not care about the internal details of these five functions, as long as each one receives the parameters and returns the results specified in its function signature.)

See lmm_engines/huggingface/README.md for more details.
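
For a rough sense of what a contributed model file looks like, here is a hypothetical sketch; the class name, method names, and signatures below are assumptions for illustration only, and the authoritative five-function interface is specified in lmm_engines/huggingface/README.md:

# lmm_engines/huggingface/model/model_mymodel.py (hypothetical)
import torch

class MyModelAdapter:
    """Hypothetical adapter sketch; see lmm_engines/huggingface/README.md
    for the real required functions and their exact signatures."""

    def load_model(self, model_path: str, device: str, dtype=torch.float16):
        # Load the model/processor here and keep references on the adapter
        ...

    def generate(self, params: dict) -> dict:
        # Receive the conversation plus generation kwargs (temperature,
        # top_p, max_new_tokens, ...) and return the generated text in
        # the expected result format
        ...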

TODO

Transferring models from the old arena code into lmm-engines

Citation

If you find this repository useful, please consider citing our paper and resources:

@article{lu2024wildvision,
  title={WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences},
  author={Lu, Yujie and Jiang, Dongfu and Chen, Wenhu and Wang, William Yang and Choi, Yejin and Lin, Bill Yuchen},
  publisher={NeurIPS},
  year={2024}
}
@misc{yujie2024wildvisionarena,
  title={WildVision Arena: Benchmarking Multimodal LLMs in the Wild},
  url={https://huggingface.co/spaces/WildVision/vision-arena/},
  author={Lu, Yujie and Jiang, Dongfu and Chen, Hui and Ma, Yingzi and Gu, Jing and Xiao, Chaowei and Chen, Wenhu and Wang, William and Choi, Yejin and Lin, Bill Yuchen},
  year={2024}
}
@misc{yujie2024wildvisionv2,
  title={WildVision Data and Model},
  url={https://huggingface.co/WildVision},
  author={Lu, Yujie* and Jiang, Dongfu* and Chen, Hui* and Fu, Xingyu and Ma, Yingzi and Gu, Jing and Saxon, Michael and Xiao, Chaowei and Chen, Wenhu and Choi, Yejin and Lin, Bill Yuchen and Eckstein, Miguel and Wang, William},
  year={2024}
}