naver-ai/Visual-Style-Prompting

The HuggingFace Demo has been ignoring the images I've uploaded

Opened this issue · 21 comments

Same, it won't take an uploaded image.

Yes, please add the functionality for the user to specify their own style image without having to modify config files.
Should be simple, pick image, type prompt, generate.
The first thing users want after running the examples is: "That's cool, how can I use my own style image now?"

Please consider making it possible for users to use their own images as a style, and make it simple to do so. Many thanks. I really like this concept; it's great.
Thanks for your contribution.

To accurately reflect the style of the user image, a description of that image is necessary. Because some users may struggle to write effective descriptions, we have not included this aspect in the demo.

We will update the demo code to support this by utilizing BLIP2.

That would work. User picks one of their images, BLIP2 captions it, user should get an option to modify the detected caption if need be, then the user image can be used to style any other image.
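
The flow described above could be sketched roughly as follows. This is only an illustration, not the demo's code: `choose_style_caption` and the `captioner` argument are hypothetical names, and the captioner is passed in as a plain callable so that a BLIP2 wrapper (or a stub, as here) can be plugged in.

```python
from typing import Callable, Optional


def choose_style_caption(
    image: object,
    captioner: Callable[[object], str],
    user_caption: Optional[str] = None,
) -> str:
    """Return the caption used to describe the style image.

    A non-empty caption typed by the user takes priority; otherwise the
    automatic captioner (for example a BLIP2 wrapper) is called.
    """
    if user_caption is not None and user_caption.strip():
        return user_caption.strip()
    return captioner(image).strip()


if __name__ == "__main__":
    # Stub standing in for BLIP2 so the sketch runs without loading a model.
    def fake_blip(img: object) -> str:
        return "a large wave breaking on the ocean "

    print(choose_style_caption("wave.png", fake_blip))
    print(choose_style_caption("wave.png", fake_blip, "ocean wave"))
```

In a UI, the automatic caption would be shown in an editable text box, and whatever the box contains on submit would be passed as `user_caption`.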

This will be very helpful. Thank you. Looking forward to working with my own images.

  • There is an issue with the HF GPU, and HF is currently fixing it.
  • For this reason, support for user style images has been implemented but is not yet running in the demo.
  • For now, try vsp_real_script.py.

Can you make an updated app.py for local running? I am trying to do this all local on Windows, so it doesn't matter if it does not run as a HF online demo.

@dhmiller123 @SoftologyPro
For local use, you can try vsp_real_script.py.

I understand, but if you updated the gradio UI with that functionality it would make it easier for all users.

We have recently updated the demo to reflect user images. However, due to an issue with the GPU provided by Hugging Face (HF), the functionality is not performing as expected. We have no choice but to wait until HF resolves this issue.

OK, I understand that too. But, I don't want to run via huggingface. I want to run your gradio demo locally under Windows. If you do have a version of the gradio app.py that works locally then please do share. The only version of app.py I have is from before which has now been removed from your repo.

I.e. the attached version of app.py (renamed to app.txt, as .py files do not seem to be attachable). Running it locally should get around any Hugging Face limitations?

app.txt
Screenshot 2024-04-01 183632

demo is working now.

OK, when I try and run the HF demo with my own style image I get GPU timeouts. Can you provide a working version of app.py to run local? This is what I tried...

git clone https://huggingface.co/spaces/naver-ai/VisualStylePrompting
In app.py I had to comment out the first line, import spaces, and the @spaces.GPU line.
Then running app.py opens the UI
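
As an alternative to commenting those lines out by hand, a fallback shim could keep the same file usable both on Spaces and locally. This is a minimal sketch under assumptions: `_LocalSpaces` is a hypothetical stand-in, and the `style_fn` body below is a placeholder rather than the real function from app.py.

```python
try:
    import spaces  # Available on Hugging Face Spaces; usually absent locally.
except ImportError:
    class _LocalSpaces:
        """Hypothetical stand-in exposing a no-op `GPU` decorator."""

        @staticmethod
        def GPU(func=None, **_kwargs):
            # Supports both `@spaces.GPU` and `@spaces.GPU(duration=...)`.
            if callable(func):
                return func
            return lambda f: f

    spaces = _LocalSpaces()


@spaces.GPU
def style_fn(prompt: str) -> str:
    # Placeholder body; the real app.py runs the diffusion pipeline here.
    return f"styled: {prompt}"


if __name__ == "__main__":
    print(style_fn("a tiger"))
```

With this in place the decorator lines in app.py can stay as they are, and the local run simply skips the Spaces GPU scheduling.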

I select my own style image, set a prompt, set the outputs to 1 and click Submit.
This gives these errors (the same as in the other issue I raised with vsp_real_script.py, #7):

Traceback (most recent call last):
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\gradio\queueing.py", line 501, in call_prediction
    output = await route_utils.call_process_api(
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\gradio\route_utils.py", line 253, in call_process_api
    output = await app.get_blocks().process_api(
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\gradio\blocks.py", line 1695, in process_api
    result = await self.call_function(
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\gradio\blocks.py", line 1235, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\gradio\utils.py", line 692, in wrapper
    response = f(*args, **kwargs)
  File "<path to local clone>Visual Style Prompting\app.py", line 156, in style_fn
    ref_prompt = blip_inf_prompt(origin_real_img)
  File "<path to local clone>Visual Style Prompting\app.py", line 77, in blip_inf_prompt
    generated_ids = blip_model.generate(**inputs)
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\transformers\models\blip_2\modeling_blip_2.py", line 1830, in generate
    outputs = self.language_model.generate(
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\transformers\generation\utils.py", line 1466, in generate
    self._validate_generated_length(generation_config, input_ids_length, has_default_max_length)
  File "<path to local clone>venv\voc_visualstyleprompting\lib\site-packages\transformers\generation\utils.py", line 1186, in _validate_generated_length
    raise ValueError(
ValueError: Input length of input_ids is 0, but `max_length` is set to -13. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
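
The error message itself recommends setting `max_new_tokens`. A minimal sketch of doing that follows. `caption_generate_kwargs` is a hypothetical helper, not part of app.py or transformers, and whether this resolves the error in this particular environment is untested. `max_new_tokens` is a standard argument of the transformers `generate` method.

```python
from typing import Any, Dict, Mapping


def caption_generate_kwargs(
    inputs: Mapping[str, Any], max_new_tokens: int = 30
) -> Dict[str, Any]:
    """Build keyword arguments for `generate` with an explicit token budget."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    kwargs = dict(inputs)
    # Drop any max_length passed in by the caller so the two limits do not clash.
    kwargs.pop("max_length", None)
    kwargs["max_new_tokens"] = max_new_tokens
    return kwargs


# Hypothetical use inside blip_inf_prompt (not executed here):
# generated_ids = blip_model.generate(**caption_generate_kwargs(inputs))
```

The default of 30 new tokens is an arbitrary choice for short captions, not a value taken from the demo.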

If I then click the watercolor horse/tiger example and click Submit it works.

If I then select my own style image again and click Submit it does not crash, but still uses the previous watercolor style and ignores my style image.

Screenshot 2024-04-03 095343

OK, for those wanting to run this locally, I finally got it working after trying various package versions until these worked.

python -m pip install --upgrade pip
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts wheel==0.41.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts diffusers==0.27.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts accelerate==0.28.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts einops==0.7.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts kornia==0.7.2
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts gradio==4.25.0
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts transformers==4.39.3
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts opencv-python==4.9.0.80
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts xformers==0.0.25 --index-url https://download.pytorch.org/whl/cu118
pip uninstall -y torch
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.2.1+cu118 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
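
To confirm that an environment actually ended up with these versions, a small check along the following lines may help. It is a sketch only: `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names, the pins are copied from the commands above, and local build tags such as `+cu118` are ignored when comparing.

```python
from importlib import metadata
from typing import Callable, Dict

# Pins copied from the install commands above.
PINNED = {
    "wheel": "0.41.0",
    "diffusers": "0.27.0",
    "accelerate": "0.28.0",
    "einops": "0.7.0",
    "kornia": "0.7.2",
    "gradio": "4.25.0",
    "transformers": "4.39.3",
    "opencv-python": "4.9.0.80",
    "xformers": "0.0.25",
    "torch": "2.2.1",
}


def installed_version(name: str) -> str:
    """Return the installed version of a distribution, or 'missing'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "missing"


def find_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], str] = installed_version,
) -> Dict[str, str]:
    """Map each package that differs from its pin to the version found."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # Compare only the public part, so "2.2.1+cu118" matches "2.2.1".
        if found.split("+")[0] != wanted.split("+")[0]:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for name, found in find_mismatches(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {found}")
```

Running the script inside the virtual environment prints one line per package that is missing or at a different version, and prints nothing when everything matches.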

https://softologyblog.wordpress.com/2023/10/10/a-plea-to-all-python-developers/

Running locally under Windows 11 on a 4090.
Screenshot 2024-04-03 190125

To accurately reflect the style of the user image, a description of that image is necessary. Because some users may struggle to write effective descriptions, we have not included this aspect in the demo.
We will update the demo code to support this by utilizing BLIP2.

I think BLIP may also struggle to write an effective description. Would it help to show the detected caption and allow the user to edit it before use? When an example is clicked, the caption used for it could be shown too.

Here are some "failed" results that might improve with better caption text for the style images.

Do you think these results are due to the caption or just a bad style image choice?

The broccoli image was captioned by BLIP as "broccoli is a vegetable that is very popular". Would a better prompt help get a better styled result? Maybe just "broccoli".

The wave image was captioned "a large wave breaking on the ocean".

Those two and the tiger above are not as "clean" as the example results. For the tiger above I expected textures that matched the style image. Would a better caption help there?

Screenshot 2024-04-04 110536 Screenshot 2024-04-04 110849