facebookresearch/chameleon

troubleshooting chameleon viewer

jpfraneto opened this issue · 17 comments

any ideas on why this could be happening? i followed all the instructions from the repo and downloaded the 7b model

image

this is what i see on my console:

image

and if i go to http://0.0.0.0:7102/

i get back

{"detail":"Not Found"}

miniviewer works btw

image

Hi @jpfraneto, thanks for giving the viewer a try and hopefully I can help you get it working.

Re: the "Not Found" page at http://0.0.0.0:7102/, this is working as expected, since that URL is the root of the FastAPI endpoint. You can confirm this by checking that http://0.0.0.0:7102/docs returns the API docs page, as in the sketch below.
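
For a quick sanity check from a terminal, something like this minimal sketch (assuming the requests package is installed) should print a 404 for the root and a 200 for /docs:

    # Minimal sketch: verify the FastAPI backend is responding (assumes the requests package is installed).
    import requests

    # The root path has no route, so a 404 with {"detail": "Not Found"} is expected.
    print(requests.get("http://0.0.0.0:7102/").status_code)      # expect 404

    # The auto-generated docs page should return 200 if the API is up.
    print(requests.get("http://0.0.0.0:7102/docs").status_code)  # expect 200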

I haven't been able to reproduce your first error, so I'll explain a couple things then have a couple questions to get more information to help debug.

The viewer generally works by:

  1. Opening the React/Vite app at localhost:7654; it looks like this is working correctly.
  2. The web app then opens a websocket to ws://0.0.0.0:7102. It looks like this is failing, based on the web console logs. However, it does look like the web API is running, since (a) you can reach http://0.0.0.0:7102 and see the "Not Found" page and (b) the docker logs on the left show uvicorn running.

I don't see any indication of why the websocket connection is failing, so could you attach the logs for the following, after stopping everything else (stop the miniviewer, the current viewer, etc.)?

  1. The docker compose logs after running docker-compose up --build
  2. Open the web viewer, and copy/screenshot the page and logs after opening the page. If the "Connection" and "UI" circles are red, that should be enough; if they're green, try generating something.
  3. Go back to the docker logs, and copy/paste any new entries at the end.

I'm hoping to get a little more information on why the websocket init is failing, to get a better sense of how to fix it. Thanks!

I also have the same issue as @jpfraneto.

The docker compose build logs:
image
The :7102/docs page:
image

    VQModel loaded from /root/wangjianqiang/meta/meta-chameleon-30b/tokenizer/vqgan.ckpt
     * Serving Flask app 'chameleon.miniviewer.miniviewer'
     * Debug mode: off
    WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
     * Running on http://127.0.0.1:5000

Why does it show 127.0.0.1 here, and where can I change it to 0.0.0.0?

Thanks for the extra information @WangJianQ-cmd and @yyyouy. Unfortunately, I don't see anything that helps me troubleshoot and/or reproduce the issue on my end.

I might have missed it, but what I don't see in the logs the three of you have posted are the docker logs after opening the web UI at http://localhost:7654/.

Where @jpfraneto's web UI seems to break is that the websocket connection to ws://0.0.0.0:7102/ws/chameleon/v2/{client_id} (where client_id can be any alphanumeric string) is not bootstrapping correctly, but I don't see any logs suggesting why that happens. For example, on my end, this is what the logs look like for a successful connection:
image
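
If it helps, here's a rough connectivity check, a minimal sketch using the websockets package (it does not speak the viewer's actual message protocol), that just tries to open that websocket path; whether the handshake succeeds or fails should narrow things down:

    # Minimal sketch: check whether the websocket endpoint is reachable at all.
    # Assumes the websockets package is installed; this only tests the handshake
    # and does not speak the viewer's message protocol.
    import asyncio
    import websockets

    async def check():
        # client_id can be any alphanumeric string
        uri = "ws://0.0.0.0:7102/ws/chameleon/v2/debugclient"
        try:
            async with websockets.connect(uri):
                print("websocket handshake succeeded")
        except Exception as e:
            print("websocket handshake failed:", repr(e))

    asyncio.run(check())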

I'm looking to confirm one of two things:

  1. That the docker logs after opening the web UI show a useful error to debug from, OR
  2. That the websocket backend is never hit, which would suggest that the frontend either can't reach the backend or is trying to access the wrong address.

Could one or more of @WangJianQ-cmd @yyyouy @jpfraneto post the docker logs after opening the web UI, along with the JavaScript console debug logs for the UI (not the docs page)? Thanks!

FWIW - I had the same problem using Docker on Windows, but I was not able to browse to 0.0.0.0:7102 at all.
In the file chameleon/viewer/frontend/src/Config.ts, I changed 0.0.0.0 to localhost and then it worked.


that works

@EntilZha sorry for taking so long, i was away from my computer.

first, i tried @Kinetix-JH's solution and it helped me move past the problem that i had before.

only text generation works

image

but when i try to use the ui with an image pasted into the textarea i see this:

image

this is the console after trying to have it explain the image:

image


  1. any ideas on how to fix this?
  2. is it possible to access these "image interpretation" capabilities over an api request? i'm running a nodejs server on this same machine where i need to programmatically understand images, and would like to start using chameleon for that.

thank you for your help. this is awesome.

That's great to hear. We'll add that to the docs/FAQ somewhere as a common solution. I'm guessing that whether localhost or 0.0.0.0 works depends on whether you're on Linux/Mac/Windows, although I'm not totally clear on the exact reason.

On your question about images:

  1. There is no image generation capability, unfortunately; see #11.
  2. The model output given your prompt is working as intended; safety tuning often forces prompts that interact with input images to be more specific (see huggingface/transformers#31534 (comment)).

On your other question, do you mean you'd like to call the model as an API? If so, the short answer is yes. A few things that are worth knowing:

  1. The model API is implemented here via websockets (as opposed to regular HTTP): https://github.com/facebookresearch/chameleon/blob/main/chameleon/viewer/backend/models/service.py#L118
  2. We implemented this as a websockets service instead of regular HTTP so that the model could incrementally output content rather than wait for all of it to render at once.
  3. If you don't need content streaming, you could instead wrap https://github.com/facebookresearch/chameleon/blob/main/chameleon/viewer/backend/models/service.py#L194 (i.e., generate_text_streaming) in your own API endpoint (i.e., one like the alive API right above) that waits for generation to complete before sending it back via HTTP (see the sketch after this list).
  4. If you do need streaming, you could reference the backend/frontend code to implement, in your backend, the equivalent of what the React code does on the frontend. Alternatively, you could look into HTTP response streaming.
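
For point 3, here's a rough sketch of what such a wrapper could look like. This is only a sketch: the endpoint path is made up, it assumes the app and generator objects that service.py already defines, and it assumes generate_text_streaming is an async generator yielding text chunks (the exact signature may differ):

    # Rough sketch only: wrap the streaming generator in a plain HTTP endpoint
    # that waits for generation to finish before responding.
    # Assumes `app` (FastAPI) and `generator` are the objects defined in service.py;
    # the endpoint path and request shape here are hypothetical.
    from fastapi import Body

    @app.post("/api/generate_text")
    async def generate_text(prompt: str = Body(..., embed=True)):
        chunks = []
        # Collect every streamed chunk instead of forwarding them one by one.
        async for token in generator.generate_text_streaming([prompt]):
            if isinstance(token, str):
                chunks.append(token)
        return {"text": "".join(chunks)}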

thank you. just to be clear, i was not intending to generate images; it was bad wording on my side to say "only text generation works".

1.1 ok
1.2 amazing. check this out:

image

2.1, 2.2, 2.3, 2.4 WOW. this is incredible. i don't have the technical proficiency to understand with clarity how to implement what you say here (and how you say it), but i'm developing a bot for farcaster (a decentralized social media network) and having it understand the images that people share with their casts will be a game changer. this is very very cool.

do you happen to have any code implementation that i can follow to understand how to wrap this up in an API endpoint? i don't need streaming, just to send it the image and the prompt and get back a json with the interpretation of the image, with the context of the rest of the post along with it (as in the screenshot i just shared).

thanks so much for your help. this is v cool.

That's great @jpfraneto!

I'll see what I can do about adding a simplified API to access the model with. If I end up doing it, it might be this week or mid next week. Thanks!

yiiiiiiaju, it works

image

this is a gist of the whole new version of the service.py file, where i added the following function to the handler:

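    # Added to chameleon/viewer/backend/models/service.py. Relies on names already
    # available in that module: app, logger, generator, io, traceback, PIL, Optional,
    # and FastAPI's UploadFile, File, Form, and HTTPException.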
    @app.post("/api/interpret_post")
    async def interpret_post(
        image: Optional[UploadFile] = File(None),
        text: str = Form(...),
    ):
        try:
            parsed_prompt = [text]
            logger.info(f"Received text: {text}")
            
            if image:
                image_content = await image.read()
                image_file = io.BytesIO(image_content)
                parsed_prompt.append(PIL.Image.open(image_file))
                logger.info("Image received and processed")
            else:
                logger.info("No image received")

            logger.info(f"Parsed prompt: {parsed_prompt}")
            
            full_interpretation = ""
            logger.info("Starting generation...")
            
            # Use generate_multimodal_streaming but collect all output
            async for output_token in generator.generate_multimodal_streaming(parsed_prompt):
                if isinstance(output_token, str):
                    full_interpretation += output_token
                # We're still ignoring StreamingImage outputs for this use case

            logger.info(f"Generation complete. Full interpretation length: {len(full_interpretation)}")
            return {"interpretation": full_interpretation}

        except Exception as e:
            logger.error(f"Error in interpret_post: {str(e)}")
            logger.error(f"Error traceback: {traceback.format_exc()}")
            raise HTTPException(status_code=500, detail=str(e))

in the example in the screenshot, i'm calling this endpoint from a nodejs server using the following code:

    // Assumes the axios and form-data packages are installed.
    const axios = require('axios');
    const FormData = require('form-data');

    async function testChameleonServer() {
        const imageUrl = 'https://imagedelivery.net/BXluQx4ige9GuW0Ia56BHw/b02c3501-4471-46c8-38ff-3de9b9b9ba00/original';
        const userText = "so i promised my first base edition at 10k followers, but i didn't expect to reach that milestone so quickly and while in Japan! good news is; i am back, i got the banger and i'm cooking it up right now. all information will be shared in /nicolas in due time 👨‍🍳 LET's FUCKING GO!📸🖼️";

        const systemPrompt = "You are an AI assistant that interprets social media posts containing both text and images. Analyze the following post and provide a comprehensive interpretation that considers both the text content and the visual elements of the image.";

        const combinedPrompt = `<<<SYSTEM PROMPT>>>${systemPrompt}<<</SYSTEM PROMPT>>>\n\n<<<USERS TEXT>>>${userText}<<</USERS TEXT>>>\n\nImage description: [An image is attached to this post. Please analyze its contents and how it relates to the text. Interpret the whole combination of image and post, and how it relates to what humans consider valuable sharing on the context of a decentralized social media platform]`;

        try {
            // Download the image
            const imageResponse = await axios.get(imageUrl, { responseType: 'arraybuffer' });
            const imageBuffer = Buffer.from(imageResponse.data, 'binary');

            // Create form data
            const form = new FormData();
            form.append('image', imageBuffer, { filename: 'image.jpg' });
            form.append('text', combinedPrompt);

            console.log('Sending request to Chameleon server');
            const response = await axios.post('http://localhost:7102/api/interpret_post', form, {
                headers: {
                    ...form.getHeaders(),
                },
            });

            console.log('Response status:', response.status);
            console.log('Response headers:', response.headers);
            console.log('Full response data:', response.data);
            console.log('Chameleon interpretation:', response.data.interpretation);
        } catch (error) {
            console.error('Error:', error);
            if (axios.isAxiosError(error)) {
                console.error('Response data:', error.response?.data);
                console.error('Response status:', error.response?.status);
                console.error('Response headers:', error.response?.headers);
            }
        }
    }

    testChameleonServer();

this is, of course, the "hello world" version of this mechanism. but it is up and running. and from here we evolve.

the output is pure gold.

thank you @EntilZha, you don't know how motivating it was to get your reply here.

Having the same issue; I cannot get the Chameleon viewer to connect at all (http://localhost:7654/).
I followed the repo instructions directly.

Screenshot from 2024-07-07 00-56-13

That's fantastic to hear @jpfraneto! For sure, you can evolve from an MVP to wherever you need to get to! I'll close this for now, but feel free to open another issue if you have any follow-up questions.

@mstatt, you haven't provided enough information to debug. We would be glad to help. Could you create another issue and give more details (OS, using docker or not, logs from everything, commands used, etc)?

Thank you @EntilZha More details:

  1. Running on EC2 Ubuntu.
  2. Downloaded the git repo, installed all deps.
  3. Downloaded the data with the provided download_data.sh script.
  4. Opened port 7654.
  5. Made no code changes or edits to any of the install scripts, etc.
  6. Ran docker-compose up --build.
  7. No errors in terminal.
  8. Even tried the change from 0.0.0.0 to localhost; same issue.

The Chameleon viewer renders in the browser just fine, but it never connects (as per the previous screenshot).

Could you create a separate issue with the same info, @mstatt?