replicate/replicate-python

`input_images must be a zip file` error when running a training

Closed this issue · 18 comments

I am trying to create a Flux LoRA using https://replicate.com/ostris/flux-dev-lora-trainer/train.
However, it does not accept the zip file I provide and says that it is not a zip file. I know it is one, but to make sure, I also checked it on my end with the zipfile library.
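One way to run that local check is sketched below. It uses only the standard-library `zipfile` module, and `check_zip` is a hypothetical helper name, not part of the Replicate client. Note that `zipfile.is_zipfile` inspects the file's contents, not its name. Passing this check therefore does not guarantee that a trainer which validates by file extension will accept the upload.

```python
import zipfile
from pathlib import Path


def check_zip(path: str) -> list[str]:
    """Return the archive's member names, raising ValueError if it is not a usable zip.

    Hypothetical helper for a local sanity check before uploading.
    """
    p = Path(path)
    # is_zipfile looks for the zip end-of-central-directory record in the bytes;
    # it ignores the file name entirely.
    if not zipfile.is_zipfile(p):
        raise ValueError(f"{p} is not a zip archive")
    with zipfile.ZipFile(p) as zf:
        # testzip() reads every member and returns the first corrupt name, or None.
        bad = zf.testzip()
        if bad is not None:
            raise ValueError(f"corrupt member in {p}: {bad}")
        return zf.namelist()
```

On Windows, pass the path as a raw string such as `r"C:\temp\dreambooth\dreambooth.zip"` or with doubled backslashes. In an ordinary string literal, `\t` is read as a tab character.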

    Model created: tobias-varden/test-fifth-lora
    <_io.BufferedReader name='C:\temp\dreambooth\dreambooth.zip'>
    Training started: starting
    Training URL: https://replicate.com/p/xxxxxxxxxxxxxxxxx
    Training status: failed
    Training failed or was canceled. Status: failed
    Training logs: Traceback (most recent call last):
      File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/server/worker.py", line 354, in _predict
        result = predict(**payload)
      File "/src/train.py", line 127, in train
        extract_zip(input_images, INPUT_DIR)
      File "/src/train.py", line 287, in extract_zip
        raise ValueError("input_images must be a zip file")
    ValueError: input_images must be a zip file

I'm currently using replicate version 0.31.0 on Windows 11.
The code:

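    import replicate

    # image_path, token and model are defined elsewhere in the script (not shown).
    # The archive is opened in binary mode and passed directly as a file handle.
    with open(image_path, "rb") as f:
        print(f)
        training = replicate.trainings.create(
            version="ostris/flux-dev-lora-trainer:4ffd32160efd92e956d39c5338a9b8fbafca58e03f791f6d8011f3e20e8ea6fa",
            input={
                "input_images": f,
                "steps": 1000,
                "prefix": f"A photo of {token}, "
            },
            destination=f"{model.owner}/{model.name}"
        )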

The image path uses double backslashes (\\), but for some reason they don't show up above.

Same issue here!

mattt commented

@tobias-varden @nikitalokhmachev-ai I just released a new version of the Python client yesterday that adds support for the new files API. Can you try upgrading to 0.32.0 and trying again?

@mattt I've tried this new version. I can see in the UI that the .zip archive is now being uploaded, but I am still getting the same error.

I'm getting the same error using the latest version as well.

Hi Matt, thanks for the update! I updated to the 0.32.0 package, but I still get the same issue.

I see the new code. Do I need to upload the file before starting the training and then refer to the uploaded file somehow in input_images?

@tobias-varden I got this to work by uploading my zip to R2 and referencing its public URL in input_images. I'm assuming that file uploads from the local machine just don't work for some reason.

Could you explain what R2 stands for? Thanks!

https://developers.cloudflare.com/r2/

Yep, Cloudflare R2. S3 should work too, or hosting it yourself somewhere.
In other words, passing in a URL works where passing the file directly does not.
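A minimal sketch of that workaround, assuming the archive has already been uploaded to some publicly reachable location (R2, S3, or your own server). `zip_url_input` is a hypothetical helper. It only checks that the value looks like an http(s) URL whose path ends in `.zip` (the query string of a presigned URL is ignored). It cannot tell whether the URL is actually reachable.

```python
from urllib.parse import urlparse


def zip_url_input(url: str, **extra: object) -> dict:
    """Build a training `input` dict that passes the archive by URL.

    Hypothetical helper; the extra keyword arguments are copied into the dict as-is.
    """
    parsed = urlparse(url)
    # Replicate has to fetch the file itself, so only http(s) URLs make sense here.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")
    # Keep a .zip suffix on the path, in case the trainer checks the extension.
    if not parsed.path.lower().endswith(".zip"):
        raise ValueError(f"URL path should end in .zip: {parsed.path!r}")
    return {"input_images": url, **extra}
```

The returned dict can be passed as the `input` argument to `replicate.trainings.create` in place of the dict holding the file handle, for example `zip_url_input("https://example.com/dreambooth.zip", steps=1000, prefix=f"A photo of {token}, ")`, where the URL is a placeholder.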

mattt commented

Apologies for the inconvenience, folks. I can confirm that the issues are the result of incorrect validation logic in the model, which looks for a .zip file extension. I've opened an upstream PR. In the meantime, please try passing a file handle with a .zip extension (open("path/to/file.zip", "rb")), or if that fails, use the suggested workaround of uploading to S3 or R2.
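The failure mode described here can be illustrated as follows. If a valid archive reaches the model under a name without the `.zip` suffix, an extension-based check rejects it, while a content-based check accepts it. Both predicates are hypothetical. The first only mimics the kind of check described and is not the trainer's actual code.

```python
import zipfile
from pathlib import Path


def accepted_by_name(path: str) -> bool:
    # Hypothetical mimic of an extension-based check: trust the file name only.
    return Path(path).suffix == ".zip"


def accepted_by_content(path: str) -> bool:
    # Content-based alternative: look for the zip end-of-central-directory record.
    return zipfile.is_zipfile(path)
```

For a valid zip saved as `upload`, `accepted_by_name` returns False and `accepted_by_content` returns True. That mismatch is why a lost file name can make an otherwise valid archive fail validation.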

fa9r commented

@mattt FYI, since the last release I get a similar error with other Replicate models when I use local .jpg files as inputs: replicate.exceptions.ModelError: Please provide png, jpg or jpeg images.

Edit: Pinning replicate to an older version fixed the issue for me.

mattt commented

Hey everyone. Thanks again for your feedback and patience. I found a problem in how file uploads work when passing a file handle that caused filenames to not be passed correctly. This was fixed by #343, and is now available in 0.32.1. Updating to that version should sort out the problems y'all are seeing. (If not, please let me know!)
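The kind of problem described here can be sketched as follows. A handle returned by `open()` exposes the path it was opened with as `.name`. An in-memory stream such as `io.BytesIO` has no `name` attribute, so a client has to fall back to some default. `upload_filename` is a hypothetical helper for illustration. It is not the change made in #343.

```python
import os
from typing import BinaryIO


def upload_filename(fh: BinaryIO, default: str = "file") -> str:
    """Pick the filename to send along with an uploaded stream (hypothetical)."""
    # open() handles carry the path in .name; BytesIO has no .name at all,
    # and handles opened from an integer file descriptor have an int .name.
    name = getattr(fh, "name", None)
    if isinstance(name, str) and name:
        # Send only the final path component, e.g. "dreambooth.zip".
        return os.path.basename(name)
    return default
```

Keeping the original basename preserves the `.zip` extension that an extension-based check on the model side would look for.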

Thanks @mattt this works for me now!

dw820 commented

Hi @mattt, I am still getting this error, and my replicate version is 0.32.1.

Here's the code I used to zip the folder of images:

    import os
    import zipfile

    def zip_folder(folder_path, output_zip):
        # Store every file under folder_path, using paths relative to that folder.
        with zipfile.ZipFile(output_zip, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for root, dirs, files in os.walk(folder_path):
                for file in files:
                    file_path = os.path.join(root, file)
                    zipf.write(file_path, os.path.relpath(file_path, folder_path))

    folder_to_zip = './data'
    output_zip_file = 'data.zip'

    zip_folder(folder_to_zip, output_zip_file)
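As a side note, `os.walk` picks up every file in the folder, including hidden ones such as macOS `.DS_Store`. The hypothetical variant of `zip_folder` below zips only `.png`, `.jpg` and `.jpeg` files, assuming those are the formats the trainer wants (they are the ones named in the `ModelError` quoted above). It also returns the member names so they can be inspected. It is not presented as the fix for the error below, which turns out to be model-specific.

```python
import os
import zipfile

# Assumed set of accepted image extensions (compared case-insensitively).
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def zip_images(folder_path: str, output_zip: str) -> list[str]:
    """Zip only image files under folder_path and return the archive member names."""
    written = []
    with zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as zipf:
        for root, _dirs, files in os.walk(folder_path):
            for file in sorted(files):
                ext = os.path.splitext(file)[1].lower()
                # Skip hidden files such as .DS_Store and anything that is not an image.
                if file.startswith(".") or ext not in IMAGE_EXTS:
                    continue
                file_path = os.path.join(root, file)
                arcname = os.path.relpath(file_path, folder_path)
                zipf.write(file_path, arcname)
                written.append(arcname)
    return written
```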

Then I call replicate.trainings.create:

    training = replicate.trainings.create(
        version="stability-ai/sdxl:xxx",
        input={
            "input_images": open("data.zip","rb"),
            "token_string": "TOK",
            "caption_prefix": "a photo of TOK, "
        },
        destination=f"{model.owner}/{model.name}"
    )

After that I got an input_images link from Replicate. Downloading it gives me exactly the zip file I have locally, but I still get this error.
[Screenshot 2024-09-10 at 7:52:45 PM]

Here's the full error log

    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/cog/server/worker.py", line 354, in _predict
        result = predict(**payload)
      File "train.py", line 142, in train
        input_dir = preprocess(
      File "/src/preprocess.py", line 118, in preprocess
        assert False, "input_images_filetype must be zip or tar"
    AssertionError: input_images_filetype must be zip or tar

mattt commented

Hi @dw820. That looks like a model-specific problem. The original issue had to do with https://replicate.com/ostris/flux-dev-lora-trainer/train (which is quite a big step up from SDXL, so it'd be worth your while to check it out!)
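The assertion message suggests that the SDXL trainer's preprocessing classifies `input_images` as either zip or tar and found neither. The log does not show how it makes that decision. For comparison, a content-based classification using only the standard library could look like the following. `archive_type` is a hypothetical helper, not the trainer's code.

```python
import tarfile
import zipfile
from typing import Optional


def archive_type(path: str) -> Optional[str]:
    """Return "zip" or "tar" based on the file's contents, or None if neither."""
    # Check zip first: is_zipfile looks for the end-of-central-directory record.
    if zipfile.is_zipfile(path):
        return "zip"
    # is_tarfile tries to open the file as a (possibly compressed) tar archive.
    if tarfile.is_tarfile(path):
        return "tar"
    return None
```

Because neither check looks at the file name, a zip that arrives without its extension is still classified as `"zip"`.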

dw820 commented

Got it, thanks for the context!