Can you tell me how the dataset should be organized in this code

Question

Closed this issue 6 months ago · 2 comments

Can you tell me how the dataset should be organized for this code? For example, I randomly selected 5,000 images from diffusiondb. How should I organize my reference images and prompt strings? Like this?
--datasets
--main
--diffusiondb
--image1
--image2
--attacked

Answer 1 · 2024-04-24T15:33:51.000Z

Hi, thank you for your interest and for reaching out. Apologies for the delayed response.

For non-attacked images, which include both the original and watermarked images, we suggest the following directory structure. The dataset names diffusiondb and mscoco are used here as examples. The directory real contains non-watermarked, original images, and <watermark_method> should be replaced with your specific watermarking technique.

main
├── diffusiondb
│   ├── prompts.json
│   ├── real
│   └── <watermark_method>
└── mscoco
    ├── prompts.json
    ├── real
    └── <watermark_method>
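
As a rough illustration, the layout above could be created with a small helper like the one below. This is only a sketch: the function name make_main_layout and the example watermark name are hypothetical and not part of the repository, and prompts.json is deliberately not written because its format is not described in this thread.

```python
from pathlib import Path


def make_main_layout(data_dir, datasets, watermark_methods):
    """Create <data_dir>/main/<dataset>/{real,<watermark_method>} directories.

    Hypothetical helper, not part of the repository. Returns the directories
    it ensured exist. prompts.json is not created here.
    """
    created = []
    for dataset in datasets:
        dataset_dir = Path(data_dir) / "main" / dataset
        # "real" holds the original, non-watermarked images;
        # each watermarking method gets its own sibling directory.
        for sub in ["real", *watermark_methods]:
            path = dataset_dir / sub
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

For example, make_main_layout("/path/to/datasets", ["diffusiondb", "mscoco"], ["my_watermark"]) would create main/diffusiondb/real, main/diffusiondb/my_watermark, and the same two directories under main/mscoco.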

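For attacked images, please use a parallel structure under attacked, with one directory per combination of attack method, attack strength, and watermark method: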

attacked
├── diffusiondb
│   └── <attack_method>-<attack_strength>-<watermark_method>
└── mscoco
    └── <attack_method>-<attack_strength>-<watermark_method>
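
To make the naming pattern concrete, here is a minimal sketch of how such a path could be assembled. The helper attacked_dir and the example values are hypothetical, and how the repository formats the strength value (integer, float, number of decimals) is an assumption you should check against the code.

```python
from pathlib import Path


def attacked_dir(data_dir, dataset, attack_method, attack_strength, watermark_method):
    """Build <data_dir>/attacked/<dataset>/<attack>-<strength>-<watermark>.

    Hypothetical helper. The strength is inserted with default str()
    formatting, which may differ from what the repository expects.
    """
    name = f"{attack_method}-{attack_strength}-{watermark_method}"
    return Path(data_dir) / "attacked" / dataset / name
```

Calling attacked_dir("/path/to/datasets", "diffusiondb", "blur", 3, "my_watermark") yields the directory attacked/diffusiondb/blur-3-my_watermark under /path/to/datasets, where "blur", 3, and "my_watermark" are made-up example values.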

Please configure your .env file to specify the parent directory of both main and attacked folders as follows:

DATA_DIR=/path/to/datasets

With this organization, the original, watermarked, and attacked images for each dataset can all be found under a single DATA_DIR.
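
If it helps to sanity-check your setup, the sketch below resolves the main and attacked folders from DATA_DIR. It assumes the value from .env has already been loaded into the process environment (for example by a dotenv loader); how the repository itself reads .env is not shown in this thread, and resolve_roots is a hypothetical name.

```python
import os
from pathlib import Path


def resolve_roots(env=None):
    """Return (main_dir, attacked_dir) derived from the DATA_DIR variable.

    Hypothetical helper. `env` defaults to os.environ; pass a dict to test
    without touching the real environment.
    """
    env = os.environ if env is None else env
    data_dir = env.get("DATA_DIR")
    if not data_dir:
        raise RuntimeError("DATA_DIR is not set")
    root = Path(data_dir)
    # Both folders are expected to sit directly under DATA_DIR.
    return root / "main", root / "attacked"
```

With DATA_DIR=/path/to/datasets, this returns /path/to/datasets/main and /path/to/datasets/attacked.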

Answer 2 · 2024-04-28T01:28:49.000Z

Thank you for your response! I really appreciate your work!