LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models [ICLR2024]

Implementation of the paper LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models, which treats layout generation as a code generation task and leverages the inherent layout expertise of large language models to significantly improve layout generation performance.
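
As a rough illustration of what "layout as code" means, the sketch below serializes a list of labeled bounding boxes into an HTML-like string. This is illustrative only: the actual template, tag names, and attribute format used by LayoutNUWA are defined in convertHTML/build_code.py and may differ.

# Illustrative sketch only; the real template lives in convertHTML/build_code.py.
def layout_to_html(elements, canvas_width=1440, canvas_height=2560):
    """Serialize (category, x, y, w, h) tuples into an HTML-like layout string."""
    lines = [f'<html><body style="width: {canvas_width}px; height: {canvas_height}px">']
    for category, x, y, w, h in elements:
        lines.append(
            f'  <div class="{category}" '
            f'style="left: {x}px; top: {y}px; width: {w}px; height: {h}px"></div>'
        )
    lines.append("</body></html>")
    return "\n".join(lines)

# Hypothetical RICO-style elements, for illustration.
print(layout_to_html([("toolbar", 0, 0, 1440, 150), ("image", 120, 300, 1200, 800)]))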

Model Performance

Generated Cases

We show some cases generated by LayoutNUWA on the PubLayNet dataset.

Automatic Evaluation

Magazine Dataset

RICO and PubLayNet Dataset

Setup

We have verified reproducibility under the following environment:

  • Python 3.9.18
  • CUDA 11.6

Environment Installation

Prepare your environment with the following commands:

git clone https://github.com/ProjectNUWA/LayoutNUWA.git
cd LayoutNUWA

conda create -n layoutnuwa python=3.9
conda activate layoutnuwa

pip install -r requirements.txt

Model Preparation

We use LLaMA2-7B and CodeLLaMA-7B as our backbones. Download the models and place them under the ./models directory.
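
For example, the backbone weights can be fetched from the Hugging Face Hub with huggingface_hub. The repository IDs and target directory names below are assumptions (and access to the official LLaMA-2 / CodeLlama checkpoints must first be granted on the Hub), so adjust them to the checkpoints you actually use.

from huggingface_hub import snapshot_download

# Download the backbone checkpoints into ./models.
# Repo IDs and local directory names are assumptions, not prescribed by this repo.
snapshot_download(repo_id="meta-llama/Llama-2-7b-hf", local_dir="models/Llama-2-7b-hf")
snapshot_download(repo_id="codellama/CodeLlama-7b-hf", local_dir="models/CodeLlama-7b-hf")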

Dataset Preparation

RICO and PubLayNet Datasets

Please follow https://github.com/CyberAgentAILab/layout-dm to download the preprocessed datasets as well as the FID and clustering models.

Notice: make sure you are in the LayoutNUWA directory.

wget https://github.com/CyberAgentAILab/layout-dm/releases/download/v1.0.0/layoutdm_starter.zip
unzip layoutdm_starter.zip

The archive decompresses into the following structure:

download
- clustering_weights
- datasets
- fid_weights
- pretrained_weights

Then, move the downloaded files to the corresponding directories:

# preprocessed datasets
mv download/datasets/rico25-max25 data
mv download/datasets/publaynet-max25 data

# rico fid and clustering models
mkdir -p models/rico25-max25
mv download/fid_weights/FIDNetV3/rico25-max25 models/rico25-max25
mv download/clustering_weights/rico25_max25_kmeans_train_clusters.pkl models/rico25-max25

# publaynet fid and clustering models
mkdir -p models/publaynet-max25
mv download/fid_weights/FIDNetV3/publaynet-max25 models/publaynet-max25
mv download/clustering_weights/publaynet_max25_kmeans_train_clusters.pkl models/publaynet-max25

Magazine Dataset

  1. Download MagLayout.zip and decompress it.

  2. Create the new directory data/magazine/raw/ and move the contents into it as shown below:

    data/magazine/raw/
    └── layoutdata
        ├── annotations
        │   ├── fashion_0001.xml
        │   ├── fashion_0002.xml
        │   ├── fashion_0003.xml
        │   ├── fashion_0004.xml
        │   ├── fashion_0005.xml
        │   └── ...
  3. Please follow https://github.com/ktrk115/const_layout/tree/master/data and https://github.com/CyberAgentAILab/layout-dm/blob/main/docs/custom_dataset.md to preprocess the raw datasets and train the FID as well as the clustering models.
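
The exact preprocessing and training procedure is defined in the repositories linked above. As a hedged sketch only, assuming the clustering weights are scikit-learn KMeans models fitted on normalized element bounding boxes (an assumption, not the documented procedure), fitting and saving such a model would look roughly like this; the output directory and file name mirror the rico25/publaynet naming used elsewhere in this README and are also assumptions.

import os
import pickle
import numpy as np
from sklearn.cluster import KMeans

# Placeholder features: in practice these would be the normalized (x, y, w, h)
# boxes of the Magazine training split produced by the preprocessing step above.
bboxes = np.random.rand(10000, 4).astype(np.float32)

# Fit a KMeans clustering model over the bounding boxes.
kmeans = KMeans(n_clusters=32, random_state=0).fit(bboxes)

# Save it in the same style as the released rico25/publaynet clustering weights.
os.makedirs("models/magazine-max25", exist_ok=True)
with open("models/magazine-max25/magazine_max25_kmeans_train_clusters.pkl", "wb") as f:
    pickle.dump(kmeans, f)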

Numerical Layout to Code Format Conversion

You can run the following command to generate the code data for the RICO dataset.

NOTICE: to generate code for the other two datasets (PubLayNet and Magazine), just modify --dataset_name, --dataset_path, and --save_path accordingly.

Build Training Data

python convertHTML/build_code.py \
    --model_path_or_name /path/to/llamamodel \
    --dataset_name rico25 \
    --dataset_path data/rico25-max25 \
    --save_path data/rico25-max25/html_format \
    --bbox_quantization code \
    --consistency_num 10 \
    --add_task_instruction;

Build Testing Data

python convertHTML/build_code.py \
    --model_path_or_name /path/to/llamamodel \
    --dataset_name rico25 \
    --dataset_path data/rico25-max25 \
    --save_path data/rico25-max25/html_format \
    --bbox_quantization code \
    --add_task_instruction \
    --build_testing_set;

Model Training and Inference

We customize the training code based on LLaMA-X.

Training

Please check trainer/src/configs/hostfile and trainer/src/configs/deepspeed_config_2.json first; the current configuration targets 64 NVIDIA V100 GPUs (8 GPUs x 8 nodes).
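
For reference, a DeepSpeed hostfile lists one node per line together with the number of GPU slots it exposes; a hypothetical 8-node x 8-GPU setup (the hostnames below are placeholders) would look like:

node-0 slots=8
node-1 slots=8
node-2 slots=8
node-3 slots=8
node-4 slots=8
node-5 slots=8
node-6 slots=8
node-7 slots=8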

cd trainer/src/scripts
bash scripts/train.sh

Inference

cd trainer/src/scripts
bash scripts/inference.sh

Evaluation

You can run the following command to evaluate the generated results for the RICO dataset (we have released the generated results for the RICO dataset in data/generated_results/rico as an example).

python evaluate.py \
    --file_dir data/generated_results/rico \
    --intermediate_saved_path data/generated_results/rico/all_gen.pt \
    --golden_file data/generated_results/rico/golden.jsonl \
    --fid_model_name_or_path models/rico25-max25 \
    --cluster_model models/rico25-max25/rico25_max25_kmeans_train_clusters.pkl \
    --dataset_name rico25 \
    --dataset_path data/rico25-max25

NOTICE: just change the dataset name, dataset path, and generated results path if you want to evaluate other datasets.

Acknowledgement

We appreciate the open-source work of the following projects:

  • Hugging Face
  • LLaMA-X
  • LayoutDM
  • Const Layout