LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models [ICLR2024]
Implementation of the paper LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models , which treats layout generation as a code generation task and effectively utilizes the inherent layout expertise of large language models to enhance the performance of layout generation significantly.
We sample some generated cases by LayoutNUWA for the PubLayNet datasets.
We check the reproducibility under this environment.
- Python 3.9.18
- CUDA 11.6
Prepare your environment with the following command
git clone https://github.com/ProjectNUWA/LayoutNUWA.git
cd LayoutNUWA
conda create -n layoutnuwa python=3.9
conda activate layoutnuwa
pip install -r requirements.txt
We utilize LLaMA2-7B and CodeLLaMA-7B as our backbone.
You can download the models and place them under the ./models
directory.
Please follow https://raw.githubusercontent.com/CyberAgentAILab/layout-dm to download the preprocessed datasets, FID and clustering models.
Notice
: make sure you are under the LayoutNUWA directory
wget https://github.com/CyberAgentAILab/layout-dm/releases/download/v1.0.0/layoutdm_starter.zip
unzip layoutdm_starter.zip
The data is decompressed to the following structure:
download
- clustering_weights
- datasets
- fid_weights
- pretrained_weights
Then, move the download files to the corresponding directory:
# preprocessed datasets
mv download/datasets/rico25-max25 data
mv download/datasets/publaynet-max25 data
# rico fid and clustering models
mkdir -p models/rico25-max25
mv download/fid_weights/FIDNetV3/rico25-max25 models/rico25-max25
mv download/clustering_weights/rico25_max25_kmeans_train_clusters.pkl models/rico25-max25
# publaynet fid and clustering models
mkdir -p models/publaynet-max25
mv download/fid_weights/FIDNetV3/models/publaynet-max25 models/publaynet-max25
mv download/clustering_weights/publaynet_max25_kmeans_train_clusters.pkl models/publaynet-max25
Magazine Dataset
-
Download
MagLayout.zip
and decompress it. -
Create the new directory
data/magazine/raw/
and move the contents into it as shown below:data/magazine/raw/ └── layoutdata ├── annotations │ ├── fashion_0001.xml │ ├── fashion_0002.xml │ ├── fashion_0003.xml │ ├── fashion_0004.xml │ ├── fashion_0005.xml │ ├── ...
-
Please follow https://github.com/ktrk115/const_layout/tree/master/data and https://github.com/CyberAgentAILab/layout-dm/blob/main/docs/custom_dataset.md to preprocess the raw datasets and train the FID as well as the clustering models.
You can run the following command to generate the code data for the RICO dataset
NOTICE
: if you want to generate code for two other datasets (publaynet and magazine), just modify the --dataset_name
, --dataset_path
, and --save_path
.
python convertHTML/build_code.py \
--model_path_or_name /path/to/llamamodel \
--dataset_name rico25 \
--dataset_path data/rico25-max25 \
--save_path data/rico25-max25/html_format \
--bbox_quantization code \
--consistency_num 10 \
--add_task_instruction;
python convertHTML/build_code.py \
--model_path_or_name /path/to/llamamodel \
--dataset_name rico25 \
--dataset_path data/rico25-max25 \
--save_path data/rico25-max25/html_format \
--bbox_quantization code \
--add_task_instruction \
--build_testing_set;
We customize the training code based on the LLaMA-X
Please Check trainer/src/configs/hostfile
and trainer/src/configs/deepspeed_config_2.json
first, where the current code is designed for 64 NVIDIA V100 GPUs (8 GPUs x 8 nodes).
cd trainer/src/scripts
bash scripts/train.sh
cd trainer/src/scripts
bash scripts/inference.sh
You can run the following command to evaluate the generated results for the RICO dataset (We have released generated results of RICO dataset in data/generated_results/rico
as an example).
python evaluate.py \
--file_dir data/generated_results/rico \
--intermediate_saved_path data/generated_results/rico/all_gen.pt \
--golden_file data/generated_results/rico/golden.jsonl \
--fid_model_name_or_path models/rico25-max25 \
--cluster_model models/rico25-max25/rico25_max25_kmeans_train_clusters.pkl \
--dataset_name rico25 \
--dataset_path data/rico25-max25 \
NOTICE
: just change the dataset name, dataset path, and generated results path if you want to evaluate other datasets.
We appreciate the open source of the following projects: