SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters

This repository is an implementation of the paper titled above.

https://arxiv.org/abs/2407.19787

Dataset

Download the dataset from https://huggingface.co/datasets/omron-sinicx/scipostlayout_v2. Then, place the dataset directory as ./scipostlayout.

Docker setup

All models were run with Python 3.10 and CUDA 12.1. Run the following command to pull the Docker image and start a container.

sh run_docker.sh

Layout Analysis

We consider layout analysis as an object detection problem and we use LayoutLMv3 and DiT as baselines.

Hyperparameter details (for both models):

Model size: Base
Backbone: Cascade R-CNN
Epoch: 100
Warm-up steps: 1000
Weight decay: 0.05
Batch size: 4
Learning rate: $lr \in \{2 \times 10^{-5}, 5 \times 10^{-5}, 2 \times 10^{-4}\}$

The checkpoint with the best performance on the dev set was used for evaluation.
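Picking the best dev-set checkpoint can be scripted. The sketch below is not part of this repository. It assumes the training run writes detectron2's usual metrics.json (one JSON object per line, with an "iteration" key and a "bbox/AP" key on evaluation steps) and that periodic checkpoints are named model_<iteration>.pth with seven digits; the function name best_checkpoint is hypothetical.

```python
import json
from pathlib import Path


def best_checkpoint(metrics_file, metric="bbox/AP"):
    """Return (checkpoint_name, score) for the highest dev-set score.

    Assumes one JSON object per line, with an "iteration" key and,
    on evaluation steps, the chosen metric key.
    """
    best = None
    for line in Path(metrics_file).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if metric not in record:
            continue  # plain training-loss record, no evaluation result
        if best is None or record[metric] > best[metric]:
            best = record
    if best is None:
        raise ValueError(f"no '{metric}' entries found in {metrics_file}")
    # Assumption: periodic checkpoints are named after the zero-based iteration.
    return f"model_{best['iteration']:07d}.pth", best[metric]
```

For example, best_checkpoint("lr_0.0002_max_iter_22500/metrics.json") would return the checkpoint file name to pass as MODEL.WEIGHTS for evaluation, provided the assumptions above hold for your run.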

1. LayoutLMv3

Setup

Run the following commands to install dependencies. (torch is downgraded after detectron2 is installed, to avoid errors.)

cd /scipostlayout/code/layoutlmv3/object_detection
apt update
apt upgrade -y
apt install -y gcc-10 g++-10
export CC=/usr/bin/gcc-10
export CXX=/usr/bin/g++-10
python3 -m venv layoutlm-venv
source layoutlm-venv/bin/activate
pip3 install -r requirements.txt
pip3 install 'git+https://github.com/facebookresearch/detectron2.git'
pip3 install torch==2.0.1 torchvision==0.15.2 "numpy<2"

Download the image dataset (see instructions above). The directory layout should look like this:

$ ls scipostlayout/poster/png
train/
dev/
test/
train.json
dev.json
test.json
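A quick way to confirm this layout before training is a small check script. This is only a sketch, not a tool shipped with the repository; the function name missing_entries is hypothetical, and the expected names are taken from the listing above.

```python
from pathlib import Path

EXPECTED_DIRS = ("train", "dev", "test")
EXPECTED_FILES = ("train.json", "dev.json", "test.json")


def missing_entries(png_root):
    """List expected entries that are absent under scipostlayout/poster/png."""
    root = Path(png_root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Calling missing_entries("scipostlayout/poster/png") returns an empty list when all three image directories and all three annotation files are in place.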

Download the pre-trained checkpoint.

git clone https://huggingface.co/microsoft/layoutlmv3-base

Train & Inference

Run the following commands to train LayoutLMv3 and run inference. [script]

config.json is not generated by the training process but is necessary for inference, so please copy it from the original pre-trained checkpoint directory.

cp ./layoutlmv3-base/config.json ./lr_0.0002_max_iter_22500/

You need to specify the path of the pre-trained checkpoint and the dataset.

Please refer to cascade_layoutlmv3.yaml for hyperparameter details.

cd /scipostlayout/code/layoutlmv3/object_detection

MODEL_PATH=./layoutlmv3-base/pytorch_model.bin
OUT_PATH=.

LR=0.0002
MAX_ITER=22500

python3 train_net.py --config-file cascade_layoutlmv3.yaml --num-gpus 4 \
        MODEL.WEIGHTS $MODEL_PATH \
        PUBLAYNET_DATA_DIR_TRAIN PATH_TO/scipostlayout/poster/png/train \
        PUBLAYNET_DATA_DIR_TEST PATH_TO/scipostlayout/poster/png/dev \
        SOLVER.GRADIENT_ACCUMULATION_STEPS 1 \
        SOLVER.IMS_PER_BATCH 4 \
        SOLVER.BASE_LR $LR \
        SOLVER.WARMUP_ITERS 1000 \
        SOLVER.MAX_ITER $MAX_ITER \
        SOLVER.CHECKPOINT_PERIOD 2250 \
        TEST.EVAL_PERIOD 2250 \
        OUTPUT_DIR $OUT_PATH/lr_${LR}_max_iter_${MAX_ITER}

python3 train_net.py --config-file cascade_layoutlmv3.yaml --eval-only --num-gpus 4 \
        MODEL.WEIGHTS $OUT_PATH/lr_0.0002_max_iter_22500/model_final.pth \
        PUBLAYNET_DATA_DIR_TEST PATH_TO/scipostlayout/poster/png/test \
        OUTPUT_DIR $OUT_PATH/lr_0.0002_max_iter_22500

2. DiT

Setup

Install dependencies. The virtualenv layoutlm-venv created in 1. LayoutLMv3 should also work for DiT.

source /scipostlayout/code/layoutlmv3/object_detection/layoutlm-venv/bin/activate

Download the image dataset (same as LayoutLMv3).
Download the pre-trained checkpoint and rename it to dit-base-224-p16-500k.pth.

Train & Inference

You need to specify the path of the dataset in /scipostlayout/code/dit/object_detection/train_net.py.

register_coco_instances(
    "scipostlayout_train",
    {},
    "PATH_TO/scipostlayout/poster/png/train.json",
    "PATH_TO/scipostlayout/poster/png/train"
)
register_coco_instances(
    "scipostlayout_dev",
    {},
    "PATH_TO/scipostlayout/poster/png/dev.json",
    "PATH_TO/scipostlayout/poster/png/dev"
)
register_coco_instances(
    "scipostlayout_test",
    {},
    "PATH_TO/scipostlayout/poster/png/test.json",
    "PATH_TO/scipostlayout/poster/png/test"
)
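Since the three registrations differ only in the split name, they can be generated from a single dataset root. The helper below is a hypothetical sketch (coco_split_args is not in the repository); it only builds the argument tuples, in the order register_coco_instances expects them (name, metadata, json_file, image_root).

```python
from pathlib import Path


def coco_split_args(png_root, splits=("train", "dev", "test")):
    """Build (name, metadata, json_file, image_root) tuples, one per split."""
    root = Path(png_root)
    return [
        (f"scipostlayout_{split}", {}, str(root / f"{split}.json"), str(root / split))
        for split in splits
    ]


# Inside train_net.py, where register_coco_instances is already imported:
# for args in coco_split_args("PATH_TO/scipostlayout/poster/png"):
#     register_coco_instances(*args)
```

With this, only one path needs editing when the dataset moves.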

Run the following commands to train DiT and run inference. You need to specify the path of the pre-trained checkpoint. [script]

Please refer to scipostlayout_configs for hyperparameter details.

cd /scipostlayout/code/dit/object_detection

MODEL_PATH=./checkpoints/dit-base-224-p16-500k.pth
OUT_PATH=.
LR=0.00002
MAX_ITER=22500

python3 train_net.py \
    --config-file scipostlayout_configs/cascade/cascade_dit_base.yaml \
    --num-gpus 4 \
    MODEL.WEIGHTS $MODEL_PATH \
    SOLVER.IMS_PER_BATCH 4 \
    SOLVER.BASE_LR $LR \
    SOLVER.WARMUP_ITERS 1000 \
    SOLVER.MAX_ITER $MAX_ITER \
    SOLVER.CHECKPOINT_PERIOD 2250 \
    TEST.EVAL_PERIOD 2250 \
    OUTPUT_DIR $OUT_PATH/lr_${LR}_max_iter_${MAX_ITER}

python3 train_net.py --config-file scipostlayout_configs/cascade/cascade_dit_base.yaml --eval-only --num-gpus 1 \
        MODEL.WEIGHTS lr_0.00002_max_iter_22500/model_0022499.pth \
        OUTPUT_DIR $OUT_PATH/results/lr_${LR}_max_iter_${MAX_ITER}

Layout Generation

1. LayoutDM

Setup

Make a virtualenv and install Poetry inside it.

cd /scipostlayout/code/layout-dm
python3 -m venv layoutdm-venv
source layoutdm-venv/bin/activate
curl -sSL https://install.python-poetry.org | python3 -
echo 'export PATH="/root/.local/bin:$PATH"' >> ./layoutdm-venv/bin/activate

Install dependencies.

poetry install

Please refer to the official README for more details.

FID training

To evaluate layout generation models, an FID model must be trained first.

Create a directory /scipostlayout/code/layout-dm/download/datasets/scipostlayout-max50/raw and copy all files under /scipostlayout/poster/png into the created directory.

cp -r /scipostlayout/scipostlayout/poster/png/* /scipostlayout/code/layout-dm/download/datasets/scipostlayout-max50/raw/

Rename dev.json to val.json under raw directory.
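The directory preparation described above (create the raw directory, copy the poster data, rename dev.json) can also be done in one step from Python. This is a minimal sketch under the assumption that the copy should mirror `cp -r`; prepare_raw_dir is a hypothetical helper, not part of the repository.

```python
import shutil
from pathlib import Path


def prepare_raw_dir(png_root, raw_dir):
    """Copy the poster data into raw_dir and rename dev.json to val.json."""
    src, dst = Path(png_root), Path(raw_dir)
    # copytree creates raw_dir (and its parents) if needed;
    # dirs_exist_ok (Python 3.8+) allows the function to be re-run.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    dev_json = dst / "dev.json"
    if dev_json.exists():
        dev_json.replace(dst / "val.json")
    return dst
```

Usage, with the paths from this section: prepare_raw_dir("/scipostlayout/scipostlayout/poster/png", "/scipostlayout/code/layout-dm/download/datasets/scipostlayout-max50/raw").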

Then run the following command to train the FID model. Training should take a few days on an A100 GPU. [script]

Hyperparameter details:

Training steps: 2e5
Batch size: 64
Learning rate: 3e-4

poetry run python3 src/trainer/trainer/fid/train.py \
    src/trainer/trainer/config/dataset/scipostlayout.yaml \
    --out_dir download/fid_weights/FIDNetV3/scipostlayout-max50

The checkpoints will be saved in /scipostlayout/code/layout-dm/download/fid_weights. model_best.pth.tar is used in all models' evaluation processes.

Training

Create a directory /scipostlayout/code/layout-dm/download/clustering_weights.

Run clustering before training. [script]

poetry run python3 bin/clustering_coordinates.py src/trainer/trainer/config/dataset/scipostlayout.yaml kmeans --result_dir download/clustering_weights

Run the following command to train LayoutDM. [script]

bash bin/train.sh scipostlayout layoutdm

Inference

Run inference with the following command. [script]

Update JOB_DIR to change the target results.

CONDS=(c cwh partial refinement relation)
JOB_DIR=/scipostlayout/code/layout-dm/tmp/jobs/scipostlayout/layoutdm_xxxxxxxx
RESULT_DIR=/scipostlayout/code/layout-dm/result_dir

for cond in ${CONDS[@]}; do
    poetry run python3 -m src.trainer.trainer.test \
        cond=$cond \
        job_dir=$JOB_DIR \
        result_dir=${RESULT_DIR}/${cond} \
        is_validation=true
done

is_validation=true: evaluates generation performance on the validation set instead of the test set. Use it whenever you tune hyperparameters.

Evaluation

We use the same evaluation code for LayoutDM and the other models to keep results consistent. The command below uses Gen_T as an example. The visualization images will be saved under the result dir.

poetry run python3 eval.py /scipostlayout/code/layout-dm/result_dir/c/c_temperature_1.0_name_random_num_timesteps_100_validation

2. LayoutFormer++

Setup

Run the following commands to install dependencies.

cd /scipostlayout/code/LayoutFormer++
python3 -m venv layoutformer-venv
source layoutformer-venv/bin/activate
pip3 install -r requirements.txt

Copy the FID checkpoint trained in the LayoutDM part to /scipostlayout/code/LayoutFormer++/src/net and rename it to fid_scipostlayout.pth.tar.

If the following error occurs, please replace the line "from torch._six import inf" with "from torch import inf" in the library file named in the traceback (deepspeed/runtime/utils.py).

Traceback (most recent call last):
  File "/scipostlayout/code/LayoutFormer++/src/main.py", line 5, in <module>
    from deepspeed.runtime.lr_schedules import WarmupLR
  File "/scipostlayout/code/LayoutFormer++/layoutformer-venv-tmp/lib/python3.10/site-packages/deepspeed/__init__.py", line 16, in <module>
    from .runtime.engine import DeepSpeedEngine, DeepSpeedOptimizerCallable, DeepSpeedSchedulerCallable
  File "/scipostlayout/code/LayoutFormer++/layoutformer-venv-tmp/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 24, in <module>
    from deepspeed.runtime.utils import see_memory_usage, get_ma_status, DummyOptim
  File "/scipostlayout/code/LayoutFormer++/layoutformer-venv-tmp/lib/python3.10/site-packages/deepspeed/runtime/utils.py", line 18, in <module>
    from torch._six import inf
ModuleNotFoundError: No module named 'torch._six'
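The fix for this error is a single import-line replacement, which can be applied with a short script instead of editing the installed file by hand. This is a sketch only; patch_six_import is a hypothetical helper, and the file to patch is whichever path your own traceback reports.

```python
from pathlib import Path

OLD_IMPORT = "from torch._six import inf"
NEW_IMPORT = "from torch import inf"


def patch_six_import(file_path):
    """Rewrite the stale import in place; return True if the file changed."""
    path = Path(file_path)
    text = path.read_text()
    if OLD_IMPORT not in text:
        return False
    path.write_text(text.replace(OLD_IMPORT, NEW_IMPORT))
    return True
```

Pass it the deepspeed/runtime/utils.py path from the last "File" line of the traceback. Running it a second time is harmless and returns False.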

Please refer to the official README for more details.

Dataset

Create a directory /scipostlayout/code/LayoutFormer++/datasets/scipostlayout/raw/scipostlayout and copy the dataset under /scipostlayout/poster/png to the created directory to start training. Rename dev.json to val.json under raw/scipostlayout directory. When training starts, the dataset will be preprocessed automatically. We set max_num_elements to 50 and there are 9 categories in the dataset.

Training

Run the following commands to train the models.

cd /scipostlayout/code/LayoutFormer++/src

[script]

If you want to change the parameters, please refer to the scripts in /scipostlayout/code/LayoutFormer++/src/scripts.

The training process should take 1-5 hours using an A100 GPU.

./scripts/scipostlayout_gen_t.sh train ../datasets ../results/gen_t basic 1 none
./scripts/scipostlayout_gen_ts.sh train ../datasets ../results/gen_ts basic 1 none
./scripts/scipostlayout_gen_r.sh train ../datasets ../results/gen_r basic 1 none
./scripts/scipostlayout_completion.sh train ../datasets ../results/completion basic 1 none
./scripts/scipostlayout_refinement.sh train ../datasets ../results/refinement basic 1 none

Inference

Run the following commands to run inference on the test set. [script]

By default, we train the models for 200 epochs and use the final checkpoint for evaluation.

Note: the programs print evaluation results in this step, but to use the same evaluation settings as LayoutPrompter, we evaluate the prediction files separately.

The visualization images will be saved under the result dir, for example, /scipostlayout/code/LayoutFormer++/results/completion/completion/epoch_199/pics.

./scripts/scipostlayout_gen_t.sh test ../datasets ../results/gen_t basic 1 epoch_xxx
./scripts/scipostlayout_gen_ts.sh test ../datasets ../results/gen_ts basic 1 epoch_xxx
./scripts/scipostlayout_gen_r.sh test ../datasets ../results/gen_r basic 1 epoch_xxx
./scripts/scipostlayout_completion.sh test ../datasets ../results/completion basic 1 epoch_xxx
./scripts/scipostlayout_refinement.sh test ../datasets ../results/refinement basic 1 epoch_xxx

Evaluation

We save prediction and gold label files during inference (gen_t as an example).

/scipostlayout/code/LayoutFormer++/results/gen_t/gold_labels.pth
/scipostlayout/code/LayoutFormer++/results/gen_t/predictions.pth

Run /scipostlayout/code/LayoutPrompter/src/eval_layoutformer.py to conduct evaluation. In that program, you need to specify the path of the FID model (trained in the LayoutDM part) and the paths of the prediction and gold label files. You also need to set up the LayoutPrompter environment as described in the section below.

cd /scipostlayout/code/LayoutPrompter
source layoutprompter-venv/bin/activate
cd src
python eval_layoutformer.py

Update the result_path in eval_layoutformer.py to change the target results.

result_path = "/scipostlayout/code/LayoutFormer++/results/gen_t"

3. LayoutPrompter

Setup

Run the following commands to install dependencies.

cd /scipostlayout/code/LayoutPrompter
python3 -m venv layoutprompter-venv
source layoutprompter-venv/bin/activate
pip3 install -r requirements.txt

Please refer to the official README for more details.

Dataset

LayoutPrompter needs the dataset that LayoutFormer++ processed. Before using LayoutPrompter, copy the .pt files in /scipostlayout/code/LayoutFormer++/datasets/scipostlayout/pre_processed_50_9 to /scipostlayout/code/LayoutPrompter/datasets/scipostlayout-max50/raw.
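The copy step can be scripted so that only the .pt files are transferred and the target directory is created if it does not exist yet. A minimal sketch; copy_pt_files is a hypothetical helper name, not a function from either repository.

```python
import shutil
from pathlib import Path


def copy_pt_files(src_dir, dst_dir):
    """Copy every .pt file from src_dir into dst_dir; return the copied names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pt_file in sorted(src.glob("*.pt")):
        shutil.copy2(pt_file, dst / pt_file.name)
        copied.append(pt_file.name)
    return copied
```

Call it with the two directories named above. An empty return value means LayoutFormer++ has not preprocessed the dataset yet.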

Inference

To run inference with LayoutPrompter, you need an OpenAI API key. We use gpt-4-1106-preview instead of text-davinci-003 for its longer context length.

Run the following commands to run inference with LayoutPrompter (gent as an example). You need to specify the OPENAI_API_KEY, the OPENAI_ORGANIZATION, and the FID model's path (which was trained in the LayoutDM part). [script]

Evaluation runs automatically after inference. We calculate metrics on the top-1 layouts of the layout ranker. Please refer to the issue "LayoutPrompter evaluation code?" for details. The visualization images will be saved under the result dir.

cd /scipostlayout/code/LayoutPrompter
python3 src/constraint_explicit.py \
    --task gent \
    --base_dir . \
    --fid_model_path $FID_MODEL_PATH
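Because each run calls a paid API, it can help to check the prerequisites before launching. The sketch below assumes the key and organization are supplied as environment variables named OPENAI_API_KEY and OPENAI_ORGANIZATION, as in the Paper-to-Layout section of this README; if your copy of the script reads them differently, adapt the check. preflight_problems is a hypothetical helper.

```python
import os
from pathlib import Path

REQUIRED_ENV = ("OPENAI_API_KEY", "OPENAI_ORGANIZATION")


def preflight_problems(fid_model_path, env=None):
    """Return a list of human-readable problems; empty means ready to run."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_ENV if not env.get(name)]
    if not Path(fid_model_path).is_file():
        problems.append(f"FID model not found: {fid_model_path}")
    return problems
```

An empty list from preflight_problems(os.environ.get("FID_MODEL_PATH", "")) indicates that both variables are set and the FID checkpoint file exists.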

Paper-to-Layout

Nougat PDF parser

We use Nougat to parse the papers' PDFs.

cd /scipostlayout/code/Paper-to-Layout
pip3 install nougat-ocr
nougat ../../dataset/paper/dev -o mmd/dev -m 0.1.0-base --recompute --no-skipping --batchsize 8
nougat ../../dataset/paper/test -o mmd/test -m 0.1.0-base --recompute --no-skipping --batchsize 8

The parsed mmd files are included in scipostlayout/paper/mmd.

Copy scipostlayout/paper/mmd to /scipostlayout/code/Paper-to-Layout/.

GPT inference

We use GPT-4 to extract constraints for layout generation from papers. Run the following commands to start inference. We provide three different prompts (base/rule/rule_react).

pip3 install --upgrade pip
pip3 install openai tqdm

export OPENAI_API_KEY='YOUR_API_KEY'
export OPENAI_ORGANIZATION='YOUR_ORGANIZATION'

python3 extract_constraints_gent.py \
    --data_path ../../scipostlayout/poster/png/test.json \
    --mmd_path mmd/test \
    --prompt_path prompt/prompt_base.txt \
    --model gpt-4-1106-preview

python3 extract_constraints_gent.py \
    --data_path ../../scipostlayout/poster/png/test.json \
    --mmd_path mmd/test \
    --prompt_path prompt/prompt_rule.txt \
    --model gpt-4-1106-preview

python3 extract_constraints_gent.py \
    --data_path ../../scipostlayout/poster/png/test.json \
    --mmd_path mmd/test \
    --prompt_path prompt/prompt_rule_react.txt \
    --model gpt-4-1106-preview
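The three invocations above differ only in the prompt file, so they can be driven from a loop. The following is a sketch that rebuilds the same argument lists; build_command and run_all are hypothetical names, and the flags are exactly those shown in the commands above.

```python
import subprocess

PROMPTS = ("base", "rule", "rule_react")


def build_command(prompt, split="test", model="gpt-4-1106-preview"):
    """Return the argv list for one extract_constraints_gent.py run."""
    return [
        "python3", "extract_constraints_gent.py",
        "--data_path", f"../../scipostlayout/poster/png/{split}.json",
        "--mmd_path", f"mmd/{split}",
        "--prompt_path", f"prompt/prompt_{prompt}.txt",
        "--model", model,
    ]


def run_all(prompts=PROMPTS):
    """Run the extraction once per prompt, stopping at the first failure."""
    for prompt in prompts:
        subprocess.run(build_command(prompt), check=True)
```

run_all() must be called from /scipostlayout/code/Paper-to-Layout so that the relative paths resolve, and with the OpenAI environment variables already exported.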

Layout generation using extracted constraints

Generate layouts based on extracted constraints. Run the scripts inside each model directory.

LayoutDM

[script]

cond=c
JOB_DIR=/scipostlayout/code/layout-dm/tmp/jobs/scipostlayout/layoutdm_xxxxxxxx
RESULT_DIR=/scipostlayout/code/layout-dm/result_dir

poetry run python3 -m src.trainer.trainer.test \
    cond=$cond \
    job_dir=$JOB_DIR \
    result_dir=${RESULT_DIR}/${cond} \
    gen_const_path="/scipostlayout/code/Paper-to-Layout/results/test/prompt_rule.json"
    # is_validation=true

Run evaluation.

source layoutdm-venv/bin/activate
poetry run python3 eval.py /scipostlayout/code/layout-dm/result_dir/c_const_rule/c_temperature_1.0_name_random_num_timesteps_100_test

LayoutFormer++

[script]

./scripts/scipostlayout_gen_t.sh test ../datasets ../results/gen_t basic 1 epoch_199 /scipostlayout/code/Paper-to-Layout/results/test/prompt_rule.json

Run /scipostlayout/code/LayoutPrompter/src/eval_layoutformer.py to conduct evaluation.

cd /scipostlayout/code/LayoutPrompter
source layoutprompter-venv/bin/activate
cd src
python eval_layoutformer.py

LayoutPrompter

[script]

python3 src/constraint_explicit.py \
    --task gent \
    --base_dir /scipostlayout/code/LayoutPrompter \
    --fid_model_path /scipostlayout/code/layout-dm/download/fid_weights/FIDNetV3/scipostlayout-max50/model_best.pth.tar \
    --gen_const_path /scipostlayout/code/Paper-to-Layout/results/test/prompt_rule.json \
    --use_saved_response

Layout generation from summarized papers

Generate layouts from summarized papers. Run the scripts inside each model directory.

LayoutPrompter

[script]

python3 src/constraint_explicit.py \
    --task genp \
    --base_dir /scipostlayout/code/LayoutPrompter \
    --fid_model_path /scipostlayout/code/layout-dm/download/fid_weights/FIDNetV3/scipostlayout-max50/model_best.pth.tar \
    --mmd_dir /scipostlayout/code/Paper-to-Layout/mmd \
    --use_saved_response

Citation

If you find this code useful for your research, please cite our paper and the above repositories:

@misc{tanaka2024scipostlayoutdatasetlayoutanalysis,
      title={SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters}, 
      author={Shohei Tanaka and Hao Wang and Yoshitaka Ushiku},
      year={2024},
      eprint={2407.19787},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.19787},
}