Category | Training Set | Validation Set | Testing Set |
---|---|---|---|
Num of Images | 20365 | 500 | 499 |
Percentage | 95.3% | 2.3% | 2.3% |
Training Set:
category | #instances | category | #instances | category | #instances | category | #instances |
---|---|---|---|---|---|---|---|
chapter | 11312 | section | 17471 | clause | 106931 | total | 135714 |
Validation Set:
category | #instances | category | #instances | category | #instances | category | #instances |
---|---|---|---|---|---|---|---|
chapter | 151 | section | 246 | clause | 3096 | total | 3493 |
Testing Set:
category | #instances | category | #instances | category | #instances | category | #instances |
---|---|---|---|---|---|---|---|
chapter | 151 | section | 249 | clause | 2947 | total | 3347 |
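
As a quick consistency check on the three tables above, the per-category counts can be summed and compared against the reported totals. This is a minimal sketch; the dictionary layout and the `class_share` helper are illustrative and not part of the repository.

```python
# Instance counts copied from the tables above.
counts = {
    "train": {"chapter": 11312, "section": 17471, "clause": 106931, "total": 135714},
    "val": {"chapter": 151, "section": 246, "clause": 3096, "total": 3493},
    "test": {"chapter": 151, "section": 249, "clause": 2947, "total": 3347},
}


def class_share(split):
    """Return each category's share of the instances in one split (hypothetical helper)."""
    row = counts[split]
    return {name: n / row["total"] for name, n in row.items() if name != "total"}


for split, row in counts.items():
    summed = sum(n for name, n in row.items() if name != "total")
    # Each reported total should equal the sum of its three categories.
    assert summed == row["total"], split
    print(split, {name: f"{share:.1%}" for name, share in class_share(split).items()})
```

The shares also make the class imbalance visible: clauses account for roughly 79% of the training instances.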
Images
Annotation
-
Beihang Pan:
-
Google Drive:
- Best model, fine-tuned on the Company Articles Dataset from a pretrained Faster-RCNN-ResNet model
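- faster_rcnn_resnet101_coco_2018_01_28: pretrained backbone model, used for training on the publaynet dataset
- visualizeSet.py: visualizes the dataset
- build.py: builds the optimizer and the learning-rate schedule
- utils.py: utility functions for working with the publaynet dataset
- train.py: training script for the publaynet dataset
- test_per_img.py: visualizes predictions on the test set
- predict.py: prediction script for the publaynet dataset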
- Linux or macOS with Python ≥ 3.6
- cython
- opencv-python
- torchvision (PyTorch ≥ 1.3)
- 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
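
Before installing anything, it may help to see which of the requirements above are already present. The sketch below is an assumption-laden convenience check, not part of the repository: it uses the usual import names (`Cython`, `cv2`, `torch`, `torchvision`, and `pycocotools`, which is what the cocoapi install provides) and only tests whether each module can be found, not its version.

```python
import importlib.util
import sys

# Import names assumed for the packages listed above.
REQUIRED_MODULES = ["Cython", "cv2", "torch", "torchvision", "pycocotools"]


def missing_modules(names):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(min_version=(3, 6)):
    """Check the interpreter against the minimum Python version listed above."""
    return sys.version_info >= min_version


print("Python >= 3.6:", python_ok())
print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```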
!pip install pyyaml==5.1
!pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
!git clone https://github.com/noba1anc3/Publaynet.git
cd Publaynet
Once the dependencies above and gcc & g++ ≥ 5 are installed, run:
!git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
!python -m pip install -e .
cd ..
# Or if you are on macOS
# CC=clang CXX=clang++ python -m pip install -e .
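
The detectron2 build needs gcc & g++ ≥ 5, so a quick version check beforehand can save a failed build. The following is a rough sketch with hypothetical helper names. It relies on the `-dumpversion` option that gcc and clang both accept. Where `gcc` is actually an alias for clang (as on macOS), the reported number may not be meaningful.

```python
import re
import shutil
import subprocess


def parse_major(version_text):
    """Extract the major version from text such as '9.4.0'; None if no digits are found."""
    match = re.search(r"(\d+)(?:\.\d+)*", version_text)
    return int(match.group(1)) if match else None


def compiler_major(binary):
    """Return the major version of a compiler on PATH, or None if it is unavailable."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        [binary, "-dumpversion"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_major(result.stdout.strip())


for binary in ("gcc", "g++"):
    major = compiler_major(binary)
    if major is None:
        status = "not found"
    else:
        status = "ok" if major >= 5 else "older than 5"
    print(f"{binary}: {status}")
```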
from google.colab import drive
drive.mount('/content/drive/')
mkdir data
cp -rf ../drive/'My Drive'/train.zip ./data/
cp -rf ../drive/'My Drive'/val.zip ./data/
cd data
!unzip train.zip
!unzip val.zip
cd ..
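
Outside a notebook, the copy-and-unzip steps above can be reproduced with the standard library. This is a minimal sketch under the assumption that `train.zip` and `val.zip` sit in a single source directory; `stage_archives` is a hypothetical helper, not a function from the repository.

```python
import shutil
import zipfile
from pathlib import Path


def stage_archives(source_dir, data_dir, names=("train.zip", "val.zip")):
    """Copy each archive from source_dir into data_dir and extract it there."""
    source_dir, data_dir = Path(source_dir), Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for name in names:
        src = source_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"expected archive not found: {src}")
        dst = data_dir / name
        shutil.copy(str(src), str(dst))
        with zipfile.ZipFile(str(dst)) as archive:
            archive.extractall(str(data_dir))
        staged.append(dst)
    return staged


# Example call from inside the Publaynet directory on Colab (paths are assumptions):
# stage_archives("/content/drive/My Drive", "data")
```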
!python train.py -f False
mkdir output
cp -rf ../drive/'My Drive'/model_final.pth ./output/
!python train.py -f True
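
The two `train.py` invocations above differ only in the value passed to `-f`, which appears to switch between training from the pretrained backbone and continuing from `output/model_final.pth`. How `train.py` parses the flag is not shown here. One pitfall worth knowing: with `argparse` and `type=bool`, the string `"False"` is converted to `True`, because any non-empty string is truthy. The sketch below shows one way to parse such a flag safely; the `--finetune` long name and `str2bool` helper are assumptions for illustration.

```python
import argparse


def str2bool(value):
    """Convert common textual booleans ('True', 'false', '1', ...) to bool."""
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    """Hypothetical stand-in for the command-line interface of train.py."""
    parser = argparse.ArgumentParser(description="training flag sketch")
    parser.add_argument("-f", "--finetune", type=str2bool, default=False,
                        help="continue from a saved checkpoint instead of starting fresh")
    return parser


print(build_parser().parse_args(["-f", "False"]).finetune)  # prints False
print(build_parser().parse_args(["-f", "True"]).finetune)   # prints True
print(bool("False"))  # prints True: why type=bool is unsafe for this flag
```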
chapter AP | section AP | clause AP | mAP |
---|---|---|---|
85.180 | 86.641 | 93.367 | 88.396 |
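
The mAP in the table above is the unweighted mean of the three per-class AP values, which can be verified directly. The variable and function names below are illustrative.

```python
# Per-class AP values copied from the table above.
per_class_ap = {"chapter": 85.180, "section": 86.641, "clause": 93.367}


def mean_ap(ap_by_class):
    """Unweighted mean of per-class average precision."""
    return sum(ap_by_class.values()) / len(ap_by_class)


print(f"mAP = {mean_ap(per_class_ap):.3f}")  # prints "mAP = 88.396"
```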
AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|
88.396 | 99.037 | 98.956 | NaN | 80.382 | 88.964 |
AR1 | AR10 | AR100 | ARs | ARm | ARl |
---|---|---|---|---|---|
57.0 | 91.4 | 92.0 | NaN | 84.8 | 92.1 |
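
The `s`, `m`, and `l` suffixes in the two tables above refer to the COCO object-size buckets: small objects have an area below 32² pixels, medium objects between 32² and 96², and large objects above 96². The `NaN` entries for APs and ARs most likely mean that the test set contains no instances in the small bucket, which is plausible for chapter, section, and clause regions on a page. The sketch below is a simplification: it uses the bounding-box area and half-open ranges, whereas the COCO evaluator uses each annotation's `area` field and inclusive range boundaries.

```python
SMALL_MAX_AREA = 32 ** 2   # 1024 px^2, upper bound of the "small" bucket
MEDIUM_MAX_AREA = 96 ** 2  # 9216 px^2, upper bound of the "medium" bucket


def size_bucket(width, height):
    """Classify a box by area into the COCO-style small/medium/large buckets."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# Box sizes here are made-up examples, not measurements from the dataset.
for box in [(20, 20), (80, 60), (600, 150)]:
    print(box, size_bucket(*box))
```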