IllDepence/unarXive

The error in paper structure

Ma-Yongqiang opened this issue · 2 comments

This dataset is very helpful for NLP research in the scientific domain.

When I checked the parsed paper structure, I found some errors.
For the paper "2212.00253" in this dataset, the subsection "Deep Reinforcement Learning" actually belongs to section 2.
However, the parsed result places the subsection "Deep Reinforcement Learning" in section 1.

[image]

The section information in the PDF file:
[image]

The reason might be that the section 2 heading "BACKGROUND" has no paragraph of its own before the first subsection, so the heading is lost when the TeX file is processed.

Thank you for the input.

I took a look at the LaTeX source of the paper and saw that section 1 is created using a template-specific setup:
\IEEEraisesectionheading{\section{Introduction}}

I could imagine that this trips up the LaTeX parsing.
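If the wrapper is indeed the cause, one possible workaround would be to unwrap it before the source reaches the parser. The following is only a minimal sketch under that assumption, not the preprocessing unarXive actually performs; the function name `unwrap_macro` is hypothetical.

```python
def unwrap_macro(tex: str, macro: str = r"\IEEEraisesectionheading") -> str:
    """Replace every `macro{...}` with the contents of its brace group.

    Hypothetical helper for illustration; not part of unarXive.
    """
    out = []
    pos = 0
    while True:
        start = tex.find(macro, pos)
        if start == -1:
            out.append(tex[pos:])
            return "".join(out)
        out.append(tex[pos:start])

        # Find the opening brace of the macro argument, allowing whitespace.
        open_idx = start + len(macro)
        while open_idx < len(tex) and tex[open_idx].isspace():
            open_idx += 1
        if open_idx >= len(tex) or tex[open_idx] != "{":
            # No brace group follows: keep the macro as it is.
            out.append(macro)
            pos = start + len(macro)
            continue

        # Walk to the matching closing brace, skipping escaped characters.
        depth = 0
        i = open_idx
        close_idx = -1
        while i < len(tex):
            ch = tex[i]
            if ch == "\\":
                i += 2
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    close_idx = i
                    break
            i += 1
        if close_idx == -1:
            # Unbalanced braces: leave the rest of the input unchanged.
            out.append(tex[start:])
            return "".join(out)

        out.append(tex[open_idx + 1:close_idx])
        pos = close_idx + 1


print(unwrap_macro(r"\IEEEraisesectionheading{\section{Introduction}}"))
# prints: \section{Introduction}
```

Brace matching is used instead of a single regular expression so that a `\label{...}` placed inside the wrapper is preserved as well.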

Are the paragraphs that follow continuously numbered (1, 2, 3, ...) or does the number stay 1 throughout the paper?

The following paragraphs are continuously numbered (1, 2, 3, ...).

2212.00253.json.txt
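
To check the numbering in a parsed file like the one attached, the sequence of distinct section headings can be listed in document order. This is a rough sketch: the key names `sec_number` and `section`, the helper name `heading_sequence`, and the sample entries are all assumptions made for illustration, not values taken from the attached file.

```python
def heading_sequence(body_text, number_key="sec_number", title_key="section"):
    """Return the distinct (number, title) pairs in document order.

    `body_text` is expected to be a list of dicts, one per paragraph.
    The key names are assumptions about the parsed JSON layout.
    """
    seq = []
    for entry in body_text:
        pair = (entry.get(number_key), entry.get(title_key))
        # Collapse consecutive paragraphs that share the same heading.
        if not seq or seq[-1] != pair:
            seq.append(pair)
    return seq


# Hypothetical sample entries, not copied from 2212.00253.
sample = [
    {"sec_number": "1", "section": "Introduction", "text": "..."},
    {"sec_number": "1", "section": "Introduction", "text": "..."},
    {"sec_number": "1.1", "section": "Deep Reinforcement Learning", "text": "..."},
    {"sec_number": "2", "section": "Next Section", "text": "..."},
]

for number, title in heading_sequence(sample):
    print(number, title)
```

For a real file, the list of paragraph entries would first be loaded with `json.load` and passed in place of `sample`; where that list sits in the JSON object depends on the dataset's layout.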