IllDepence/unarXive
A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network
Python · MIT
Issues
PDF version not specified
#23 opened by yuezh000
Handling of footnotes
#21 opened by IllDepence
Questions about the authors in this dataset
#20 opened by Zivenzhu
The error in paper structure
#19 opened by Ma-Yongqiang
Full dataset approximate size
#17 opened by nicklausbrown
Accessing actual figure image files
#16 opened by IIZCODEII
How can I get OpenAlex dump files?
#14 opened by v-miazhang
About citation matching
#15 opened by Zivenzhu
DOI based matching should be done directly against OpenAlex DOIs (not using title)
#12 opened by IllDepence
For some papers, references are only matched up to part of the bib_entries list
#11 opened by IllDepence
Is the data open source?
#6 opened by fishiu
Dataset sample
#1 opened by malteos