allenai/s2orc

[Release] TODOs

kyleclo opened this issue · 0 comments

pipeline refactor

  • Switch from CorpusDB to SDS integration (@kyleclo)
  • Refactor pipeline code to import from Lucy's latest PDF2Parser library (includes upgrade to Grobid 0.6.1) (@lucylw)

parser

  • Double-check Backmatter
  • Include section numbers
  • HTML Table parses

release

  • [ ]

notes

  • kylel/2020-07-01/new_release is dead, but check it later for any notes on bugfixes (e.g. cite_spans that don't exist)
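
A rough sketch of how the cite_spans problem noted above could be checked. It reads "cite_spans that don't exist" as spans whose offsets fall outside the paragraph text, whose text doesn't match the paragraph, or whose ref_id has no matching bib entry; that reading is an assumption. The field names (body_text, cite_spans, start, end, text, ref_id, bib_entries) are assumed to follow the released S2ORC JSON layout, and find_bad_cite_spans is a hypothetical helper, not something in the pipeline.

```python
def find_bad_cite_spans(paper):
    """Return (paragraph_index, span, reason) for each suspicious cite span.

    Assumes `paper` is a dict with "body_text" (list of paragraphs) and
    "bib_entries" (dict keyed by ref_id); these names are assumptions.
    """
    bib_entries = paper.get("bib_entries", {})
    problems = []
    for i, para in enumerate(paper.get("body_text", [])):
        text = para.get("text", "")
        for span in para.get("cite_spans", []):
            start, end = span.get("start"), span.get("end")
            ref_id = span.get("ref_id")
            offsets_ok = (
                isinstance(start, int)
                and isinstance(end, int)
                and 0 <= start < end <= len(text)
            )
            if not offsets_ok:
                problems.append((i, span, "offsets outside paragraph text"))
            elif text[start:end] != span.get("text"):
                problems.append((i, span, "span text does not match paragraph text"))
            elif ref_id is not None and ref_id not in bib_entries:
                # A ref_id of None is treated as an unlinked citation, not an error.
                problems.append((i, span, "ref_id missing from bib_entries"))
    return problems
```

Running this over each parsed paper and logging the non-empty results would give a quick list of papers to spot-check before release.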