hugheylab/pmparser

Order of AbstractText elements in abstract table


Hi,

The PubMed XML files have no order attribute for the AbstractText element. Retrieving the AbstractText records for a pmid from the abstract table in their original order therefore currently has to rely on insertion order (e.g. via the "ctid" system column in PostgreSQL), which is not guaranteed to be stable.

Possible enhancements:

  • add an abstract_pos column to the abstract table
  • leave abstract_pos empty if there is only one AbstractText element: a non-empty abstract_pos would then signal a structured abstract (one that PubMed has split into its parts)
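The two suggestions above could be sketched roughly as follows, using Python's standard-library ElementTree. This is not pmparser's implementation (pmparser is an R package). The element path follows the PubMed XML layout (PubmedArticle/MedlineCitation/Article/Abstract/AbstractText), while the function name `abstract_rows` and the sample record are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down PubMed-style record with a structured abstract.
SAMPLE = """
<PubmedArticle>
  <MedlineCitation>
    <PMID>12345</PMID>
    <Article>
      <Abstract>
        <AbstractText Label="BACKGROUND">Why the study was done.</AbstractText>
        <AbstractText Label="METHODS">How it was done.</AbstractText>
        <AbstractText Label="RESULTS">What was found.</AbstractText>
      </Abstract>
    </Article>
  </MedlineCitation>
</PubmedArticle>
"""


def abstract_rows(article):
    """Return (pmid, abstract_pos, label, text) tuples for one PubmedArticle.

    abstract_pos is the 1-based document order of each AbstractText element,
    or None when there is only one AbstractText (an unstructured abstract).
    """
    pmid = article.findtext("MedlineCitation/PMID")
    texts = article.findall("MedlineCitation/Article/Abstract/AbstractText")
    structured = len(texts) > 1
    rows = []
    for pos, el in enumerate(texts, start=1):
        rows.append((
            pmid,
            pos if structured else None,
            el.get("Label"),
            # itertext() also keeps text nested in inline markup such as <i>.
            "".join(el.itertext()).strip(),
        ))
    return rows


article = ET.fromstring(SAMPLE.strip())
for row in abstract_rows(article):
    print(row)
```

Because ElementTree's `findall` returns elements in document order, the position can be taken directly from `enumerate`, with no need for an order attribute in the XML.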

The first suggestion is of course the most important one.
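With an explicit position column, reconstructing an abstract becomes a plain ORDER BY instead of depending on insertion or physical row order. A minimal sketch using Python's built-in sqlite3 rather than PostgreSQL; the table layout here is an assumption based on the proposal, not pmparser's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical abstract table with an explicit position column.
con.execute(
    "CREATE TABLE abstract (pmid TEXT, abstract_pos INTEGER, label TEXT, text TEXT)"
)
# Rows deliberately inserted out of order: with abstract_pos, the original
# sequence no longer depends on the order in which rows were written.
con.executemany(
    "INSERT INTO abstract VALUES (?, ?, ?, ?)",
    [
        ("12345", 3, "RESULTS", "What was found."),
        ("12345", 1, "BACKGROUND", "Why the study was done."),
        ("12345", 2, "METHODS", "How it was done."),
    ],
)
labels = [
    row[0]
    for row in con.execute(
        "SELECT label FROM abstract WHERE pmid = ? ORDER BY abstract_pos",
        ("12345",),
    )
]
print(labels)  # ['BACKGROUND', 'METHODS', 'RESULTS']
```

The same SELECT with ORDER BY abstract_pos would work unchanged in PostgreSQL, without any reference to ctid.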

Best regards,
geert

@JSchoenbachler Could you make a new branch and pull request adding an abstract_pos column to the output of parseAbstract()? You should be able to largely mimic the way parsePerson() does it, and then modify the tests and test data accordingly.

@jakejh yep I'll work on that!

@globbestael @jakejh Functionality has been added in 71c297c; closing.