Order of AbstractText elements in abstract table
globbestael commented
Hi,
The PubMed XML files have no order attribute for the AbstractText element. Currently, retrieving the rows for a pmid in the abstract table in their original order means relying on insertion order (e.g. with "ctid" in PostgreSQL).
Possible enhancements:
- add an abstract_pos attribute in the abstract table
- leave the abstract_pos empty if there is only 1 AbstractText element: a non-empty abstract_pos would signal a structured abstract (which PubMed has split into its parts)
The first suggestion is of course the most important one.
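A minimal Python sketch of both suggestions, for illustration only. It is not the project's parseAbstract(). The function name parse_abstract_rows and the (pmid, label, text, abstract_pos) row layout are hypothetical. It assumes one PubmedArticle element per call and numbers AbstractText elements in document order, leaving the position empty when there is only one.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical row layout: (pmid, label, text, abstract_pos).
Row = Tuple[str, Optional[str], str, Optional[int]]


def parse_abstract_rows(article_xml: str) -> List[Row]:
    """Extract AbstractText elements from one PubmedArticle in document order."""
    root = ET.fromstring(article_xml)
    pmid = root.findtext(".//MedlineCitation/PMID", default="")
    # Only AbstractText under Abstract (not OtherAbstract) is considered here.
    texts = root.findall(".//Abstract/AbstractText")
    # More than one AbstractText means a structured abstract; a single one
    # gets an empty (None) position, per the second suggestion.
    structured = len(texts) > 1
    rows: List[Row] = []
    for pos, el in enumerate(texts, start=1):
        # itertext() also collects text nested in inline markup such as <i>.
        text = "".join(el.itertext()).strip()
        rows.append((pmid, el.get("Label"), text, pos if structured else None))
    return rows


sample = """<PubmedArticle><MedlineCitation><PMID Version="1">12345</PMID>
<Article><Abstract>
<AbstractText Label="BACKGROUND">Some background.</AbstractText>
<AbstractText Label="RESULTS">Some <i>results</i>.</AbstractText>
</Abstract></Article></MedlineCitation></PubmedArticle>"""

print(parse_abstract_rows(sample))
# [('12345', 'BACKGROUND', 'Some background.', 1), ('12345', 'RESULTS', 'Some results.', 2)]
```

Storing the position explicitly makes the order part of the data, rather than an accident of how rows were written to the database.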
Best regards,
geert
jakejh commented
@JSchoenbachler Could you make a new branch and pull request adding an abstract_pos column to the output of parseAbstract()? You should be able to largely mimic the way parsePerson() does it, and then modify the tests and test data accordingly.
JSchoenbachler commented
@jakejh yep I'll work on that!
JSchoenbachler commented
@globbestael @jakejh Functionality has been added in 71c297c, closing.
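With a position column available, a consumer can rebuild each abstract in order without relying on insertion order such as PostgreSQL's ctid. Below is a minimal Python sketch. It assumes rows shaped as (pmid, label, text, abstract_pos), with abstract_pos empty for single-part abstracts. The row layout and the assemble_abstracts helper are hypothetical and do not reflect the actual output format added in 71c297c.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical rows: (pmid, label, text, abstract_pos); abstract_pos is None
# for unstructured abstracts that consist of a single AbstractText.
Row = Tuple[str, Optional[str], str, Optional[int]]


def assemble_abstracts(rows: Iterable[Row]) -> Dict[str, str]:
    """Rebuild each abstract in order, independent of storage order."""
    # Sort by pmid, then by position; a None position (single part) sorts as 0.
    ordered = sorted(rows, key=lambda r: (r[0], r[3] or 0))
    result: Dict[str, str] = {}
    # groupby needs rows sorted by pmid, which the sort above guarantees.
    for pmid, parts in groupby(ordered, key=itemgetter(0)):
        pieces = []
        for _, label, text, _ in parts:
            # Prefix structured sections with their label, if present.
            pieces.append(f"{label}: {text}" if label else text)
        result[pmid] = " ".join(pieces)
    return result


# Rows deliberately out of order, as they might come back from an unordered query.
rows = [
    ("2", None, "Plain abstract.", None),
    ("1", "RESULTS", "R.", 2),
    ("1", "BACKGROUND", "B.", 1),
]
print(assemble_abstracts(rows))
# {'1': 'BACKGROUND: B. RESULTS: R.', '2': 'Plain abstract.'}
```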