allenai/s2orc

Is this data set searchable?

russelljjarvis opened this issue · 2 comments

Hi @russelljjarvis, the dataset is distributed as static JSON Lines files. We don't provide any search interface on top of it. It's searchable only in the sense that you can scan the files yourself, which is how I've used it for:

  • Finding papers with a certain metadata field (e.g. papers from ACL or papers that are Computer Science). This is just a simple Python loop over the rows, checking each row's metadata field.
  • Finding papers that match a certain regex. This is done either with grep in bash or with Python: loop through each row and check the title, abstract, and body text for a match.
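
As a rough illustration of the two approaches above, here is a minimal Python sketch. It assumes each line of a file is one JSON object; the field names used here (`venue`, `title`, `abstract`) and the helper names are assumptions for illustration, so check them against the actual files before relying on them.

```python
import json
import re
from typing import Iterable, Iterator


def iter_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line (JSON Lines)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def filter_by_field(rows: Iterable[dict], field: str, wanted: str) -> Iterator[dict]:
    """Keep rows whose `field` equals `wanted`, or contains it if the field is a list."""
    for row in rows:
        value = row.get(field)
        if value == wanted or (isinstance(value, list) and wanted in value):
            yield row


def filter_by_regex(
    rows: Iterable[dict],
    pattern: str,
    fields: tuple = ("title", "abstract"),  # assumed field names
) -> Iterator[dict]:
    """Keep rows where any of the given text fields matches `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    for row in rows:
        for name in fields:
            text = row.get(name)
            # Skip missing (None) or non-string values instead of failing.
            if isinstance(text, str) and regex.search(text):
                yield row
                break
```

To run this over a file, open it and pass the file handle as `lines`, for example `filter_by_field(iter_rows(fh), "venue", "ACL")`. Since every function is a generator, the file is streamed row by row rather than loaded into memory. For the regex case, `grep -i 'pattern' file.jsonl` in bash gives a quick approximation, although it matches against the whole raw line rather than specific fields.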