arXiv work missing
Closed this issue · 3 comments
Hi. I searched for "Sidewalk Measurements from Satellite Images: Preliminary Findings", a preprint available at https://arxiv.org/abs/2112.06120, but could not find it.
Is there a bot importing regularly from arXiv?
We do have this work, and it shows up in search results for me: https://fatcat.wiki/release/search?q=Sidewalk+Measurements+from+Satellite+Images+Preliminary+Findings
https://fatcat.wiki/release/tphkgaozxbedxnwf35nf5yxey4
The issue may be that if "Images:" is included in a fatcat.wiki search, it gets passed through to Elasticsearch/Lucene, which interprets it as a facet/filter and returns no results: https://fatcat.wiki/release/search?q=Sidewalk+Measurements+from+Satellite+Images%3A+Preliminary+Findings&generic=1
Does that match what you experienced?
In scholar.archive.org, we have a kludge that tries to notice this pattern and add quotes around such tokens, but the implementation isn't very good, so I haven't copied it over. A "real" custom query parser is probably the solution, but that is a larger project to bite off. Added a note about that specific issue to #29.
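For illustration, here is a rough sketch of what such a quoting workaround could look like. This is not the scholar.archive.org code; the function name `quote_colon_tokens` and the `KNOWN_FIELDS` allow-list are hypothetical, and the field names in it are placeholders rather than the actual fatcat schema.

```python
import re

# Hypothetical allow-list of field names that should keep working as filters.
# These are placeholders, not the real fatcat search schema.
KNOWN_FIELDS = {"title", "author", "doi", "year"}


def quote_colon_tokens(query: str, known_fields=KNOWN_FIELDS) -> str:
    """Wrap colon-containing tokens in quotes unless they look like a
    deliberate `field:value` filter on a known field."""
    out = []
    # Keep already-quoted phrases intact; split everything else on whitespace.
    for token in re.findall(r'"[^"]*"|\S+', query):
        if token.startswith('"') or ":" not in token:
            out.append(token)
            continue
        field, value = token.split(":", 1)
        if field.lower() in known_fields and value:
            # Looks like an intentional filter, e.g. year:2021
            out.append(token)
        else:
            # Probably punctuation from a title, e.g. "Images:"
            out.append('"' + token.replace('"', '\\"') + '"')
    return " ".join(out)


print(quote_colon_tokens(
    "Sidewalk Measurements from Satellite Images: Preliminary Findings"
))
```

With this approach, "Images:" is sent to the search backend as a quoted term instead of being parsed as a field name, while a token like year:2021 is left alone. It is still a heuristic, which is why a proper query parser would be the more robust fix.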
Oh, and to answer the question: yes, a bot pulls new papers from arXiv every 24 hours using the OAI-PMH feed. New URLs are then enqueued for crawling, though arxiv.org often rate-limits our crawlers, so it can take a while for them to get archived and make it through the entire indexing pipeline.
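As a rough idea of what a daily OAI-PMH pull involves, here is a minimal sketch that only builds the request URL. It is not the actual fatcat harvester; the endpoint URL is an assumption and should be checked against arXiv's current documentation, and `list_records_url` is a hypothetical helper. The `verb`, `metadataPrefix`, and `from` parameters come from the OAI-PMH protocol itself.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint; verify against arXiv's current OAI-PMH documentation.
ARXIV_OAI_ENDPOINT = "https://export.arxiv.org/oai2"


def list_records_url(since: date,
                     endpoint: str = ARXIV_OAI_ENDPOINT,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords URL for records changed since `since`."""
    params = {
        "verb": "ListRecords",              # standard OAI-PMH verb
        "metadataPrefix": metadata_prefix,  # oai_dc is required by the spec
        "from": since.isoformat(),          # day granularity, YYYY-MM-DD
    }
    return endpoint + "?" + urlencode(params)


print(list_records_url(date(2021, 12, 11)))
```

A real harvester would also need to follow resumption tokens for large result sets and back off when the server asks it to, which is part of why new papers can take a while to appear.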
Exactly fits the issue. Thanks for the fast response 🤩