Issues
- 1
Source value for desy-ingested records
#134 opened by david-caro - 0
Spider: Refresh the IOP spider
#205 opened by david-caro - 0
arXiv spider: collaborations
#251 opened by ksachs - 1
Q: arXiv spider: collaborations
#248 opened by ksachs - 0
populate raw_affiliations instead of affiliations
#185 opened by michamos - 0
- 4
Add `keywords` field
#121 opened by spirosdelviniotis - 0
OSTI spider
#157 opened by michamos - 1
Support multiple publication_info entries
#125 opened by david-caro - 4
Datacite spider
#192 opened by michamos - 0
Kill get_journal_and_section
#191 opened by michamos - 1
- 3
Directly harvest oai sources
#197 opened by david-caro - 0
Investigate how scrapy uses the `*args` and `**kwargs` in the `__init__` of the spider and see if we can add
#218 opened by david-caro - 0
Support multiple sets for OAI spiders
#201 opened by david-caro - 0
OAI: Improve last_run generation/loading
#202 opened by david-caro - 0
Use more metadata from APS
#184 opened by michamos - 0
harvest CDS through dojson directly
#199 opened by michamos - 2
loader: digest "all" possible date formats
#169 opened by fschwenn - 0
Use the JATS parser for Hindawi
#190 opened by michamos - 0
Create/update PoS spider
#159 opened by david-caro - 7
post-enhancement: complete CC-license information
#172 opened by fschwenn - 4
Handling "on behalf of the ATLAS Collaboration"
#176 opened by kaplun - 1
loader: calculate "number_of_pages"
#170 opened by fschwenn - 0
CDS Spider
#178 opened by kaplun - 3
Duplicated code that generates 'acquisition_source'
#175 opened by iulianav - 0
Add mechanism for crawling only once
#161 opened by spirosdelviniotis - 5
post-enhancement: automatically set 'citeable=True'
#173 opened by fschwenn - 0
- 0
loader: get arXiv number from ADS?
#168 opened by fschwenn - 1
desy spider
#133 opened by david-caro - 1
Using refextract for unstructured references
#156 opened by fschwenn - 8
Use material whenever possible
#128 opened by michamos - 1
Add missing crawler2hep unit tests
#138 opened by david-caro - 1
- 0
global: bump `hepcrawl` to version `~36.0`
#147 opened by spirosdelviniotis - 0
Upgrade to schemas 40
#153 opened by david-caro - 0
docs: use Sphinx bundled napoleon
#152 opened by david-caro - 0
- 1
Add functional tests to arxiv spider
#127 opened by spirosdelviniotis - 0
source should always be spider name, not hepcrawl
#130 opened by michamos - 0
Harvesting Books from Amazon
#137 opened by kaplun - 0
Tests: move reusable code into testlib
#131 opened by spirosdelviniotis - 1
docker-compose: remove doc service
#123 opened by david-caro - 0
Introduce functional tests for WSP spider
#118 opened by spirosdelviniotis - 1
- 2
Use the full pipeline output on the wsp tests
#111 opened by david-caro - 3
unit tests: create environment handler fixture
#113 opened by david-caro