inspirehep/inspire-dojson

add_inspire_categories is completely broken

While reviewing the uses of classify_field, I discovered this:

def add_inspire_categories(record, blob):
    if not record.get('arxiv_eprints') or record.get('inspire_categories'):
        return record

    for arxiv_category in force_list(get_value(record, 'arxiv_eprints.categories')):
        inspire_category = classify_field(arxiv_category)
        if inspire_category:
            record['inspire_category'] = [
                {
                    'source': 'arxiv',
                    'term': inspire_category,
                },
            ]

    return record

It is completely broken because:

  1. inspire_category does not exist in the schema (inspire_categories is the correct form);
  2. it overwrites the output on every iteration.
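
For illustration, here is a minimal sketch of what a version without these two problems might look like. It is not necessarily what the actual fix does. To keep it self-contained, `force_list`, `arxiv_categories` and `classify_field` below are simplified, hypothetical stand-ins for the real helpers (the real code uses `get_value(record, 'arxiv_eprints.categories')`), and the category mapping is a tiny illustrative subset. Skipping duplicate terms is an extra choice of this sketch, not something the original code did.

```python
# Illustrative subset of an arXiv -> INSPIRE category mapping (assumption).
ARXIV_TO_INSPIRE = {
    'hep-th': 'Theory-HEP',
    'hep-ph': 'Phenomenology-HEP',
}


def force_list(value):
    """Stand-in helper: None -> [], scalar -> [scalar], list/tuple -> list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def arxiv_categories(record):
    """Stand-in for get_value(record, 'arxiv_eprints.categories'):
    collects the categories of every eprint into one flat list."""
    result = []
    for eprint in force_list(record.get('arxiv_eprints')):
        result.extend(force_list(eprint.get('categories')))
    return result


def classify_field(value):
    """Stand-in classifier: returns None for unknown categories."""
    return ARXIV_TO_INSPIRE.get(value)


def add_inspire_categories(record, blob=None):
    # Same guard as the original: only act when there are eprints
    # and no categories have been set yet.
    if not record.get('arxiv_eprints') or record.get('inspire_categories'):
        return record

    categories = []
    for arxiv_category in arxiv_categories(record):
        term = classify_field(arxiv_category)
        entry = {'source': 'arxiv', 'term': term}
        # Accumulate instead of overwriting; skip unknowns and duplicates.
        if term and entry not in categories:
            categories.append(entry)

    if categories:
        # Write to the key that exists in the schema.
        record['inspire_categories'] = categories

    return record


example = add_inspire_categories({
    'arxiv_eprints': [
        {'value': '1234.56789', 'categories': ['hep-th', 'hep-ph']},
        {'value': '9876.54321', 'categories': ['hep-th', 'not-a-category']},
    ],
})
```

With the stand-in mapping above, `example['inspire_categories']` ends up holding one entry per distinct recognised category, and no `inspire_category` key is created.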

I don't know if it is ever run, as it requires arxiv_eprints to be present but inspire_categories not to be, which should be pretty rare, and I have never seen any migration error because of this field.

It would be good to review the whole logic around categories in dojson as it's quite intricate.

Interestingly enough, that line is never executed by the test suite: https://coveralls.io/builds/13627440/source?filename=inspire_dojson%2Fhep%2Fmodel.py#L58.

Closed in #115.