After the extraction the index is not updated

Question

Closed this issue 7 years ago · 4 comments

I'm not sure I understand the handling correctly. As far as I can tell, celeryd automatically runs the extraction for every new upload (e.g. a PDF file), but the result does not show up in the search index. So to actually use the extracted fulltext in the search, I have to rebuild the index.

Is this correct? Or should the index eventually be updated?

Answer 1 · 2017-03-29T14:59:55.000Z

I just saw that apparently the indexing fails with the following error:

Traceback (most recent call last):
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/lib/ckan/ckanext/ckanext-extractor/ckanext/extractor/tasks.py", line 82, in extract
    index_for('package').update_dict(pkg_dict)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 103, in update_dict
    self.index_package(pkg_dict, defer_commit)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 215, in index_package
    rel_dict[type].append(model.Package.get(rel['subject_package_id']).name)
KeyError: 'subject_package_id'

However, running paster --plugin=ckan search-index rebuild works. Any idea what could cause this behavior?

Answer 2 · 2017-05-05T07:15:16.000Z

I've never seen that error and currently have no idea regarding its cause. Does this happen reproducibly? Does it affect only certain resources/datasets or all of them? As far as I understand, subject_package_id is used in package relations which we don't really use.

Answer 3 · 2017-05-05T07:32:58.000Z

@torfsen After more investigation, I think this is not really a problem with this extension, but is rather caused by ckan/ckan#2332. I have code that creates a relationship in the after_create hook, and this error prevented the indexing from running successfully.
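
For anyone hitting the same traceback, here is a minimal sketch of the failure mode in isolation. It does not use CKAN at all: `collect_relationship_names`, `lookup_name`, `PACKAGE_NAMES`, and the shape of the relationship dicts are hypothetical stand-ins, and the only thing taken from the traceback is the `rel['subject_package_id']` lookup. The assumption is that the relationship created in the hook reaches the indexer as a dict without that key.

```python
# Hypothetical stand-in for the lookup at index.py line 215 in the traceback.
# Nothing here is CKAN code; the names and dict shapes are assumptions.

PACKAGE_NAMES = {"pkg-1": "parent-dataset"}  # hypothetical id -> name table


def lookup_name(package_id):
    """Pretend version of model.Package.get(package_id).name."""
    return PACKAGE_NAMES[package_id]


def collect_relationship_names(relationships, lookup):
    """Group related package names by relationship type."""
    names_by_type = {}
    for rel in relationships:
        # Same access pattern as the traceback: a plain subscript, so a
        # dict without 'subject_package_id' raises KeyError.
        subject_id = rel["subject_package_id"]
        names_by_type.setdefault(rel["type"], []).append(lookup(subject_id))
    return names_by_type


def try_index(relationships):
    """Run the lookup and report a KeyError instead of crashing."""
    try:
        return collect_relationship_names(relationships, lookup_name)
    except KeyError as exc:
        return "KeyError: {}".format(exc)


# A relationship dict that carries the expected key indexes fine.
complete = [{"type": "child_of", "subject_package_id": "pkg-1"}]

# A dict in a different (assumed) shape fails like the Celery task did.
incomplete = [{"type": "child_of", "subject": "parent-dataset"}]

print(try_index(complete))
print(try_index(incomplete))
```

If that assumption holds, it would also be consistent with datasets that have no relationships indexing without this error, since the loop body never runs for an empty list.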

Answer 4 · 2017-05-05T08:02:37.000Z

@metaodi OK, thanks for digging into it!