stadt-karlsruhe/ckanext-extractor

ckan_worker interfering with other plugins

mystycs opened this issue · 3 comments

When ckan-worker runs to pull the metadata out using ckan-extractor, im getting this error below all the time.

Datasetthumbnail is a seperate plugin that has nothing to do with and is properly installed. If i disable the thumbnail plugin from prod.ini then it works and i dont get this error in ckan-worker.

Why is extracter ckan-worker doing this? Ive seen it do it before with other plugins too but never figured out the solution other than disabling the plugin it interferes with, but its always a random plugin that it says it interferes with. Everything is properly installed.

2018-10-14 13:57:43,667 INFO  [ckan.lib.jobs] Worker rq:worker:localhost.13874 has finished job fba8e776-22fe-4de9-99e1-77660fdada72 from queue "default"
2018-10-14 13:57:43,669 INFO  [rq.worker] 
2018-10-14 13:57:43,669 INFO  [rq.worker] *** Listening on ckan:default:default...
2018-10-14 13:57:48,763 INFO  [rq.worker] ckan:default:default: ckanext.extractor.tasks.extract('/etc/ckan/default/production.ini', {u'cache_last_updated': None, u'cache_url': None, u'mimetype_inner': None, u'hash': u'', u'description': u'', u'format': u'CSV', u'url': u'http://127.0.0.1/dataset/086935f5-0bd0-4171-83a6-91076e4fdfb1/resource/3f07b9d5-be73-40f1-a6b6-108f2069c332/download/5000-cc-records.csv', u'created': '2018-10-14T20:57:42.620174', u'state': u'active', u'package_id': u'086935f5-0bd0-4171-83a6-91076e4fdfb1', u'last_modified': '2018-10-14T20:57:42.592766', u'mimetype': u'text/csv', u'url_type': u'upload', u'position': 5, u'revision_id': u'2d9f79b7-ab4c-431e-878b-51bcd364d98b', u'size': 460482L, u'datastore_active': True, u'id': u'3f07b9d5-be73-40f1-a6b6-108f2069c332', u'resource_type': None, u'name': u'5000 CC Records.csv'}) (a0d3da6b-a7f6-4199-a6b4-76bdb81b12d8)
2018-10-14 13:57:48,763 INFO  [ckan.lib.jobs] Worker rq:worker:localhost.13874 starts job a0d3da6b-a7f6-4199-a6b4-76bdb81b12d8 from queue "default"
2018-10-14 13:57:49,128 ERROR [ckan.lib.jobs] Job a0d3da6b-a7f6-4199-a6b4-76bdb81b12d8 on worker rq:worker:localhost.13874 raised an exception: datasetthumbnail
Traceback (most recent call last):
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/rq/worker.py", line 588, in perform_job
    rv = job.perform()
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/rq/job.py", line 498, in perform
    self._result = self.func(*self.args, **self.kwargs)
  File "/usr/lib/ckan/default/src/ckanext-extracter/ckanext/extractor/tasks.py", line 67, in extract
    load_config(ini_path)
  File "/usr/lib/ckan/default/src/ckanext-extracter/ckanext/extractor/config.py", line 71, in load_config
    load_environment(conf.global_conf, conf.local_conf)
  File "/usr/lib/ckan/default/src/ckan/ckan/config/environment.py", line 99, in load_environment
    p.load_all()
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 139, in load_all
    load(*plugins)
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 153, in load
    service = _get_service(plugin)
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 256, in _get_service
    raise PluginNotFoundException(plugin_name)
PluginNotFoundException: datasetthumbnail
2018-10-14 13:57:49,129 ERROR [rq.worker] PluginNotFoundException: datasetthumbnail
Traceback (most recent call last):
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/rq/worker.py", line 588, in perform_job
    rv = job.perform()
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/rq/job.py", line 498, in perform
    self._result = self.func(*self.args, **self.kwargs)
  File "/usr/lib/ckan/default/src/ckanext-extracter/ckanext/extractor/tasks.py", line 67, in extract
    load_config(ini_path)
  File "/usr/lib/ckan/default/src/ckanext-extracter/ckanext/extractor/config.py", line 71, in load_config
    load_environment(conf.global_conf, conf.local_conf)
  File "/usr/lib/ckan/default/src/ckan/ckan/config/environment.py", line 99, in load_environment
    p.load_all()
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 139, in load_all
    load(*plugins)
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 153, in load
    service = _get_service(plugin)
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 256, in _get_service
    raise PluginNotFoundException(plugin_name)
PluginNotFoundException: datasetthumbnail
Traceback (most recent call last):
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/rq/worker.py", line 588, in perform_job
    rv = job.perform()
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/rq/job.py", line 498, in perform
    self._result = self.func(*self.args, **self.kwargs)
  File "/usr/lib/ckan/default/src/ckanext-extracter/ckanext/extractor/tasks.py", line 67, in extract
    load_config(ini_path)
  File "/usr/lib/ckan/default/src/ckanext-extracter/ckanext/extractor/config.py", line 71, in load_config
    load_environment(conf.global_conf, conf.local_conf)
  File "/usr/lib/ckan/default/src/ckan/ckan/config/environment.py", line 99, in load_environment
    p.load_all()
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 139, in load_all
    load(*plugins)
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 153, in load
    service = _get_service(plugin)
  File "/usr/lib/ckan/default/src/ckan/ckan/plugins/core.py", line 256, in _get_service
    raise PluginNotFoundException(plugin_name)
PluginNotFoundException: datasetthumbnail
2018-10-14 13:57:49,129 WARNI [rq.worker] Moving job to u'failed' queue
2018-10-14 13:57:49,135 INFO  [ckan.lib.jobs] Worker rq:worker:localhost.13874 has finished job a0d3da6b-a7f6-4199-a6b4-76bdb81b12d8 from queue "default"
2018-10-14 13:57:49,136 INFO  [rq.worker] 
2018-10-14 13:57:49,137 INFO  [rq.worker] *** Listening on ckan:default:default...

Thanks for your report, @mystycs!

Here is what happens:

ckanext-extractor gets notified that there is a new or updated resource and schedules a background job to do the actual extraction. The worker then executes that background job in a new process. In order to do that, it first loads the CKAN environment inside the new process, so that the background job can use the usual CKAN features (access the database, call API functions, etc.). Part of loading the CKAN environment is loading all registered plugins. According to the stack trace you've provided this is where things go wrong for you.

Right now I'm not sure whether this is a problem of ckanext-extractor, the CKAN background job system, or your setup, so let's figure that out.

First of all, please make sure that ckanext-extractor is installed and configured correctly (see the documentation: running

wget -qO - http://localhost/api/3/action/extractor_list

Should return something like

{"help": "http://localhost/api/3/action/help_show?name=extractor_list", "success": true, "result": []}

If that works, please schedule a test background job. CKAN provides a paster command for that:

paster --plugin=ckan jobs test -c /etc/ckan/default/production.ini

Please post the logs of that background job, they should tell us whether this problem is related to ckanext-extractor or not.

So i actually ran sudo supervisorctl restart ckan-worker:* and the error went away. I think it just needed to be restarted. I will update here if it comes back.

Thanks for the update, @mystycs, and good that you're issue has been resolved. I'm closing this issue, but feel free to reopen it in case the problem reappears.

Restarting the worker can be forgotten easily, so I've filed an issue to improve CKAN's docs on restarting things: ckan/ckan#4512.