PublicaMundi/ckanext-publicamundi

Raster_Identify Failed to download: invalid url?

Opened this issue · 6 comments

howff commented

The URL reported in the error message seems to be missing the host and path parts when I add a raster resource to a dataset:

Downloading resource from http://nppdnbdaysdr.17319131254.tif to: /var/local/ckan/default/tmp//rasterstorer//Cov_.raster

Is that why the celery task gets stuck?
Maybe there is some magic configuration file to edit?

Full message:

[2017-11-20 15:03:33,593: INFO/PoolWorker-1] rasterstorer.identify[24bc5372-7c28-4524-93e8-b222e62fbf12]: [Raster_Identify]Downloading resource a32a6046-ce1f-42a4-a417-118376bb32e3...
[2017-11-20 15:03:33,593: INFO/PoolWorker-1] rasterstorer.identify[24bc5372-7c28-4524-93e8-b222e62fbf12]: [Raster_DownloadResource] Downloading resource a32a6046-ce1f-42a4-a417-118376bb32e3 from http://nppdnbdaysdr.17319131254.tif to: /var/local/ckan/default/tmp//rasterstorer/a32a6046-ce1f-42a4-a417-118376bb32e3/Cov_a32a6046_ce1f_42a4_a417_118376bb32e3.raster
[2017-11-20 15:03:33,596: ERROR/PoolWorker-1] rasterstorer.identify[24bc5372-7c28-4524-93e8-b222e62fbf12]: [Raster_Identify] Failed to download: Failed to download http://nppdnbdaysdr.17319131254.tif: <urlopen error [Errno -2] Name or service not known>
[2017-11-20 15:03:33,610: ERROR/MainProcess] Task rasterstorer.identify[24bc5372-7c28-4524-93e8-b222e62fbf12] raised exception: CannotDownload('Failed to download http://nppdnbdaysdr.17319131254.tif: <urlopen error [Errno -2] Name or service not known>',)
Traceback (most recent call last):
  File "/var/local/ckan/default/pyenv/local/lib/python2.7/site-packages/celery/execute/trace.py", line 47, in trace
    return cls(states.SUCCESS, retval=fun(*args, **kwargs))
  File "/var/local/ckan/default/pyenv/local/lib/python2.7/site-packages/celery/app/task/__init__.py", line 247, in __call__
    return self.run(*args, **kwargs)
  File "/var/local/ckan/default/pyenv/local/lib/python2.7/site-packages/celery/app/__init__.py", line 175, in run
    return fun(*args, **kwargs)
  File "/var/local/ckan/default/pyenv/src/ckanext-publicamundi/ckanext/publicamundi/storers/raster/tasks.py", line 28, in rasterstorer_identify
    rasterstorer_identify.retry(exc=ex, countdown=60)
  File "/var/local/ckan/default/pyenv/local/lib/python2.7/site-packages/celery/app/task/__init__.py", line 535, in retry
    self.name, options["task_id"], args, kwargs))
CannotDownload: Failed to download http://nppdnbdaysdr.17319131254.tif: <urlopen error [Errno -2] Name or service not known>

The URL does not seem to be a valid one.

howff commented

That's right, but it has been generated by the raster importer so either there is a bug in the raster importer or something needs to be configured somewhere. Any ideas?

(I created a dataset in the CKAN web interface and attached a GeoTIFF resource.)

@drmalex07 any ideas?

Well, I think you should ping the rasdaman team, which was the only one involved with the raster-storer plugin.

Cross-posting from the rasdaman-dev mailing list, in case I'm missing something:

Hi Andrew,

It's been a while since I looked over this code.

The error occurs in https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/storers/raster/tasks.py#L11. That method exposes a celery task which prepares a resource for ingestion into rasdaman. In your case the URL points to a nonexistent resource, so the rasterstorer cannot import it.

You could track where the URL is coming from. It appears in the task context and is passed on to a utility class for download. On https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/storers/raster/tasks.py#L18 you already have a dump of the context, so a first step would be to check whether the URL is OK in the context or is already broken when it gets there. If the URL is correct in the context but still fails to download, you can have a look at https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/storers/raster/raster_plugin_util.py#L49 (but my intuition is that the URL already points to nothing in the context).
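One way to make the "is the URL already broken in the context?" check concrete is a small standalone helper that inspects a URL string and lists what looks wrong with it. The sketch below is not part of ckanext-publicamundi; the function name describe_url_problems and the extension list are hypothetical, and it is written for Python 3 (on the Python 2.7 install shown in the traceback the import would be `from urlparse import urlparse`).

```python
from urllib.parse import urlparse

# Hypothetical list of file extensions; adjust it to the formats you upload.
RASTER_EXTENSIONS = (".tif", ".tiff", ".png", ".jpg", ".jpeg", ".zip")


def describe_url_problems(url):
    """Return a list of human-readable problems found in `url` (empty if none)."""
    problems = []
    parts = urlparse(url)

    if parts.scheme not in ("http", "https"):
        problems.append("unexpected scheme: %r" % parts.scheme)

    host = parts.hostname or ""
    if not host:
        problems.append("missing host")
    elif host.endswith(RASTER_EXTENSIONS):
        # A "host" ending in .tif is most likely a bare file name that was
        # turned into a URL without its real host and path.
        problems.append("host %r looks like a file name" % host)

    if parts.path in ("", "/"):
        problems.append("missing path")

    return problems


if __name__ == "__main__":
    # The URL from the log in this issue.
    for problem in describe_url_problems("http://nppdnbdaysdr.17319131254.tif"):
        print(problem)
```

Running it against the URL from the log flags both a host that looks like a file name and a missing path, which matches the observation that the host and path parts are absent. Calling it on the URL found in the context dump would show whether the value is already malformed before the download step.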

The celery task itself is created whenever a new resource with one of the geotiff, png, jpeg, zip or raster formats is added to ckan (https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/storers/raster/plugin.py#L60).
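To illustrate the trigger condition described above, here is a minimal sketch of a format check. It is only an illustration, not the code in plugin.py: the helper name is_raster_resource, the exact format strings, and the case-insensitive comparison are all assumptions.

```python
# Formats named in this thread; the exact strings the plugin compares against
# are an assumption.
RASTER_FORMATS = {"geotiff", "png", "jpeg", "zip", "raster"}


def is_raster_resource(resource):
    """Return True if the resource dict declares one of the raster formats."""
    fmt = (resource.get("format") or "").strip().lower()
    return fmt in RASTER_FORMATS
```

Under these assumptions, a resource whose format field is "GeoTIFF" would queue the identify task, while a CSV resource would not.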

HTH,
Vlad

howff commented