S3 URL support for image-service and pipeline
Opened this issue · 3 comments
From time to time we get datasets with images that are not on the web and need to be ingested into ALA.
The easy way could be to upload them into a subfolder of the DR on an S3 bucket and put the paths in the DwCA files.
This may need a translation of s3:// to https:// or support for s3:// in image-service.
The other workaround (for a limited number of images) could be uploading them into the image-service first and then linking them back in the DwCA.
Needs more discussion to find the best solution.
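To make the s3:// to https:// translation mentioned above concrete, here is a minimal sketch in plain Java. The class name S3UrlTranslator, the method toHttpsUrl, and the bucket, key and region values are all hypothetical, and it assumes the virtual-hosted-style form https://BUCKET.s3.REGION.amazonaws.com/KEY with the region known to the caller. It is not how image-service or the pipeline currently behave.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: rewrites s3://bucket/key as a virtual-hosted-style HTTPS URL. */
final class S3UrlTranslator {

    private static final String PREFIX = "s3://";

    private S3UrlTranslator() {}

    /**
     * @param s3Url  an S3 URL such as "s3://my-bucket/dr123/image.jpg"
     * @param region the AWS region of the bucket (assumed to be known by the caller)
     * @return the matching https:// URL with the object key percent-encoded
     */
    static String toHttpsUrl(String s3Url, String region) {
        if (s3Url == null || !s3Url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Expected an s3:// URL: " + s3Url);
        }
        String rest = s3Url.substring(PREFIX.length());
        int slash = rest.indexOf('/');
        // Require a non-empty bucket and a non-empty key.
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("Expected s3://bucket/key: " + s3Url);
        }
        String bucket = rest.substring(0, slash);
        String key = rest.substring(slash + 1);
        String host = bucket + ".s3." + region + ".amazonaws.com";
        try {
            // The multi-argument URI constructor quotes illegal path characters (e.g. spaces).
            return new URI("https", host, "/" + key, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build an HTTPS URL from: " + s3Url, e);
        }
    }
}
```

For example, S3UrlTranslator.toHttpsUrl("s3://my-bucket/dr123/image.jpg", "ap-southeast-2") yields https://my-bucket.s3.ap-southeast-2.amazonaws.com/dr123/image.jpg. A URL built this way only resolves if the object can be read without a signed request, so access control still has to be settled separately.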
Hi @sat01a, do we have any update on this? The process of loading actual image files is tedious now and involves a database update and an image reindex. It would be very good if we could give it some priority.
@sadeghim You should already be able to address S3 objects via an HTTP URL. If you have the S3 client library available, you should be able to use s3Client.getUrl(bucket, path) or the equivalent if it's a public object. For private objects you can supply presigned URLs, like so:
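import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;
import java.net.URL;
// AWS SDK for Java v1; 'expiration' is a java.util.Date after which the URL stops working
GeneratePresignedUrlRequest generatePresignedUrlRequest =
    new GeneratePresignedUrlRequest(bucket, path)
        .withMethod(HttpMethod.GET)
        .withExpiration(expiration);
URL presignedUrl = s3Client.generatePresignedUrl(generatePresignedUrlRequest);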
@sbearcsiro We understand that we can't have public buckets/objects. I don't want to add temporary presigned URLs to objects in the data. It is messy and unmanageable. Is there another option?