emory-libraries/eulfedora

Documentation: clarify datastream.content usage for generic DatastreamObject vs FileDatastreamObject

Closed this issue · 5 comments

bcail commented

https://github.com/emory-libraries/eulfedora/blob/master/eulfedora/models.py#L181
Seems like it should say imgfile.read()... is that correct?

@bcail no, you should be able to assign a file-like object with a read method. (Possibly only relevant for for FileDatastream and not the base datastream class, but that's mentioned in the documentation you linked to). The save and ingest methods check for and handle reading from the file and posting it. This allows eulfedora to handle large files, so the contents don't get read into memory all at once.

That example may not be great for another reason, though, which is that if you use the with block then the file will get closed before you actually use it to send content to fedora.

If you see a way to make this clearer in the documentation, please make a suggestion or pull request.

bcail commented

@rlskoeser thanks for replying.

What I'm trying to do is create a fedora object and add a tiff datastream. I have some code working, and this is basically what it looks like:

obj = models.DigitalObject(md.REPO.api, create=True, default_pidspace='testsuite')
obj.save()
ds_obj = models.DatastreamObject(obj, ds_name, ds_label, mimetype='image/tiff', versionable=True)
with open(file_name) as f:
    ds_obj.content = f.read()
    ds_obj.save()

If I change f.read() to just f in the above code, it doesn't save the tiff datastream properly.

Is there a different/better way to structure this code, to fit more with how eulfedora is designed?

I see in the code comments the pattern of defining my own ImageObject class with an "image" FileDatastream, and then instantiating the class and setting obj.image.content = f, but I guess I'm looking to be able to create objects and datastreams without having to create a separate class to define the datastream structure first.

Wow, would not have occurred to me to try that. I'm almost surprised it works at all (although it's always nice when things work in other contexts than you anticipate). I think in this case the problem is that you're using the generic DatastreamObject rather than the FileDatastreamObject, which has the file object support - I suspect if you switch it you'll be able to pass the file object in directly and have it work.

I do think it might be better to use the DigitalObject getDatastreamObject method to initialize your datastream object (and specify a dsobj_type of FileDatastreamObject in this case) - although I guess I don't have a very strong argument for why you should use that instead of your own approach. The documentation notes that when you add new datastreams via getDatastreamObject you have to set all the datastream metadata, so perhaps your approach is more efficient for that.

FWIW, it might be useful to add some more examples along these lines, because sometimes you do need to work with objects in a more ad-hoc manner rather than with the pre-defined content models that are described in the documentation.

(I modified your post for formatting to make the code easier to read - indentation in particular; hope you don't mind.)

bcail commented

You're right - I switched it to FileDatastreamObject, and then just passing in the file object worked. Thanks!

I think the documentation might be a bit clearer if the example in DatastreamObject didn't reference the FileDatastream class. The FileDatastreamObject already has the example of reading from a file object, so I think in the DatastreamObject class I would remove that FileDatastream example (or just make it a new example that passes in content directly in a way that works in DatastreamObject itself).

Glad it worked. Your suggestion about the documentation sounds sensible - I'll leave this issue open as a reminder to revise that (but feel free to submit a pull request!).