Correct mime types for json subjects
srallen opened this issue · 6 comments
The TESS project should be using json files as one of their subject locations, with a json file extension and a mime type of application/json. Currently libmagic is not correctly detecting these file types and staging subjects have been uploaded as txt files with a mime type of text/plain.
We can set workflow configs to load a particular subject viewer, but I would still like to validate the expected json structure so we don't attempt to render a subject whose data is malformed. We typically expect text file subjects to be rendered as plain text; they usually belong to transcription projects and are not data that should be plotted.
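For illustration, the kind of validation described above might look like the sketch below. It is written in Python only to match the rest of the thread; the function name load_json_subject and the required keys 'x' and 'y' are hypothetical, since the actual structure the TESS subject viewer expects is not stated here.

```python
import json


def load_json_subject(data, required_keys=()):
    """Parse subject data and check it has the keys a viewer would need.

    `required_keys` is a hypothetical parameter; the real expected
    structure would come from the subject viewer.
    Raises ValueError if the data should not be rendered.
    """
    try:
        parsed = json.loads(data)
    except ValueError as err:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError
        raise ValueError('subject is not valid JSON: {}'.format(err))

    if not isinstance(parsed, dict):
        raise ValueError('expected a JSON object at the top level')

    missing = [key for key in required_keys if key not in parsed]
    if missing:
        raise ValueError('missing keys: {}'.format(', '.join(missing)))

    return parsed
```

A caller would catch ValueError and fall back to plain text rendering, or refuse to render the subject at all.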
We'll need to think about how best to implement this. Presumably we'll need to check the filenames for a .json extension.
I think we have three options:
- Add a list of known file extensions/mime types. A lot of people seem to be having trouble installing libmagic, so maybe it would be best to only use it as a fallback if the file extension is unknown.
- Specifically add an exception for JSON, i.e. if the type is text/plain, check if the filename ends in .json.
- Add a way to manually override the mime type.
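As a rough sketch of how the JSON exception and the manual override could fit together (the function name resolve_media_type and its parameters are made up for illustration and are not part of the client):

```python
def resolve_media_type(filename, detected_type, override=None):
    """Decide which MIME type to record for a subject location.

    `detected_type` is whatever libmagic (or another detector) reported.
    `override` is a hypothetical caller-supplied MIME type.
    """
    # A manual override always wins
    if override is not None:
        return override

    # JSON exception: libmagic reports JSON as text/plain,
    # so trust the .json extension in that one case
    if detected_type == 'text/plain' and filename.lower().endswith('.json'):
        return 'application/json'

    return detected_type
```

Note that this only helps files that actually carry a .json extension; a JSON file named .txt would still need the override.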
What do people think?
The problem files for the TESS project have a .txt extension, so we should try this with .json and see if that extension causes problems. I think it's correct behaviour to have text/plain when the extension is .txt.
I think option 1 is the better option, then falling back to libmagic if it's installed. Looks like the mimetypes package could handle the extension lookup? https://docs.python.org/3/library/mimetypes.html#module-mimetypes
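A minimal sketch of that approach, assuming the extension is checked first and libmagic is only consulted when the extension is unknown. The KNOWN_TYPES table and the guess_media_type name are hypothetical; mimetypes.guess_type is from the standard library and magic.from_buffer is from the python-magic package.

```python
import mimetypes
import os

# Hypothetical table of extensions to pin explicitly, so the result
# does not depend on the local mimetypes database or on libmagic.
KNOWN_TYPES = {
    '.json': 'application/json',
    '.svg': 'image/svg+xml',
    '.txt': 'text/plain',
}


def guess_media_type(filename, data=None):
    """Guess a MIME type from the extension, falling back to libmagic."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in KNOWN_TYPES:
        return KNOWN_TYPES[ext]

    # Standard library lookup by extension
    media_type, _encoding = mimetypes.guess_type(filename)
    if media_type is not None:
        return media_type

    # Last resort: inspect the file contents, if libmagic is available
    if data is not None:
        try:
            import magic  # python-magic, optional dependency
        except ImportError:
            return None
        return magic.from_buffer(data, mime=True)

    return None
```

With this ordering, people who can't install libmagic would still get correct types for any file with a recognised extension.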
I’ve run into this again today for SLSN. My workaround is to explicitly set the MIME type and file contents (apologies for my terrible Python):
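# Declare the MIME type explicitly instead of relying on detection
subject.locations.append('application/json')
# Attach the raw file contents; the with block closes the file for us
with open('data/subject-1234.json', 'rb') as json_data:
    subject._media_files.append(json_data.read())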
This same problem also occurs with .svg files. They’re converted to .txt.
The Python CLI uses subject.add_location to add file names from a manifest to an upload, which also runs into this bug when libmagic generates the wrong type.