pallets-eco/flask-wtf

Validate the MIME type of files using FileAllowed

Closed this issue · 3 comments

The flask_wtf.file.FileAllowed validator is very useful for validating the extension of an uploaded file. However, it is also important to validate the MIME type, because a user could easily rename a file to get past this validator, compromising the integrity of the application. The HTML Standard makes the same recommendation:

Authors are encouraged to specify both any MIME types and any corresponding extensions when looking for data in a specific format.

In my opinion, the best place to specify the allowed MIME types is the same list that this validator already receives (just as the accept attribute of <input type="file"> works), for example:

FileAllowed(upload_set=["doc", "docx", "xml", "application/msword", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"])

I was thinking about how to implement this and came to the following design decisions:

  1. If upload_set is a list, the validator will check that the extension OR the MIME type of an uploaded file belongs to that list. This is how the accept attribute mentioned above works, and it is very convenient: if, for example, someone wanted only videos and PDF files, this would suffice:

    FileAllowed(upload_set=["video/*", "pdf"])

    Isn't this elegant? 😊

  2. If upload_set is a dictionary, the validator will check the extension AND the MIME type of an uploaded file. For example, the extensions ["jpg", "jpeg", "jfif", "pjpeg", "pjp"] belong to the MIME type "image/jpeg". Now suppose someone wants files with this MIME type but only with the extensions ["jpg", "jpeg"]; then:

    FileAllowed(upload_set={
        "image/jpeg": ["jpg", "jpeg"]
    })

    Continuing with this example, if someone wants files with said MIME type and any of its extensions, then:

    FileAllowed(upload_set={
        "image/jpeg": None
    })

    In the latter case, None means "there is no preference for any particular extension". This does not mean that an uploaded file can have any extension, but rather that it must have one of the extensions belonging to that MIME type. Therefore, "image/jpeg": None is equivalent to "image/jpeg": ["jpg", "jpeg", "jfif", "pjpeg", "pjp"].

    Obviously, this dictionary may have more MIME types and each one will have its own accepted extensions or None.
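The two design decisions above can be sketched in plain Python. This is only an illustration of the proposed semantics, not Flask-WTF code: the function names, the KNOWN_EXTENSIONS table, and the assumption that the file's MIME type has already been determined elsewhere are all hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath

# Hypothetical table of the extensions that belong to each MIME type.
KNOWN_EXTENSIONS = {
    "image/jpeg": ["jpg", "jpeg", "jfif", "pjpeg", "pjp"],
    "application/pdf": ["pdf"],
}


def extension_of(filename):
    """Return the lowercased extension without the leading dot."""
    return PurePath(filename).suffix.lower().lstrip(".")


def allowed_by_list(filename, mime_type, upload_set):
    """Decision 1: accept if the extension OR the MIME type matches an entry."""
    ext = extension_of(filename)
    for entry in upload_set:
        if "/" in entry:
            # Entries with a slash are MIME types; "video/*" acts as a wildcard.
            if fnmatchcase(mime_type, entry):
                return True
        elif ext == entry.lower():
            return True
    return False


def allowed_by_dict(filename, mime_type, upload_set):
    """Decision 2: accept only if the MIME type AND the extension both match."""
    if mime_type not in upload_set:
        return False
    extensions = upload_set[mime_type]
    if extensions is None:
        # None falls back to every extension known for that MIME type.
        extensions = KNOWN_EXTENSIONS.get(mime_type, [])
    return extension_of(filename) in extensions
```

With these helpers, allowed_by_list("clip.mp4", "video/mp4", ["video/*", "pdf"]) accepts the file through the wildcard, while allowed_by_dict("a.jfif", "image/jpeg", {"image/jpeg": ["jpg", "jpeg"]}) rejects it because the extension is not in the narrowed list.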

The disadvantage of this functionality is that it will likely require an external library such as python-magic or filetype, since the built-in mimetypes module determines the MIME type from the filename rather than from the contents of the file.
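To illustrate why the filename alone is not enough, here is a minimal stdlib-only sketch. It compares the type that mimetypes infers from the name with a type sniffed from the first bytes of the content. The tiny SIGNATURES table and both helper names are assumptions made for this example; a real implementation would presumably delegate the sniffing to a library such as python-magic or filetype.

```python
import mimetypes

# Leading bytes ("magic numbers") of a few well-known formats.
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def sniff_mime(data):
    """Hypothetical helper: guess a MIME type from the first bytes, or None."""
    for signature, mime_type in SIGNATURES.items():
        if data.startswith(signature):
            return mime_type
    return None


def extension_matches_content(filename, data):
    """Compare the MIME type implied by the name with the sniffed one."""
    claimed, _encoding = mimetypes.guess_type(filename)
    return claimed is not None and claimed == sniff_mime(data)
```

A file named renamed.pdf whose content starts with something other than %PDF- still gets "application/pdf" from mimetypes.guess_type, but extension_matches_content rejects it because the sniffed type disagrees.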

Please let me know if you like the idea and I can do a PR!

Coming late to the discussion, but I think this is a great idea! In fact, I stumbled upon this page because I had exactly this problem myself. Hope one of the maintainers sees this soon!

Tedpac commented

Closing this as it did not get enough attention in almost a year.