hoaproject/Mime

Support for Microsoft Office Open XML MIME types

limenet opened this issue · 15 comments

Thank you for providing this useful tool!


I noticed it doesn't support the Microsoft Office Open XML MIME types (i.e. files with the extensions docx, xlsx, pptx, ...), which are pretty common in applications where users can upload files. Do you plan on supporting them?


@limenet You're welcome :-).

No problem to support MS Office Open XML MIME types. Would you prefer to make a PR? I think all you have to do is to edit the Mime.types file.

Feel free :-).
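For reference, the sketch below assumes Mime.types follows the Apache layout: a MIME type, then whitespace, then space-separated extensions, with `#` starting a comment line. `parseMimeTypes` is a hypothetical helper written for illustration, not part of Hoa\Mime's API. The three entries are the registered Office Open XML types for docx, xlsx and pptx.

```php
<?php
// Hypothetical helper (not part of Hoa\Mime): parse Apache-style
// mime.types text into an extension => MIME type map.
function parseMimeTypes(string $text): array
{
    $map = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        // Columns may be separated by any run of tabs or spaces.
        $fields = preg_split('/\s+/', $line);
        $type = array_shift($fields);
        foreach ($fields as $ext) {
            $map[strtolower($ext)] = $type;
        }
    }
    return $map;
}

// Entries that would cover the Office Open XML formats.
$entries = <<<TYPES
application/vnd.openxmlformats-officedocument.wordprocessingml.document\tdocx
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet\t\txlsx
application/vnd.openxmlformats-officedocument.presentationml.presentation\tpptx
TYPES;

$map = parseMimeTypes($entries);
echo $map['xlsx'], "\n";
// application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
```

Splitting on `\s+` rather than a single tab means entries aligned with several tabs still parse correctly.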

Sounds great, I just wanted to be sure I wasn't going to mess with anything. :)

@Hywan do you want me to add all the new MIME types, i.e. replace the current Mime.types with the one from the Apache SVN, or just add the MS Office Open XML MIME types?

@limenet Hmm, good question. I think we need a diff between the two versions because we have added some MIME types that are not present in the Apache file I think.
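A diff like this can be computed on the parsed type lists rather than on the raw text, which sidesteps differences in tab alignment between the two files. A minimal PHP sketch, assuming both files use the Apache mime.types layout; `typesOf` is a hypothetical helper and the two inline strings are made-up stand-ins for the real files.

```php
<?php
// Hypothetical helper: map each MIME type in Apache-style mime.types
// text to its list of extensions (blank lines and comments skipped).
function typesOf(string $text): array
{
    $types = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $fields = preg_split('/\s+/', $line);
        $types[array_shift($fields)] = $fields;
    }
    return $types;
}

// Illustrative stand-ins for the two files, not their real contents.
$hoa    = typesOf("text/plain\ttxt\napplication/x-example\texa");
$apache = typesOf("text/plain\ttxt text\nimage/webp\twebp");

// Types present in only one of the two files (compared by key).
$onlyHoa    = array_keys(array_diff_key($hoa, $apache));
$onlyApache = array_keys(array_diff_key($apache, $hoa));

// Types present in both files whose extension lists differ.
$changed = [];
foreach (array_intersect_key($hoa, $apache) as $type => $exts) {
    if ($exts !== $apache[$type]) {
        $changed[] = $type;
    }
}

echo implode(', ', $onlyHoa), "\n";    // application/x-example
echo implode(', ', $onlyApache), "\n"; // image/webp
echo implode(', ', $changed), "\n";    // text/plain
```

Comparing extension lists with `!==` is order-sensitive, so a type whose extensions were merely reordered would also be reported as changed.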

@Hywan Okay, I'll have a look at it and do my best to merge the Hoa Mime.types and the Apache Mime.types file.

Just FYI, here's what I did (I admit it's pretty dirty, but it did the job):

  1. I created a file m2 with two parts: (1) the old Mime.types from Hoa and (2) the Mime.types from Apache.
  2. I wrote a quick PHP script (see below) that parses m2 and applies some basic heuristics to decide which duplicate to keep.
  3. The result is the new Mime.types file (named m3), a merge of both files.
```php
<?php
// Merge m2 (Hoa's old Mime.types followed by Apache's) into m3.
// Each entry is "type<tabs/spaces>ext1 ext2 ..."; Apache aligns the
// columns with several tabs, so split on any run of whitespace.
$contents = []; // MIME type => every line declaring it
foreach (file('m2', FILE_IGNORE_NEW_LINES) as $line) {
    $line = trim($line);
    if ($line === '' || $line[0] === '#') {
        continue; // skip blank lines and comments
    }
    $type = preg_split('/\s+/', $line)[0];
    $contents[$type][] = $line;
}

$clean = [];    // MIME type => line to keep
$problems = []; // MIME types declared more than twice
foreach ($contents as $type => $lines) {
    if (count($lines) === 1) {
        $clean[$type] = $lines[0];
    } elseif (count($lines) === 2) {
        // Keep the declaration listing more extensions; comparing raw
        // line lengths would be skewed by tab alignment. On a tie, keep
        // the second line, i.e. the one from the Apache file.
        $n0 = count(preg_split('/\s+/', $lines[0]));
        $n1 = count(preg_split('/\s+/', $lines[1]));
        $clean[$type] = $n0 > $n1 ? $lines[0] : $lines[1];
    } else {
        $problems[$type] = $lines;
    }
}

printf("%d types, %d resolved, %d unresolved\n",
    count($contents), count($clean), count($problems));
var_dump($problems);

ksort($clean);
file_put_contents('m3', implode("\n", $clean) . "\n");
```

If you want me to, I can document the code above and make it more efficient and nicer, no problem.

@Hywan please see PR #16

@limenet Why not use the new file verbatim?

@Hywan You wrote

I think we need a diff between the two versions because we have added some MIME types that are not present in the Apache file I think.

Based on that, I assumed the current Mime.types file is the original Apache 2.0 Mime.types file plus a few modifications. Since I don't know what these modifications are, I merged the current Apache 2.5 Mime.types file and the Hoa Mime.types file to ensure no MIME types got lost (i.e. I took the union of the two sets of MIME types). I admit it would've been easier to diff the Hoa Mime.types against the Apache 2.0 Mime.types and then apply only the resulting differences (Hoa's own additions and changes) to the Apache 2.5 file.
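The diff-then-apply alternative described above can be sketched as a three-way merge on parsed type lists: collect what Hoa added or changed relative to Apache 2.0, then lay those entries over the Apache 2.5 list. A minimal PHP sketch; `loadTypes` is a hypothetical helper, the inline strings are made-up stand-ins for the three files, and letting Hoa's entry win on a conflict is an assumption.

```php
<?php
// Hypothetical helper: MIME type => list of extensions, parsed from
// Apache-style mime.types text (blank lines and comments skipped).
function loadTypes(string $text): array
{
    $types = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $fields = preg_split('/\s+/', $line);
        $types[array_shift($fields)] = $fields;
    }
    return $types;
}

// Illustrative stand-ins, not the real files.
$apache20 = loadTypes("text/plain\ttxt\napplication/x-old\told");
$hoa      = loadTypes("text/plain\ttxt\napplication/x-old\told\napplication/x-hoa\thoa");
$apache25 = loadTypes("text/plain\ttxt text\nimage/webp\twebp");

// Hoa's local modifications: types it added, or whose extensions it changed.
$hoaChanges = [];
foreach ($hoa as $type => $exts) {
    if (!isset($apache20[$type]) || $apache20[$type] !== $exts) {
        $hoaChanges[$type] = $exts;
    }
}

// Apply them on top of the newer upstream list; on conflict Hoa wins.
$merged = array_replace($apache25, $hoaChanges);
ksort($merged);

foreach ($merged as $type => $exts) {
    echo $type, "\t", implode(' ', $exts), "\n";
}
```

Unlike the union, `application/x-old` is dropped here: Apache removed it and Hoa never touched it. Types that Hoa deleted from the 2.0 list are not handled by this sketch.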

Yup, that's the correct way. So, everything is fine :-).

Great :)

Can we close the issue now :-)?

Of course! :)

The library is not tagged yet, so use @dev in your composer.json while waiting for the next snapshot :-).

Sure, thanks!