wingman-jr-addon/wingman_jr

Some downloads no longer work


On some rare pages, downloads no longer work with Wingman Jr. enabled (my current version is 3.3.6, on Firefox 115.8.0 ESR).
For example, here: https://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/1.0/

When I click on a link, instead of starting a download the website shows cryptic content (and saving with Ctrl+S does not give you the actual file either). This needs to be fixed urgently. Could you look into it?

I think I'm facing a similar issue with Wingman Jr. extension (v 3.3.6) on Firefox 123.0.1 (64-bit) on macOS.

If I disable this extension, I can download the image from https://dl3.pushbulletusercontent.com/yadayada.jpg (yadayada is not the actual filename obviously) fine.

Thanks for the report, @Dragodraki @SufianBabri.
@Dragodraki I think I can reproduce what you're talking about for the ISOs there, for example https://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/distributions/1.0/fdboot.img
@SufianBabri I'm getting the following XML even without Wingman Jr.:
<Error> <Code>AccessDenied</Code> <Message>Access denied.</Message> </Error>
I'm assuming that's not what you'd expect?

At any rate, I'll take a peek and see if I can figure out what's going on.

Working on bisecting:

  • 3.3.6 - As Dragodraki described, likely just not triggering download
  • 3.3.0 - Same as 3.3.6 but not correctly replacing some of the characters
  • 3.0.0 - Same as 3.3.0
  • 2.0.1 - Same as 3.0.0

So, at least this doesn't seem to be a recent regression. My guess is it's something to do with the content type handling for, e.g., octet-stream and friends, but we'll find out.

Ok, so the core issue seems to be that the server isn't sending a Content-Type for the .IMG at all. Here's what Wingman does:
[screenshot]

I found a similar site, with similar type of content: http://ftp.vcu.edu/pub/gnu_linux/archlinux/iso/2024.04.01/
Here's what it does instead:
[screenshot]

The application/octet-stream will trigger it to download.

So, big picture: Wingman Jr. has to do things around Content-Type so that it can properly translate/pass through characters. To do this, it has to force a specific content type. However, in this case, the original Content-Type is not specified and so then Firefox presumably runs its own smarter content type detection and determines it should download.
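To make the "no Content-Type at all" case concrete, here is a minimal TypeScript sketch of reading the original header before anything overrides it. It assumes the `{ name, value }` entries that Firefox's `webRequest.onHeadersReceived` passes in `details.responseHeaders`; the helper name `getContentTypeEssence` is hypothetical and not taken from the Wingman Jr. code. Returning `null` for a missing header, rather than substituting a default, keeps that case visible to later logic. The three "unknown" essences come from the MIME Sniffing Standard, which treats them like a missing type.

```typescript
// Shape of one entry in webRequest's details.responseHeaders.
interface HttpHeader {
  name: string;
  value?: string;
}

// The MIME Sniffing Standard treats these essences like a missing type.
const UNKNOWN_TYPES = new Set(["unknown/unknown", "application/unknown", "*/*"]);

// Hypothetical helper: the lowercase "type/subtype" essence of the original
// Content-Type, or null when the server supplied none (or an unusable one).
function getContentTypeEssence(headers: HttpHeader[]): string | null {
  // Header names are case-insensitive.
  const header = headers.find((h) => h.name.toLowerCase() === "content-type");
  if (header === undefined || header.value === undefined) return null;
  // Drop parameters such as "; charset=utf-8" and surrounding whitespace.
  const essence = (header.value.split(";")[0] ?? "").trim().toLowerCase();
  if (!essence.includes("/") || UNKNOWN_TYPES.has(essence)) return null;
  return essence;
}

// Illustrative headers only, not captured from the real servers.
console.log(getContentTypeEssence([{ name: "Accept-Ranges", value: "bytes" }])); // prints: null
console.log(getContentTypeEssence([{ name: "Content-Type", value: "application/octet-stream" }]));
// prints: application/octet-stream
```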

Now that might be the problem, but I'm not sure what the solution should be yet.

As a further resource for later, the "what to do when type isn't specified" can get complex, see for example https://mimesniff.spec.whatwg.org/#identifying-a-resource-with-an-unknown-mime-type

If the website does not work for you, maybe it is due to a regional restriction (geo-blocking); with a VPN or Tor you should be able to bypass that. Sorry, I cannot give another example right now.

With Wingman Jr. enabled, the website tries to display the download content as plain text instead of offering a download. Here is an excerpt of the weird characters:

"ë<�LINUX4.1����à@�ð ��)ã�úD FAT12 úü1ÀŽØ½|¸à�ŽÀ‰î‰ï¹�ó¥ê^|à�`ŽØŽÐ�f û€~$ÿu�ˆV$ÇFÀ�ÇFÂ�èéFreeDOS‹v�‹~��v�ƒ×‰vÒ‰~ÔŠF�˜÷f��Æ�׉vÖ‰~Ø‹^�±�Óë‹F�1Ò÷ó‰FÐ�ƃ׉vÚ‰~Ü‹FÖ‹VØ‹ÐÄ^Zè›r/ÄZ¹�¾ñ}Wó¦_&‹E�t�ƒÇ &€=uçrYPÄ^Z‹~�‹FÒ‹VÔèk"

Normally it should look like this (and it does when I disable the add-on):
[Screenshot 2024-04-07 140359]

Yep, thanks @Dragodraki that is what I see too for your example. As noted above, the root of the issue is that the website doesn't send a Content-Type. Wingman Jr adds one, but then that means that Firefox can't use its own logic to properly infer a Content-Type. Usually downloads have a type of "application/octet-stream", but in this case Wingman Jr is incorrectly inferring a text type. I'm still trying to think the best way to actually fix this.
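For reference, the text-versus-download distinction for an untyped response ends, in the MIME Sniffing Standard's unknown-type rules, with a simple check: if the resource header (at most the first 1445 bytes) contains no "binary data bytes", the result is `text/plain`, otherwise `application/octet-stream`. A minimal TypeScript sketch of just that final step; the function names are hypothetical and the sample bytes are illustrative (shaped like the start of a FAT boot sector), not read from the real file.

```typescript
// Per the MIME Sniffing Standard, a "binary data byte" is one of
// 0x00-0x08, 0x0B, 0x0E-0x1A or 0x1C-0x1F (0x1B, ESC, is deliberately excluded).
function isBinaryDataByte(b: number): boolean {
  return b <= 0x08 || b === 0x0b || (b >= 0x0e && b <= 0x1a) || (b >= 0x1c && b <= 0x1f);
}

// Hypothetical helper: the last fallback of the unknown-type rules.
// Only the resource header (the first 1445 bytes at most) is examined.
function textOrBinary(header: Uint8Array): "text/plain" | "application/octet-stream" {
  const limit = Math.min(header.length, 1445);
  for (let i = 0; i < limit; i++) {
    if (isBinaryDataByte(header[i])) return "application/octet-stream";
  }
  return "text/plain";
}

// Illustrative bytes: jump instruction, OEM name, then a 512 bytes-per-sector field.
const bootSectorLike = new Uint8Array([
  0xeb, 0x3c, 0x90, ...Array.from("LINUX4.1", (c) => c.charCodeAt(0)), 0x00, 0x02,
]);
console.log(textOrBinary(bootSectorLike)); // prints: application/octet-stream
console.log(textOrBinary(new Uint8Array([0x48, 0x69, 0x0a]))); // "Hi\n", prints: text/plain
```

Treating such a response as a text type skips exactly this check, which is why the boot sector bytes end up rendered as garbled characters.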

So, mulling on this a bit further: apart from figuring out how to modify the code that currently exists, there is still a core conundrum. For responses that don't supply a Content-Type, browsers by default treat some as documents and some as downloads (as noted previously). Ones that are treated as documents should definitely be scanned, but ones that are not should probably go through the normal download process. However, to know which are which, Wingman cannot just defer the guessing to Firefox, so the only viable path is to actually implement Content-Type sniffing. I do wonder how thorny this will get: the standard algorithm is already complex, and I'm concerned that browsers may layer their own extra detection logic on top, leaving me forever reverse engineering it. Still, it's probably better than the current state.

With respect to implementation, then, there's definitely work to do. Right now only the charset is sniffed, and that partly depends on Content-Type detection having already happened. Now the case where the Content-Type itself is sniffed has to be handled too, and an early return based on whether a Content-Type appears in the header won't suffice, so it could get yucky. At least it's clear from the above that Content-Type sniffing must be reimplemented for the rest to work correctly.

Both this and #201 are similar in that the API to do the scanning doesn't really expose enough of what the browser is doing and both require essentially re-implementing a core and complex part of what the browser itself does.

I don't know if this would be the best way of implementing it, but you could have a look at the source code of the file command. It has magic definitions for pretty much every type of thing imaginable, written in its own description language in a set of magic files that it reads to check the type of its input. It supports reporting the name, MIME type, and extension of the data. I doubt you would want to write your own interpreter, and with the name being so generic I could not find any JavaScript libraries that use it.
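One difference from the browser rules is worth seeing in code: a magic entry can test bytes at an arbitrary offset, not just at the start. Below is a toy TypeScript sketch of that idea with a three-entry, hand-written table standing in for the real magic database; the names are hypothetical. The signatures themselves are well-known: the PNG header at offset 0, the ISO 9660 identifier "CD001" at offset 0x8001 (the primary volume descriptor starts at 0x8000, after its one-byte type), and the 0x55 0xAA boot-sector marker at offset 510.

```typescript
// One entry of a toy magic table: bytes expected at a given offset.
interface MagicEntry {
  offset: number;
  bytes: number[];
  description: string;
}

// Illustrative entries only. Order matters: ISO images often carry a boot
// sector too, so the more specific ISO check comes first.
const MAGIC: MagicEntry[] = [
  { offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a], description: "PNG image" },
  { offset: 0x8001, bytes: [0x43, 0x44, 0x30, 0x30, 0x31], description: "ISO 9660 filesystem image" },
  { offset: 510, bytes: [0x55, 0xaa], description: "boot sector" },
];

// True when every byte of the entry is present at its offset.
function matchesEntry(data: Uint8Array, entry: MagicEntry): boolean {
  if (data.length < entry.offset + entry.bytes.length) return false;
  return entry.bytes.every((b, i) => data[entry.offset + i] === b);
}

// Hypothetical helper: first matching description, or null if nothing matches.
function identify(data: Uint8Array): string | null {
  const hit = MAGIC.find((entry) => matchesEntry(data, entry));
  return hit ? hit.description : null;
}

// Illustrative 512-byte buffer ending in the boot-sector signature.
const sector = new Uint8Array(512);
sector[510] = 0x55;
sector[511] = 0xaa;
console.log(identify(sector)); // prints: boot sector
```

The catch for an add-on is buffering: the ISO check alone needs more than 32 KB before a decision, far beyond the 1445-byte resource header the MIME Sniffing Standard works with.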

First of all, thanks to everyone for helping in doing comments and providing details related to this issue.

@wingman-jr-addon
The add-on doesn't have to be perfect. As with so many things in life, perfection should be the aim but not the reality. Maybe you can make Wingman Jr. a little smarter at interpreting the content type, but only as far as the results are worth it.
If you find a solution for the more common content types, like in my example, I will be fully satisfied (you can close the issue then).

Thanks @Dragodraki @arthurmelton. I did finally fix #201, and I think this has some similarities to it, with a key difference: I can't "sniff" the MIME type after passing on the start of the request; I have to "sniff" before, preferably only when no MIME type is specified. So I can probably implement just 7.1 from here: https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm
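A condensed TypeScript sketch of what implementing 7.1 ("identifying a resource with an unknown MIME type") could look like, assuming the first bytes of the body are already buffered. It follows the order of the standard's steps (scriptable types when the sniff-scriptable flag is set, then PostScript and BOMs, then image and archive patterns, then the binary-byte fallback), but it carries only a few image and archive rows and skips audio/video entirely, so it is not a conforming implementation. All function names are hypothetical.

```typescript
// Bytes the MIME Sniffing Standard skips as leading whitespace.
const WHITESPACE = new Set([0x09, 0x0a, 0x0c, 0x0d, 0x20]);

// Converts an ASCII pattern string into byte values.
function ascii(s: string): number[] {
  return Array.from(s, (c) => c.charCodeAt(0));
}

// Exact comparison of `pattern` against `data` starting at `from`.
function matchesAt(data: Uint8Array, pattern: number[], from = 0): boolean {
  if (data.length < from + pattern.length) return false;
  return pattern.every((b, i) => data[from + i] === b);
}

// HTML tag rows: letters compare case-insensitively and the tag must be
// followed by a tag-terminating byte (space or '>').
const HTML_TAGS = ["<!DOCTYPE HTML", "<HTML", "<HEAD", "<SCRIPT", "<IFRAME", "<H1", "<DIV",
  "<FONT", "<TABLE", "<A", "<STYLE", "<TITLE", "<B", "<BODY", "<BR", "<P", "<!--"];

function matchesHtmlTag(data: Uint8Array, from: number, tag: string): boolean {
  const end = from + tag.length;
  if (data.length <= end) return false; // need the tag plus one terminating byte
  for (let i = 0; i < tag.length; i++) {
    const p = tag.charCodeAt(i);
    const isLetter = p >= 0x41 && p <= 0x5a;
    // Masking with 0xDF folds lowercase ASCII letters onto uppercase.
    const b = isLetter ? data[from + i] & 0xdf : data[from + i];
    if (b !== p) return false;
  }
  return data[end] === 0x20 || data[end] === 0x3e;
}

// PostScript, BOMs, then a small subset of the image and archive rows.
const PREFIX_RULES: Array<[number[], string]> = [
  [ascii("%!PS-Adobe-"), "application/postscript"],
  [[0xfe, 0xff], "text/plain"], // UTF-16BE BOM
  [[0xff, 0xfe], "text/plain"], // UTF-16LE BOM
  [[0xef, 0xbb, 0xbf], "text/plain"], // UTF-8 BOM
  [ascii("GIF87a"), "image/gif"],
  [ascii("GIF89a"), "image/gif"],
  [[0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a], "image/png"],
  [[0xff, 0xd8, 0xff], "image/jpeg"],
  [ascii("BM"), "image/bmp"],
  [[0x1f, 0x8b, 0x08], "application/x-gzip"],
  [[0x50, 0x4b, 0x03, 0x04], "application/zip"],
];

function isBinaryDataByte(b: number): boolean {
  return b <= 0x08 || b === 0x0b || (b >= 0x0e && b <= 0x1a) || (b >= 0x1c && b <= 0x1f);
}

// Hypothetical entry point for a response that arrived without a Content-Type.
function sniffUnknownType(resource: Uint8Array, sniffScriptable: boolean): string {
  const data = resource.subarray(0, 1445); // the resource header
  if (sniffScriptable) {
    let start = 0;
    while (start < data.length && WHITESPACE.has(data[start])) start++;
    if (HTML_TAGS.some((tag) => matchesHtmlTag(data, start, tag))) return "text/html";
    if (matchesAt(data, ascii("<?xml"), start)) return "text/xml";
    if (matchesAt(data, ascii("%PDF-"))) return "application/pdf";
  }
  for (const [pattern, type] of PREFIX_RULES) {
    if (matchesAt(data, pattern)) return type;
  }
  return data.some(isBinaryDataByte) ? "application/octet-stream" : "text/plain";
}

console.log(sniffUnknownType(new Uint8Array(ascii("  <html><body>")), true)); // prints: text/html
```

Per the standard, the sniff-scriptable flag is the inverse of the no-sniff flag (set by `X-Content-Type-Options: nosniff`), which is why it is a parameter here rather than always on.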

Unfortunately, here is another example of a download that fails with Wingman Jr. enabled (v3.4.0):

It shows the garbled special characters again here, or loads indefinitely.

Thanks @Dragodraki , I still have not tried to tackle this but having known failures to try makes development easier so thanks for reporting.

Anytime, and thanks for your continuous support of this add-on. Its improvement over the last few years is outstanding! Take your time.

Ok, so I'm continuing to noodle on this. Notes for myself regarding architecture:

  • bkDirectTypedUrlListener and bkBase64ContentListener should probably merge
  • General flow will be to check the following in order:
    1. If Content-Type is present and is an image type, fall back to the normal image listener.
    2. If Content-Type is present and is some HTML type, use the base-64 listener.
    3. If Content-Type is present but none of the above, don't filter.
    4. Otherwise, sniff the MIME type per 7.1 as noted previously.
    5. If the sniffed MIME type is present and an image type, tap into the normal image listener. This could get a little tricky since we've already received data, but we could maybe simulate the normal flow by sending in the first packet received.
    6. If the sniffed MIME type is present and an HTML type, tap into the normal base-64 listener. As above, we may have to simulate the normal flow by forwarding on the first packet.
    7. If the sniffed MIME type is some other type or unknown, treat it as a download.
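The seven steps could be sketched in TypeScript as two small routing functions plus a buffer for the "forward the first packet" idea in steps 5 and 6. Everything here is hypothetical: the route names stand in for the existing image listener, the base-64 HTML listener, pass-through and download handling; the set of HTML types is an assumption; and the sniffing itself (for example a 7.1 implementation) is left to the caller. The buffer holds body chunks (such as `new Uint8Array(event.data)` from a `StreamFilter`'s `ondata` events) until 1445 bytes have arrived, or until the caller sees the stream end, and then replays them in order to whichever listener wins.

```typescript
type Route = "image-listener" | "html-listener" | "pass-through" | "download";

// Assumed set of HTML-like types handled by the base-64 listener.
const HTML_TYPES = new Set(["text/html", "application/xhtml+xml"]);

function routeFor(type: string): Route | null {
  if (type.startsWith("image/")) return "image-listener";
  if (HTML_TYPES.has(type)) return "html-listener";
  return null;
}

// Steps 1-4: decided from the response header alone.
function routeFromHeader(headerType: string | null): Route | "sniff" {
  if (headerType === null) return "sniff"; // step 4: no Content-Type at all
  return routeFor(headerType) ?? "pass-through"; // steps 1-3
}

// Steps 5-7: decided once the first bytes have been sniffed.
function routeFromSniffed(sniffedType: string | null): Route {
  return sniffedType === null ? "download" : routeFor(sniffedType) ?? "download";
}

const SNIFF_BYTES = 1445; // resource header size used by the MIME Sniffing Standard

// Collects early chunks so they can be replayed to the chosen listener.
class SniffBuffer {
  private chunks: Uint8Array[] = [];
  private length = 0;

  // Returns true once enough bytes have arrived to sniff.
  push(chunk: Uint8Array): boolean {
    this.chunks.push(chunk);
    this.length += chunk.length;
    return this.length >= SNIFF_BYTES;
  }

  // All buffered bytes concatenated, for the sniffer.
  header(): Uint8Array {
    const out = new Uint8Array(this.length);
    let pos = 0;
    for (const c of this.chunks) {
      out.set(c, pos);
      pos += c.length;
    }
    return out;
  }

  // Hands every buffered chunk, in arrival order, to the chosen listener.
  replay(deliver: (chunk: Uint8Array) => void): void {
    for (const c of this.chunks) deliver(c);
    this.chunks = [];
    this.length = 0;
  }
}

console.log(routeFromHeader(null)); // prints: sniff
console.log(routeFromSniffed("application/octet-stream")); // prints: download
```

Keeping the header-based and sniff-based decisions separate mirrors the split in the list: steps 1 to 3 can be settled before any body data arrives, while steps 5 to 7 only run after buffering.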