decrypto-org/spider

Base64 Filtering

Closed this issue · 1 comments

I am currently filtering out all possible base64 strings. However, this seems too harsh, since many pages I found deliver their CSS as a base64-encoded string within the HTML file. Simply allowing these strings is a problem, though, since base64 images can also be embedded within CSS. I found that this technique is even encouraged on some web forums, as it seems to be a very efficient way of loading images into the DOM (only one RTT, cacheable, and only about 30% extra bandwidth). Please see the corresponding commit for the current implementation (in network:testBase64).

As discussed, we will whitelist base64 strings that encode textual content, such as CSS, HTML or JavaScript.
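
To make the whitelist idea concrete, here is a minimal sketch in Python. It is not the code from the commit. It assumes the base64 content arrives as `data:` URIs with an explicit MIME type, and the names `TEXT_MIME_WHITELIST`, `DATA_URI_RE` and `filter_base64` are hypothetical. Whitelisted payloads are decoded and filtered again, so a base64 image embedded in base64 CSS is still removed.

```python
import base64
import re

# Hypothetical whitelist; the issue only names CSS, HTML and JavaScript.
TEXT_MIME_WHITELIST = {
    "text/css",
    "text/html",
    "text/javascript",
    "application/javascript",
}

# Matches data URIs with a base64 payload, e.g. data:text/css;base64,Ym9keXt9
# Optional parameters such as ;charset=utf-8 are matched but not preserved.
DATA_URI_RE = re.compile(
    r"data:(?P<mime>[\w.+-]+/[\w.+-]+)"
    r"(?:;[\w.+-]+=[\w.+-]+)*"
    r";base64,(?P<payload>[A-Za-z0-9+/]+={0,2})"
)


def filter_base64(text, depth=2):
    """Drop non-whitelisted base64 data URIs, recursing into textual ones.

    `depth` bounds how many levels of nested encoding are followed
    (an assumed safety limit, not something the issue specifies).
    """

    def replace(match):
        mime = match.group("mime").lower()
        if mime not in TEXT_MIME_WHITELIST or depth <= 0:
            return ""  # strip images, fonts and anything else not whitelisted
        try:
            raw = base64.b64decode(match.group("payload"), validate=True)
            decoded = raw.decode("utf-8")
        except (ValueError, UnicodeDecodeError):
            return ""  # claimed to be text but is not valid base64 or UTF-8
        # The decoded CSS/HTML/JS may itself embed base64 images.
        cleaned = filter_base64(decoded, depth - 1)
        encoded = base64.b64encode(cleaned.encode("utf-8")).decode("ascii")
        return f"data:{mime};base64,{encoded}"

    return DATA_URI_RE.sub(replace, text)
```

Calling `filter_base64(html)` on a page keeps an inline `data:text/css;base64,...` stylesheet but empties any `data:image/...;base64,...` URI, including ones nested inside that stylesheet. Bare base64 strings without a `data:` prefix are not handled by this sketch and would still need the existing filter.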