slub/dfg-viewer

Third party contents violate privacy

Closed this issue · 7 comments

The DFG Viewer currently loads content from code.jquery.com and fonts.googleapis.com, so the operators of those websites can track every page view.

I suggest using a local copy of the jQuery code and either a standard font or a locally hosted font to fix this privacy issue.
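The problem described above can be checked statically by listing which external hosts a page loads scripts and stylesheets from. The following is a minimal Python sketch using only the standard library. The names `ResourceCollector` and `third_party_hosts`, the example markup, and `viewer.example.org` are hypothetical. It only inspects `<script src>` and `<link rel="stylesheet">` tags, so font files referenced from inside a stylesheet, or resources added by JavaScript, are not detected.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of scripts and stylesheets referenced by a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # boolean attributes (e.g. async) map to None
        if tag == "script" and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.urls.append(urljoin(self.base_url, attrs["href"]))


def third_party_hosts(html: str, page_url: str) -> set[str]:
    """Return hosts, other than the page's own, that the page loads scripts or CSS from."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    collector.close()
    own_host = urlsplit(page_url).hostname
    return {
        host
        for host in (urlsplit(u).hostname for u in collector.urls)
        if host and host != own_host
    }


if __name__ == "__main__":
    # Hypothetical page markup that only mimics the setup described in the issue.
    page = """<html><head>
    <script src="https://code.jquery.com/jquery.min.js"></script>
    <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto">
    <script src="js/viewer.js"></script>
    </head></html>"""
    print(sorted(third_party_hosts(page, "https://viewer.example.org/show/")))
    # prints ['code.jquery.com', 'fonts.googleapis.com']
```

Relative URLs such as `js/viewer.js` resolve to the page's own host and are therefore not reported.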

I agree to take a look at the Google Fonts usage, but I disagree about the jQuery CDN. The whole purpose of the CDN is that its files can be heavily cached by the browser: after the first request from any site that uses the same jQuery file, the browser can serve it from its cache. After that there can't be much tracking going on, because the CDN doesn't even receive a request anymore. Hosting it locally would have a heavy performance downside.

(Sorry about closing and re-opening the issue. I accidentally hit the button prematurely on the mobile site.)

I just did a simple test by viewing a book in the DFG Viewer while observing the network traffic with Wireshark. Although the content was cached by the browser (Firefox in my test) and loaded from the cache, requests were still sent to the third-party URLs, probably to check whether the original content had changed in the meantime. That little traffic is enough for the third party to see the referrer URL and thus track my actions.

Hosting jQuery locally would have very little effect on performance, if any: the browser could cache it just as well, and it would save the DNS lookup for the third-party host.

How many requests did you make to verify this?

If I recall correctly, the first call should get a 200 response with the file. The next one should get a 304 (Not Modified) with an expiration date in the "far" future.
After that, no request should be made until that date (unless the browser cache is cleared).

I can't verify that myself right now, but I might take a look later.
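The caching behavior described above (a full 200 response first, then either no request while the cached copy is fresh, or a conditional request answered with 304 Not Modified once it is stale) can be sketched roughly as follows. This is a simplified Python model loosely following the HTTP caching rules of RFC 9111 for a browser's private cache. It ignores the `Age` header, heuristic freshness, and browser-specific behavior such as revalidating on reload. The function names and header values are hypothetical and are not taken from code.jquery.com.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict[str, str]) -> timedelta:
    """Simplified freshness lifetime of a stored response in a private (browser) cache."""
    directives = [
        d.strip().lower() for d in headers.get("Cache-Control", "").split(",") if d.strip()
    ]
    if "no-cache" in directives:
        return timedelta(0)  # stored copy must be revalidated before every use
    for d in directives:
        if d.startswith("max-age="):
            return timedelta(seconds=int(d.split("=", 1)[1]))  # max-age wins over Expires
    if "Expires" in headers:
        return parsedate_to_datetime(headers["Expires"]) - parsedate_to_datetime(headers["Date"])
    return timedelta(0)  # real browsers may apply a heuristic lifetime here instead


def next_request(headers: dict[str, str], now: datetime) -> str:
    """Describe what the browser does with a cached response at time `now`."""
    age = now - parsedate_to_datetime(headers["Date"])  # ignores the Age header
    if age < freshness_lifetime(headers):
        return "none: served from cache"
    if "ETag" in headers or "Last-Modified" in headers:
        return "conditional GET: 304 Not Modified if unchanged"
    return "full GET: 200 with the file"


if __name__ == "__main__":
    now = datetime(2020, 2, 12, 8, 0, tzinfo=timezone.utc)
    # Hypothetical response headers stored in the browser cache one hour earlier.
    long_lived = {"Date": "Wed, 12 Feb 2020 07:00:00 GMT",
                  "Cache-Control": "public, max-age=31536000", "ETag": '"abc"'}
    must_check = {"Date": "Wed, 12 Feb 2020 07:00:00 GMT",
                  "Cache-Control": "no-cache", "ETag": '"abc"'}
    print(next_request(long_lived, now))  # none: served from cache
    print(next_request(must_check, now))  # conditional GET: 304 Not Modified if unchanged
```

Whenever the stale branch is taken, the browser contacts the CDN again, and the CDN sees that request (including, depending on the referrer policy, the referring page), even if it only answers with a 304.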

Some more experiments show that the behavior seems to depend on the browser software:

  • Firefox ESR (Debian GNU/Linux) always contacts the third-party site (bad for privacy).
  • The latest Firefox (Windows) and Chromium (Debian GNU/Linux) don't; they use the cached data (fine for privacy).

I'll run more tests with Safari and other browsers later.

Yes, and for the 304 a request is still sent to the CDN. I always use local files myself if the license allows it (which is usually the case) to avoid giving an unnecessary amount of data to third-party providers.
I don't think jQuery is big enough for this to be a problem.

Edit: Added a screenshot that verifies this. jQuery is only a small part of the data being transferred.
[Screenshot: Screenshot_2020-02-12_07-51-22]

albig commented

Patches are welcome!

albig commented

This is fixed by #155.