GDPR compliance
psimsa opened this issue · 10 comments
Opening a thread to discuss and follow progress
This should be settled once and for all, preferably by someone who actually knows how this works.
As far as I can tell from the code (@kzu correct me if I'm wrong), the telemetry currently processed without consent (a hashed email) does not really get stored anywhere; it's used in a URL to get or not get a 404 response code. The data it's checked against is collected with reversible consent (by installing/uninstalling the GitHub app; please confirm that uninstalling the GitHub app also removes the email from the CDNs). To me, that's acceptable. But like most people commenting, I'm not a privacy law consultant.
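For context, the check being described amounts to something like the following sketch. The hash algorithm, normalization, blob naming scheme, and CDN host are my assumptions for illustration, not SponsorLink's actual implementation:

```python
import hashlib

def sponsor_check_url(email: str, base: str = "https://example-cdn.net/sponsors/") -> str:
    """Build a hypothetical CDN URL whose existence encodes sponsorship.

    A GET returning 200 would mean "known sponsor"; a 404 would mean
    "not a sponsor". Nothing is stored client-side, but note that the
    hash still appears as part of the requested URL.
    """
    digest = hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()
    return base + digest
```

Even this read-only lookup transmits the hash over the wire, which is where the logging concerns discussed here come in.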
Yes, that is the case. Even suspending the GH app causes all blobs to be deleted.
But keep in mind that even a 404 via Azure CDN/Blobs will leave HTTP request telemetry somewhere, which is what I think folks were most concerned about (at least, that's my understanding of the problem with the "phone home"). It's indirect data collection even if it isn't intentional.
I would however expect Microsoft/Azure has this handled in a GDPR-compliant way, as in they log what they're allowed to log and not more. Technically, I'd say that if 3rd-party infrastructure collects and stores data it isn't allowed to, it's not really your liability. But again, not an expert.
The fact that the hash is part of the URL might be a problem. But if it's established that using the hash to verify membership is acceptable use, sending it in the body of a POST request would solve that. Again, someone with actual knowledge from somewhere other than ChatGPT and YouTube has to confirm.
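A minimal sketch of that alternative, assuming a hypothetical endpoint (none of these names come from SponsorLink):

```python
import hashlib
import json
import urllib.request

def build_check_request(email: str,
                        endpoint: str = "https://example.org/api/sponsor-check") -> urllib.request.Request:
    # Hypothetical endpoint. The hash travels in the POST body, so it never
    # shows up in the URL paths that CDNs, proxies, and web servers tend to
    # log by default.
    digest = hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()
    body = json.dumps({"hash": digest}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Whether the body itself gets logged still depends on the server's configuration, but this at least keeps the identifier out of default access logs.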
As added context: https://github.com/moq/moq/issues/1395 (which has been completed as of a couple days ago too).
Hey!
I am not a lawyer and this is not legal advice (and this could also be wrong as far as I am concerned), but I am very familiar with GDPR issues. Unfortunately I believe this is not settled yet, personal data is serious business under GDPR :) I've seen a lot of people say incorrect things in other issues so I think it's important to clarify stuff a bit.
First of all, the fact that data is stored or only "manipulated" does not change the extent of the applicability of GDPR, as specified in paragraph 2 of GDPR article 4 (the definition of processing is super broad, notice that it does not even include that data must leave the client). It does obviously make rights to access, removal, portability, etc, inapplicable, but the legal basis for the processing itself still needs to be within the bounds of GDPR, specifically GDPR article 6.
Then, a very important misconception is that the hash of an e-mail address does not constitute personal data. This is not the case: the hash of an e-mail address is personal data. Paragraph 1 of GDPR article 4 defines what personal data is. Notably, the hash of a person's e-mail address constitutes a unique identifier derived from a single e-mail address, and thus generally from the identity of a single person, and as such is personal data (as it uniquely identifies someone). And, of course, as many people have pointed out already, hashed e-mail addresses are quite easy to reverse. But even if it were impossible to reverse, it would still be personal data (arguably less or equally sensitive personal data, so it's a good thing that you hash it, but it does not lift liabilities).
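To make the "easy to reverse" point concrete: hashing is deterministic, so anyone holding a list of candidate addresses (commit metadata, leaked databases, mailing lists) can recover the original simply by re-hashing candidates until one matches. A small illustration:

```python
import hashlib

def sha256_hex(email: str) -> str:
    return hashlib.sha256(email.encode("utf-8")).hexdigest()

# All an observer sees is the hash...
observed = sha256_hex("alice@example.com")

# ...but re-hashing a candidate list trivially links it back to a person.
candidates = ["bob@example.com", "alice@example.com", "carol@example.com"]
recovered = next((e for e in candidates if sha256_hex(e) == observed), None)
# recovered == "alice@example.com"
```

Note that a salt wouldn't change the picture for a membership check like this one, since the client must know the salt to compute a matching hash.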
Generally, I think by design GDPR prohibits you from checking whether somebody is a sponsor using their personal data for the purpose of delivering personalized advertisement without their consent, which seems to be the intended purpose of this project. While I think the responsibility for this processing is on the data controller (the library authors that use SponsorLink), if your library transfers personal data to your server on behalf of the data controller, you immediately become a data processor. Data processors are governed by GDPR article 28, and that is actually a bit non-trivial. Notably, you need to establish a document that lists the complete extent of personal data processing you will carry out, so the data controller can be legally bound to it. This is notably important if you send data to servers outside of the European Union, even temporarily, because restrictions can be extra strong in that case. Because I believe you no longer call your server in the current implementation, this seems to not be a problem for the future.
Arguably, it's a little bit weird to provide a library whose main intended purpose is not GDPR-compliant¹. As a side note, I've seen you have used it before as part of one of your libraries, so that could be an issue.
In any case, current and prior data breaches must be reported by the data controller to data regulators². I think this is so minor here that there would be no consequences, but it's technically necessary. It must normally be done within 72 hours, but I'd expect authorities to be reasonable here, considering you have very little means. I've never had to do this myself, so I'm not super sure about the procedure. Because you are not based in the European Union, I believe you may need to contact the regulators of all member states containing an affected data subject (yikes). As it is difficult to assess exactly which country each data subject is from, maybe contacting a single one could suffice; you should ask them.
This may all sound very scary, but it's not hard to comply at all. As an engineer, you can just see it as a new set of rules on top of your programming language. The rule of thumb is: if you touch a variable that contains data provided by or related to a user in code that you host, make sure that:

1. you are allowed to do it under GDPR article 6,
2. you can easily provide all the rights defined in GDPR chapter 3 to the user (there aren't that many; it's mostly about access, correction, deletion, portability),
3. you only send it to a 3rd party if you know what they'll do with it in a contractual way and with the consent of the data controller,
4. you state all of this somewhere public in a contractual way.
If you're worried about this and want actual regulatory advice, I suggest contacting the British regulator ICO. While they are not in the EU anymore, UK GDPR is still a thing and is mostly identical. Alternatively, as I see you live in Argentina, perhaps the Spanish regulator AEPD could be of help, in Spanish. Member state regulators are very helpful and one of their missions is to help you be compliant.
Let me answer this unrelated point of the conversations too, as I think this is valuable information.
I would however expect Microsoft/Azure has this handled in a GDPR-compliant way. As in they log what they're allowed to log and not more.
I do not know the specifics of what Microsoft has done with respect to this, but be careful that they might assume in their data processor contract that you never send them personal data (it's silly, but they could). You'd have to check, but they can legally make you responsible for this if they want to. More reasonably, they could expect you to do that book-keeping yourself. Normally, when a data subject has a data processing complaint, they contact the data controller, which will then forward any potential change request to the data processor. I'm not actually sure if data subjects can contact processors directly, but controllers definitely need to be able to do it themselves. I know it's a bit awkward that you have to do further data processing to avoid doing bad data processing, but it's not the end of the world either.
If you have any further questions, I can try to clarify them to the extent of my abilities!
Footnotes

1. One could claim that this is legitimate interest. For the sake of preemptively countering that point, the promise of a module you download is generally not to deliver personalized ads in your warning output but to provide an unrelated technical service defined in context. Legitimate interest cannot apply.
2. This part is governed by GDPR article 33, and mentions specifically "personal data breach." GDPR defines a personal data breach as "a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored or otherwise processed," so I'll leave it to your interpretation whether this constitutes a personal data breach. In my personal, non-lawyer, non-binding opinion, it is, as personal data was transferred to a server where it should never have been.
I've moved to a signed local manifest, am no longer storing any user/sponsor information, and am adding the ability to remove all user traces on demand too (via the `gh sponsors remove` command).
I'll put up a `privacy.md` doc in the root of this repo anyway; it's required and necessary.
@Moxinilian thanks for the write-up. As I'm getting closer to a v2, I'd definitely appreciate more eyeballs on the new implementation which hopefully fixes all the raised issues and concerns 🙏
Closing for now since this seems to have been answered properly and no more feedback has been received.