schollz/howmanypeoplearearound

What is legality of monitoring traffic for mac addresses

kootenpv opened this issue · 22 comments

Hey, nice job on this :) I just wanted to mention that in case you are not aware, it is against the law to sniff packets. The only exception is to sniff on your own network, and only to protect it.

It is unfortunate, otherwise it would be really nice to come up with ways to use it!

At least you should put a "big fat" warning that the use of sniffing is most likely illegal.

Hi @kootenpv! Thanks for that. Do you know of any legal precedent showing that this type of sniffing violates 18 U.S. Code § 2511?

I don't need legal advice. I'm just curious whether "to intercept...electronic communication" applies to this since it is only looking at mac addresses and signal strengths and not actually investigating any of the packet contents (meta-data vs data?).

I see that Google got in trouble for something like this in 2013 but Google went so far as to collect fragments of data (sometimes email/passwords) which is more the definition of "electronic communication" than mac addresses and signal strengths.

Hi @schollz. The fact that you are not concerned does not mean you shouldn't put out a warning for people using your software if there is risk involved :)

Some people might live under other laws and they might be put in "danger".

I understand what you're saying w.r.t. meta-data vs data, and I do believe it is a valid point.

Very interesting to read your sources!

Added: 28cac50

Some people might live under other laws and they might be put in "danger".

Absolutely, I totally agree. I never thought about this but I'm also going to add a notice to https://github.com/schollz/find-lf.

I also put this question to the wireshark community.

IANAL, but if you walk down the street with your smartphone in your pocket with Wi-Fi turned on, you will be "intercepting" thousands of packets from networks that you have no right to connect to.

@kootenpv just interested to know what the distinction is between MAC addresses and SSIDs, such as are required for https://github.com/kootenpv/whereami, where you do not currently warn your users that sniffing this traffic may be illegal?

UK readers can find a brilliant legal summary, and advice on making MACs anonymous, from the UK Information Commissioner's Office (ICO): https://ico.org.uk/media/for-organisations/documents/1560691/wi-fi-location-analytics-guidance.pdf

The 'howmanypeoplearearound' situation is specifically dealt with.

Thanks to https://sites-dacb.vuturevx.com/110/3347/landing-pages/ico-produce-wi-fi-analytics-guidance.asp for a way into the subject.

Maybe I was too paranoid, though I think it is an interesting discussion.

@ansell I had not thought about it yet, although the situation is a bit different. Receiving/counting probes from people's devices (this repo) is different from probing access points. As long as your phone is not functioning as an access point (hotspot/tethering), it will not be picked up by whereami/FIND.

@cornishExile That's a fantastic link!

I think that the line is drawn if the WiFi data would be used for singling out individuals, as opposed to being collected as statistics: that would most likely be okay?

Another nice piece of advice in the paper is to convert MACs to something non-identifiable.

@kootenpv yes, it's interesting, isn't it?

Personal data acquisition is OK so long as it has a purpose or benefit to the individual (or group). But I wouldn't worry too much - given that web page adverts can install trackers on devices and lookup Facebook demographics, the bar seems pretty high to me (subjectively speaking).

Some basic privacy questions should be asked of a specific use: "what is the purpose of the data collection?", "what is the benefit (both for individuals and groups)?", "is the data secure (e.g. 'hashing' protects in case of theft or loss)?", and "how is data destroyed after use?"

I don't think MAC addresses collected by this means have any inherent privacy issues. The famous case in the UK of going too far is here http://www.cbsnews.com/news/uk-bars-trash-cans-from-tracking-people-with-wi-fi/ but even then the project was cancelled because of political and social pressure, not compliance issues per se.

Everyone keep developing please!

Sounds like a case closed to me :)

Personal data acquisition is OK so long as it has a purpose or benefit to the individual (or group). […] I don't think mac addresses collected by this means has any inherent privacy issues.

Actually, collecting MAC addresses in the EU now falls under the GDPR, as personal data. Indeed, a MAC address is directly associated with a device, and its owner can easily be (uniquely) identified.

Therefore, collecting and storing MAC addresses requires explicit consent from users on the network. Pseudonymizing MAC addresses is also not enough.

@cynddl Like most people in this thread, I am not a lawyer. There's a clause for being engaged in personal versus commercial pursuits in GDPR.

According to https://www.itgovernance.eu/blog/en/does-the-gdpr-apply-to-me:

The one caveat to that is that the GDPR does not apply to people processing personal data in the course 
of exclusively personal or household activity. This means you wouldn’t be subject to the Regulation if 
you keep personal contacts’ information on your computer or you have CCTV cameras on your house 
to deter intruders.

To fall within the remit of the GDPR, the processing has to be part of an “enterprise”. Article 4(18) of 
the Regulation defines this as any legal entity that’s engaged in economic activity. You must be careful 
not to mistake business conducted from home for household activity.

Pseudonymizing MAC addresses is also not enough

@cynddl would hashing them be sufficient? I'm assuming the problem here is to just count the number of unique ones.

Hashing is generally not a good solution because:

  1. hashing does pseudonymize the MAC addresses, but does not anonymize them. If an adversary knows the algorithm you use to hash MAC addresses, they can iterate through all possible MAC addresses until they find the one that matches a hash,
  2. even if the hash cannot be reversed, an attacker who knows you were the only person in the office at 7am will then learn what your hashed MAC address is, and be able to track when you go in and out, or move through the office (if multiple endpoints are used). This is typically why pseudonymized location traces are often not anonymous data.
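
The first point can be made concrete with a minimal sketch. Everything in it is an assumption for illustration: the choice of unsalted SHA-256, the made-up address, and the helper names `hash_mac` and `recover_mac`. The search is restricted to two unknown bytes so it finishes instantly; with only a vendor prefix known the space is 2^24 candidates, and the full MAC space is 2^48, both of which are within reach of a determined attacker.

```python
import hashlib
from itertools import product


def hash_mac(mac: str) -> str:
    """Pseudonymize a MAC address with an unsalted SHA-256 (assumed scheme)."""
    return hashlib.sha256(mac.lower().encode("ascii")).hexdigest()


def recover_mac(target_hash: str, known_prefix: str, unknown_bytes: int):
    """Enumerate every possible suffix until one hashes to target_hash.

    known_prefix is the part of the address the attacker already knows,
    e.g. a vendor prefix; unknown_bytes is how many trailing bytes to guess.
    Returns the matching address, or None if no candidate matches.
    """
    for combo in product(range(256), repeat=unknown_bytes):
        candidate = known_prefix + "".join(f":{b:02x}" for b in combo)
        if hash_mac(candidate) == target_hash:
            return candidate
    return None


# Made-up address used purely for the demonstration.
secret_mac = "a4:5e:60:12:ab:cd"
leaked_hash = hash_mac(secret_mac)

# The attacker sees only leaked_hash and guesses the last two bytes.
recovered = recover_mac(leaked_hash, "a4:5e:60:12", unknown_bytes=2)
print(recovered)
```

Adding a salt does not change the picture if the attacker also learns the salt: the loop stays the same, with the salt mixed into `hash_mac`.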

@cynddl does this mean that we cannot use this tool to understand traffic in, say, a restaurant (with the restaurant/network owner's consent)? Do you think there is any way that this tool could be implemented in such a use case?

It was my understanding that GDPR encourages pseudonymization of data; a MAC address is in that regard already pseudonymous.

@mwargan If you use such a tool inside the EU, e.g., to monitor traffic in a restaurant, you must either:

  1. obtain consent, prior to data collection, for collecting MAC addresses
  2. aggregate/anonymize the counts immediately when collecting MAC addresses.

If you want to count the exact number of devices connected to a network, you don't need to store MAC addresses. If you want to count the number of devices over, let's say one hour or one day, the naive solution (not GDPR compliant) would be to store the MAC addresses. Privacy-preserving tools for counting distinct elements do exist. See for instance how Tor Metrics estimate the number of IP addresses connected to Tor relays.

Finally, GDPR does encourage pseudonymization, but also clearly considers pseudonymized data to be personal data. It helps reduce the risk of data being stolen or misused, but does not grant GDPR compliance per se.
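
As a rough illustration of counting distinct devices over a window without keeping a list of addresses, here is a sketch using linear counting, a standard probabilistic distinct-count technique. It is not the method Tor Metrics uses, and the class name `LinearCounter`, the bitmap size, and the keyed BLAKE2b hash are all assumptions made for the example. Whether a regulator would treat the resulting bitmap as anonymous is a separate question that the code does not answer.

```python
import hashlib
import math
import os


class LinearCounter:
    """Estimate the number of distinct items seen, storing only a bitmap."""

    def __init__(self, bits: int = 8192, key: bytes = None):
        if bits % 8 != 0:
            raise ValueError("bits must be a multiple of 8")
        self.bits = bits
        # Random key held in memory only; discard it with the counter.
        self.key = key if key is not None else os.urandom(16)
        self.bitmap = bytearray(bits // 8)

    def add(self, mac: str) -> None:
        """Set the bit the address hashes to; the address is not stored."""
        digest = hashlib.blake2b(
            mac.lower().encode("ascii"), key=self.key, digest_size=8
        ).digest()
        index = int.from_bytes(digest, "big") % self.bits
        self.bitmap[index // 8] |= 1 << (index % 8)

    def estimate(self) -> float:
        """Linear counting estimate: -m * ln(fraction of bits still zero)."""
        set_bits = sum(bin(byte).count("1") for byte in self.bitmap)
        zero_bits = self.bits - set_bits
        if zero_bits == 0:
            return float("inf")  # bitmap saturated; use more bits
        return -self.bits * math.log(zero_bits / self.bits)


counter = LinearCounter()
# 1000 made-up addresses, each observed twice.
for repeat in range(2):
    for i in range(1000):
        counter.add(f"02:00:00:00:{i >> 8:02x}:{i & 0xff:02x}")
print(round(counter.estimate()))
```

Repeated sightings of the same device set the same bit, so the estimate tracks unique devices rather than total observations. Many addresses share each bit, which is what separates this from storing one hash per device.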

If you want to count the exact number of devices connected to a network, you don't need to store MAC addresses. If you want to count the number of devices over, let's say one hour or one day, the naive solution (not GDPR compliant) would be to store the MAC addresses.

@cynddl Thanks for the links! Maybe this is more of an analytics problem, but I don't see the difference between the two. Could you explain what you mean? If I collect the number of devices at a given timestamp, or collect the individual MACs and then calculate the device count at a given time, I end up with the same result. I don't see how storing a MAC address could be beneficial, as in either case I can get the devices over one hour or one day.

First case: every second, you collect the list of MAC addresses, then compute the length of that list and store only the pair (timestamp, number of unique addresses) to disk. There's no personal data stored.

Second case: every second, you collect the list of MAC addresses and store it to the disk with a timestamp. Every night, you load all the lists from the past 24h, compute the list of unique addresses and store the length of that list to the disk. You then delete the timestamped lists of MAC addresses. Still not GDPR compliant because, on the disk, you have MAC addresses.
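
The first case can be sketched as follows. The function name `append_count`, the CSV layout, and the sample addresses are assumptions for illustration and are not taken from this repo's code; the only point is that the addresses live in memory for one scan and only the count reaches the output.

```python
import csv
import io


def append_count(out, observed_macs, timestamp) -> int:
    """Write one (timestamp, unique device count) row; never write the MACs.

    out is any writable text stream (a file opened with newline="" or a
    StringIO); observed_macs is the in-memory result of a single scan.
    """
    unique = len({mac.lower() for mac in observed_macs})
    csv.writer(out).writerow([int(timestamp), unique])
    return unique


# Made-up scan result: the same device appears twice with different casing.
scan = ["AA:BB:CC:00:00:01", "aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"]
log = io.StringIO()
append_count(log, scan, 1700000000)
print(log.getvalue().strip())
```

Note what this gives up: per-scan counts cannot be combined afterwards into a count of unique devices over a day, because the same device may be counted in many rows. That is the gap between the two cases described above.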

@cynddl oh sure, I understand that! Just curious as to why it could be beneficial to store the MAC address in the first place - from a marketing/analytical point of view, it provides no added benefit as I see it.

I'm thinking of pushing a pull request with maybe a flag option --gdpr to automatically run the script in a GDPR compliant way. Would you be in on helping out?

Edit: not a flag, but I did create a fork and commented out the MAC for GDPR on line 250: mwargan@9c716bd - would this be compliant?

So, I saw this thread and I had to run my mouth about it. @kootenpv it is 100% legal in the United States to sniff in Monitor Mode under 18 USC 2511. The caveat is that you may not attempt to decrypt anything encrypted.

The GDPR can blow itself. Worst piece of European legislation ever written and I'm proud to say that I willingly violate the hell out of the GDPR's wifi rules. They hath no power in the USA.

Old issue, but perhaps still nice to know.

A municipality (Enschede, Netherlands) got fined 600.000 euro for using MAC addresses to do crowd monitoring. The MAC addresses were not only hashed, but also salted and stored for a limited time. According to the Dutch Data Protection Authority that doesn't matter, because the company might know the hash function and salt and could brute-force the addresses.

I don't know how to do it "right". It's not that homomorphic encryption or functional encryption would solve this. I think it must go in the direction of the following:

  • Require device manufacturers to let people set an address rotation scheme for their phones (or other devices) and make sure device manufacturers use privacy-sensitive defaults. Similar to the exposure notification framework for covid, but with more legal pressure.
  • Enforce a "privacy bit" in any protocol that does/requires broadcasting (of course this can be a bit, opcode, etc., all depending on the protocol). This bit must be user-configurable: the user must have the final choice.
  • Optionally, prescribe a "report bit". If set, people have to be notified that they have been tracked. For example, in Estonia you get a notification if a police officer scans your number plate. Naturally, there's then a notification mechanism in place which is in itself "vulnerable" to legal compliance.

In general it wouldn't be so weird to think that there will be a transition towards privacy-aware protocols just as we saw with the rise of more security protocols (even if they incur overhead).

@mrquincle very interesting... This article said " It does let us know that the method has now been adjusted, so that data, among other things, is not stored for longer than 24 hours." - was their data being stored for more than 24 hours before?

Is this also a function of time - i.e. maybe the 60 second scanning time wouldn't be subject to this issue?

If you can read Dutch, here the Dutch Data Protection Authority describes some of the conditions (such as that a hash and a salt are insufficient).

The company behind this had this statement. Here they stated that they have been doing this since 2017, when there was no regulation for this. For the past year they have not been storing data for longer than 24 hours.

They state that the fine was based on the behavior before that time. They have a privacy policy online targeted to exactly this application. This mentions:

  • The ability to count the number of unique passers-by per day. Hence, they have to track over the full 24 hours for this.
  • The average stay per visitor per day. Also, for this they have to track over the full 24 hours.

Hence, I think you're right. If you're scanning for short enough times it seems to be okay.

Opt-out
Note that they have a very interesting opt-out function, that requires you to give them your MAC address: https://www.citytraffic.nl/opt-out-register/

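# Generate a random hex string with apg and insert a colon after every two characters
mac_wifi=$(apg -a 1 -M nl -m12 -n 1 -E ghijklmnopqrstuvwxyz | sed 's/../&:/g;s/:$//')
echo "$mac_wifi"
# Submit the random address to the opt-out form
curl -X POST -F "mac-wifi=$mac_wifi" https://www.citytraffic.nl/wp-json/contact-form-7/v1/contact-forms/1030/feedback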

You'll get a response like

{"into":"#","status":"mail_sent","message":"Bedankt voor de afmelding; het is verzonden.","posted_data_hash":"578195650c08e4fc146b6d6c165fa637"}