DSCI 541: Privacy, Ethics, and Security

The legal, ethical, and security issues concerning data, including aggregated data. Proactive compliance with rules and, in their absence, principles for the responsible management of sensitive data. Case studies.

Learning Outcomes

By the end of the course, students will be able to:

  1. Identify situations in which data is sensitive, assess the risks, and articulate a reasoned response.
  2. Identify the pros and cons of situations in which data was collected for one purpose and later analyzed for other purposes.
  3. Identify trade-offs in security and privacy.
  4. Apply ethical theories to case studies. Consider privacy, human dignity, harm, the public good, legal issues, the role of ethics boards, and consent.
  5. Implement good security and privacy practices in data storage, use, and reporting.
  6. Explain why good security is not a product, but rather a process and a mindset.
  7. Argue for why security is complex and difficult, and why perfect security may be unachievable.

Schedule

# Topics Reading
1 Privacy; types of sensitive data; assessing risks; anonymization, de-identification, and re-identification; linkage of datasets; k-anonymity; l-diversity; functional dependencies. After this class, for homework, read Bruce Schneier's "Secrets and Lies": pages 29-41 (from the heading "Privacy Violations" to the end of Chapter 3), and then read pages 59-81 (all of Chapter 5).
2 The notion of security as a process and a mindset; the weakest link in a chain; security as a moving target; complexity and security; security terminology and examples; examples of privacy or ethics issues. Before this class, finish last day's readings. After this class, for homework, read pages 84-125 (i.e., cryptography: all of Chapter 6 (and its preceding page), and all of Chapter 7 stopping at the heading "Security Models").
3 Cryptographic hash functions; symmetric and asymmetric encryption; more security concepts; examples of privacy or ethics issues. Before this class, finish last day's readings. After this class, for homework, read pages 135-147 (Chapter 9 stopping at the heading "Authentication Protocols"), and also read pages 225-239 (all of Chapter 15: "Certificates and Credentials").
4 Continuation of cryptography and security concepts; access control; passwords; MACs; more examples of privacy or ethics issues (e.g., CS admissions case study). Before this class, finish last day's readings. After this class, for homework, read pages 167b-201 (i.e., all of these short chapters: all of Chapter 10 starting with the subsection "Web Security", then all of Chapter 11 ("Network Security"), and then all of Chapter 12 ("Network Defenses")).
5 Case study (cont.); digital signatures; trust; public-key cryptography (PKI); RSA; other security topics. Before this class, finish last day's readings. After this class, for homework, there are two videos that you need to view about topics in information ethics, in this order: (a) https://www.youtube.com/watch?v=NesTWiKfpD0 ("The End of Privacy" by Michal Kosinski), and (b) https://www.youtube.com/watch?v=n8Dd5aVXLCc ("The Power of Big Data and Psychographics" by Alexander Nix of Cambridge Analytica).
6 Ethics, legalities, and privacy (Philosophy guest lecturer: Dr. Orlin Vakarelov): ethical issues; privacy as theft; human dignity; information ethics; what is legal vs. what is ethical; social good vs. individual harm; the European Union's General Data Protection Regulation (GDPR). Before today's class, be sure to watch the two videos assigned last day. After this class, for homework, view the video presentation "Show Me Your Data and I'll Tell You Who You Are" by Sandra Wachter: https://www.youtube.com/watch?time_continue=27&v=YYb1Dtc1B40.
7 Ethics, legalities, and privacy (Dr. Orlin Vakarelov): continuation of topics in information ethics; privacy vs. "freedom of information"; GDPR; provincial and federal laws; professional and industry standards of ethics (e.g., codes of ethics); conflicts of interest; whistle-blowing; ethical case studies. Before this class, finish watching the Sandra Wachter video from last day's homework. After this class, for homework, it's only optional reading from "Secrets & Lies": pages 255-269 (all of Chapter 17: "The Human Factor").
8 Digital certificates and certificate authorities; PKI examples; SQL injection; human factors including social engineering, trust, and "usable security"; risk management. Before this class, there is only optional reading.

Labs

For labs 2, 3, and 4 there will be pre-reading. We will post these readings on the Friday before the lab. Please make sure to read the articles carefully, taking notes.

Lab topic Assignment Pre-reading
1 Anonymize and de-identify data; explore how much secondary information you need to identify an individual; identify trade-offs between usefulness of data and having "too much" anonymity. Submit lab work (code, results, reflection piece).
2 Case studies in security and privacy, including real-world breaches. Students come prepared: (a) preliminary reading with brief student notes, and (b) class-wide discussion of 2-3 case studies. For homework, students prepare a 1-2 page report on a case study determined by the teaching staff.
  • Dynamic sound identification
  • An Amazon Echo May Be the Key to Solving a Murder Case
3 Quiz on security topics. More case studies in security and privacy, including real-world security violations, privacy breaches, and other risks. Students come prepared. Class-wide discussion. For homework, students prepare a 1-2 page report on a case study determined by the teaching staff.
  • Hiring by Machine
  • OkCupid Screen Scraping Case
4 Case studies in ethics. Students come prepared. Class-wide discussion. For homework, students prepare a 1-page report on a case study determined by the teaching staff.
  • First 20 minutes of the roundtable by Dr. Michal Kosinski
  • "Show Me Your Data and I'll Tell You Who You Are" by Sandra Wachter (the same video assigned as homework before lecture 7)
  • "The End of Privacy" (already watched for lecture 6)
Lecture Learning Objectives

(These depend on how many of the lecture topics we can cover, and are subject to reordering. Some topics may need to be deferred to a following lecture.)

    1. By the end of the lecture, students are expected to be able to:

      • Identify types of sensitive data and assess the risks.
      • Differentiate among anonymization, de-identification, and re-identification. Provide examples where each may be necessary.
      • Describe ways of preventing inference attacks (e.g., cell suppression, generalization, and noise addition).
      • Explain and apply k-anonymity and l-diversity to a small set of tables. Explain why it may be difficult to determine an appropriate value for k and l.
      • Explain the trade-offs among data utility, privacy, data aggregation, and absolute anonymity.
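The k-anonymity and l-diversity checks above can be sketched in a few lines of Python. The table, column names, and generalized values below are invented for illustration; this is a toy, not a production anonymization tool:

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return the k-anonymity level of a table: the size of the smallest
    group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

def l_diversity(rows, quasi_identifiers, sensitive):
    """Return the l-diversity level: the smallest number of distinct
    sensitive values within any quasi-identifier group."""
    groups = {}
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups.setdefault(key, set()).add(row[sensitive])
    return min(len(values) for values in groups.values())

# A toy patient table with generalized age ranges and truncated postcodes.
table = [
    {"age": "20-29", "postcode": "V6T", "diagnosis": "flu"},
    {"age": "20-29", "postcode": "V6T", "diagnosis": "asthma"},
    {"age": "30-39", "postcode": "V5K", "diagnosis": "flu"},
    {"age": "30-39", "postcode": "V5K", "diagnosis": "flu"},
]

print(k_anonymity(table, ["age", "postcode"]))                # 2
print(l_diversity(table, ["age", "postcode"], "diagnosis"))   # 1
```

The table is 2-anonymous (every group has two rows) but only 1-diverse: everyone in the second group has the same diagnosis, so membership in that group alone reveals the sensitive value.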
    2. By the end of the lecture, students are expected to be able to:

      • Identify and explain functional dependencies in a set of database tables. Explain how to mitigate risks associated with them.
      • Argue for why good security is not a product, but rather a process and a mindset.
      • Provide a good working definition of "privacy".
    3. By the end of the lecture, students are expected to be able to:

      • Explain -- and provide examples that clearly describe -- security concepts such as: access control, authorization, availability, confidentiality, data provenance, denial-of-service attack, eavesdropping, encryption, entropy, hash functions (e.g., SHA-256), man-in-the-middle attack, integrity, message authentication code (MAC), metadata, one-time pad, passphrase and password, password salt, pseudonymity, symmetric cryptosystem (i.e., private but shared key, including Advanced Encryption Standard (AES)), token (e.g., magnetic stripe card, smart card), and two-factor authentication.
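Several of these concepts (hash functions, password salting, and MACs) can be demonstrated with Python's standard library. The iteration count and key sizes below are illustrative choices for the sketch, not security recommendations:

```python
import hashlib
import hmac
import secrets

# A cryptographic hash is deterministic and one-way: same input, same digest.
digest = hashlib.sha256(b"attack at dawn").hexdigest()

# Salted password hashing: a random per-user salt defeats precomputed
# rainbow tables, and PBKDF2's iteration count slows brute-force guessing.
salt = secrets.token_bytes(16)
stored = hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 100_000)

def verify(password: bytes, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison

# A MAC authenticates a message using a shared secret key: without the key,
# an attacker cannot forge a valid tag for a tampered message.
key = secrets.token_bytes(32)
tag = hmac.new(key, b"transfer $100", hashlib.sha256).hexdigest()

print(verify(b"correct horse", salt, stored))  # True
print(verify(b"wrong guess", salt, stored))    # False
```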
    4. By the end of the lecture, students are expected to be able to:

      • Explain -- and provide examples that clearly describe -- cryptographic concepts such as: asymmetric (i.e., public-key) cryptography, certification authority, certificate revocation list, digital certificate, key management, non-repudiation, public-key infrastructure (PKI), and random number generators.
      • Explain the strengths of each of asymmetric and symmetric cryptography. Explain why they are frequently used together.
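One way to see why the two are combined: asymmetric encryption is slow but needs no pre-shared secret, so it is typically used only to protect a small symmetric session key, and the fast symmetric cipher then encrypts the bulk data. The sketch below uses textbook RSA with tiny primes purely for intuition; it is completely insecure (no padding, trivially factorable modulus):

```python
# Textbook RSA with tiny primes -- for intuition only, never for real use.
p, q = 61, 53
n = p * q                # modulus, part of the public key
phi = (p - 1) * (q - 1)  # Euler's totient of n
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent: e * d = 1 (mod phi)

session_key = 42  # stand-in for a random symmetric key

ciphertext = pow(session_key, e, n)  # anyone can encrypt with (e, n)
recovered = pow(ciphertext, d, n)    # only the holder of d can decrypt

print(recovered == session_key)  # True
```

In a real protocol such as TLS, the recovered session key would then drive a symmetric cipher like AES for the rest of the conversation.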
    5. By the end of the lecture, students are expected to be able to:

      • Justify the need for different types of backup and recovery (e.g., full, incremental, point-in-time) for databases, and for individual files and folders.
      • Explain how an SQL injection attack works.
      • Explain how views and database permissions (via SQL) provide security in a database.
      • Explain -- and provide examples that clearly describe -- security concepts such as: base rate fallacy, biometrics, botnets, buffer overflow, challenge response, cookies, firewall, HTTPS, intrusion detection, IPsec, logic bomb, packet sniffing, phishing, proxy, ransomware, secure shell (SSH), spoofing, traffic analysis, Trojan Horse, tunneling, URL obfuscation, virus, and virtual private network (VPN).
      • Explain -- and provide examples that clearly describe -- security concepts such as: CAPTCHA, CERT advisory, penetration testing, physical security, social engineering, vulnerability, and zero-day attack.
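A minimal SQL injection demonstration using Python's built-in sqlite3 module (the table and the malicious input are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0), ('bob', 1)")

user_input = "alice' OR '1'='1"

# VULNERABLE: string concatenation lets the input rewrite the query itself.
query = "SELECT name FROM users WHERE name = '" + user_input + "'"
print(conn.execute(query).fetchall())   # [('alice',), ('bob',)] -- every row leaks

# SAFE: a parameterized query treats the input strictly as data, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(safe_rows)                        # [] -- no user literally has that name
```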
    6. By the end of the lecture, students are expected to be able to:

      • Identify the qualities of a good argument.
      • Define "information ethics" and its relevance to a data scientist.
      • Explain how individuals can be harmed by data collection, analysis, and dissemination. Suggest good practices.
      • Justify the role of human dignity and the public good when making ethical decisions in data science. Provide examples.
    7. By the end of the lecture, students are expected to be able to:

      • Analyze codes of ethics with respect to the kinds of problems they try to proactively address. Identify reasons for possible non-compliance.
      • Given a domain-specific situation (e.g., health care, education, government) involving the public release of data, suggest an appropriate balance between privacy and "freedom of information". Provide examples of policies in which "freedom of information" may harm individuals.
      • Provide moral arguments for and against whistle-blowing.
      • Evaluate security standards documents, privacy policies, and terms of use for an institution such as a university or a hospital.
    8. By the end of the lecture, students are expected to be able to:

      • Suggest workable ways to mitigate risk associated with the growing problems of phishing and ransomware.
      • Argue for why social engineering attacks will continue to be a problem for securing systems.
      • Explain how probabilities can be used effectively in risk management.
      • Explain why trust and human factors play a big part in security.
      • Define the term "usable security".
      • Provide examples of security policies that amount to "security theater" rather than effective protection. Suggest reasons why the public buys into such policies.
      • Suggest ways in which security, privacy, and ethics interact. Discuss the responsibilities of data scientists in these areas.
      • Identify many of the "best practices" in security.

Required Text

Secrets & Lies by Bruce Schneier, Wiley, 2015 (15th anniversary edition) or the 2001 edition (about 97-98% the same, including the same page numbers, sections, and chapters). We will read about 60% of this book during the course. It has stood the test of time, and it is a relatively easy read: partly technical, but with plenty of real-world examples. You can also borrow it online from the UBC Library: http://gw2jh3xr2c.search.serialssolutions.com/?sid=sersol&SS_jc=TC0001554461&title=Secrets%20and%20lies%20%3A%20digital%20security%20in%20a%20networked%20world

Optional but Useful and Recommended

Computer Security by Michael Goodrich & Roberto Tamassia, Addison-Wesley, 2011 (2nd edition forthcoming)

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil, Crown (Penguin Random House), 2016.

Additional General Reference Material

Resources on Privacy, Ethics, and Security for Data Scientists New to These Areas:

On Privacy, Security, and Ethics

On AI and Ethics

More Resources

Local and Canadian Organizations and People Working on Data Ethics, Privacy, and Security