The legal, ethical, and security issues concerning data, including aggregated data. Proactive compliance with rules and, in their absence, principles for the responsible management of sensitive data. Case studies.
By the end of the course, students will be able to:
- Identify situations in which data is sensitive, assess the risks, and articulate a reasoned response.
- Identify the pros and cons of situations in which data was collected for one purpose and later analyzed for other purposes.
- Identify trade-offs in security and privacy.
- Apply ethical theories to case studies. Consider privacy, human dignity, harm, the public good, legal issues, the role of ethics boards, and consent.
- Implement good security and privacy practices in data storage, use, and reporting.
- Explain why good security is not a product, but rather a process and a mindset.
- Argue for why security is complex and difficult, and why perfect security may be unachievable.
# | Topics | Reading |
---|---|---|
1 | Privacy; types of sensitive data; assessing risks; anonymization, de-identification, and re-identification; linkage of datasets; k-anonymity; l-diversity; functional dependencies. | After this class, for homework, read Bruce Schneier's "Secrets and Lies": pages 29-41 (from the heading "Privacy Violations" to the end of Chapter 3), and then read pages 59-81 (all of Chapter 5). |
2 | The notion of security as a process and a mindset; the weakest link in a chain; security as a moving target; complexity and security; security terminology and examples; examples of privacy or ethics issues. | Before this class, finish last day's readings. After this class, for homework, read pages 84-125 (i.e., cryptography: all of Chapter 6 (and its preceding page), and all of Chapter 7 stopping at the heading "Security Models"). |
3 | Cryptographic hash functions; symmetric and asymmetric encryption; more security concepts; examples of privacy or ethics issues. | Before this class, finish last day's readings. After this class, for homework, read pages 135-147 (Chapter 9 stopping at the heading "Authentication Protocols"), and also read pages 225-239 (all of Chapter 15: "Certificates and Credentials"). |
4 | Continuation of cryptography and security concepts; access control; passwords; MACs; more examples of privacy or ethics issues (e.g., CS admissions case study). | Before this class, finish last day's readings. After this class, for homework, read pages 167b-201 (i.e., all of these short chapters: all of Chapter 10 starting with the subsection "Web Security", then all of Chapter 11 ("Network Security"), and then all of Chapter 12 ("Network Defenses")). |
5 | Case study (cont.); digital signatures; trust; public-key cryptography and public-key infrastructure (PKI); RSA; other security topics. | Before this class, finish last day's readings. After this class, for homework, there are two videos that you need to view about topics in information ethics, in this order: (a) https://www.youtube.com/watch?v=NesTWiKfpD0 ("The End of Privacy" by Michal Kosinski), and (b) https://www.youtube.com/watch?v=n8Dd5aVXLCc ("The Power of Big Data and Psychographics" by Alexander Nix of Cambridge Analytica). |
6 | Ethics, legalities, and privacy (Philosophy guest lecturer: Dr. Orlin Vakarelov): ethical issues; privacy as theft; human dignity; information ethics; what is legal vs. what is ethical; social good vs. individual harm; the European Union's General Data Protection Regulation (GDPR). | Before today's class, be sure to watch the two videos assigned last day. After this class, for homework, view the video presentation "Show Me Your Data and I'll Tell You Who You Are" by Sandra Wachter: https://www.youtube.com/watch?time_continue=27&v=YYb1Dtc1B40. |
7 | Ethics, legalities, and privacy (Dr. Orlin Vakarelov): continuation of topics in information ethics; privacy vs. "freedom of information"; GDPR; provincial and federal laws; professional and industry standards of ethics (e.g., codes of ethics); conflicts of interest; whistle-blowing; ethical case studies. | Before this class, finish watching the Sandra Wachter video from last day's homework. After this class, the only homework is optional reading from "Secrets & Lies": pages 255-269 (all of Chapter 17: "The Human Factor"). |
8 | Digital certificates and certificate authorities; PKI examples; SQL injection; human factors including social engineering, trust, and "usable security"; risk management. | Before this class, there is only optional reading. |
For labs 2, 3, and 4 there will be pre-reading. We will post these readings on the Friday before the lab. Please make sure to read the articles carefully, taking notes.
# | Lab topic | Assignment | Pre-reading |
---|---|---|---|
1 | Anonymize and de-identify data; explore how much secondary information you need to identify an individual; identify trade-offs between usefulness of data and having "too much" anonymity. | Submit lab work (code, results, reflection piece). | |
2 | Case studies in security and privacy, including real-world breaches. Students come prepared: (a) preliminary reading with brief student notes, and (b) class-wide discussion of 2-3 case studies. | For homework, students prepare a 1-2 page report on a case study determined by the teaching staff. | |
3 | Quiz on security topics. More case studies in security and privacy, including real-world security violations, privacy breaches, and other risks. Students come prepared. Class-wide discussion. | For homework, students prepare a 1-2 page report on a case study determined by the teaching staff. | |
4 | Case studies in ethics. Students come prepared. Class-wide discussion. | For homework, students prepare a 1-page report on a case study determined by the teaching staff. | |
Lecture Learning Objectives (These depend on how many of the lecture topics we can cover, and are subject to reordering. Some topics may need to be deferred to a following lecture.)
- By the end of the lecture, students should be able to:
  - Identify types of sensitive data and assess the risks.
  - Differentiate among anonymization, de-identification, and re-identification. Provide examples where each may be necessary.
  - Describe ways of preventing inference attacks (e.g., cell suppression, generalization, and noise addition).
  - Explain and apply k-anonymity and l-diversity to a small set of tables. Explain why it may be difficult to determine appropriate values for k and l.
  - Explain the trade-offs among data utility, privacy, data aggregation, and absolute anonymity.
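
To make the k-anonymity and l-diversity objective more concrete, here is a minimal sketch in Python of how the two measures might be computed for a small table. The function names, column names, and records are hypothetical and chosen only for illustration; the records are assumed to be already generalized (age ranges, truncated postal codes).

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows that share the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found within any group."""
    values_per_group = {}
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        values_per_group.setdefault(key, set()).add(row[sensitive])
    return min(len(values) for values in values_per_group.values())

# Hypothetical, already generalized records (not real data).
records = [
    {"age": "20-29", "postal": "V6T", "diagnosis": "flu"},
    {"age": "20-29", "postal": "V6T", "diagnosis": "asthma"},
    {"age": "30-39", "postal": "V5K", "diagnosis": "flu"},
    {"age": "30-39", "postal": "V5K", "diagnosis": "flu"},
    {"age": "30-39", "postal": "V5K", "diagnosis": "diabetes"},
]

k = k_anonymity(records, ["age", "postal"])
l = l_diversity(records, ["age", "postal"], "diagnosis")
print(f"k = {k}, l = {l}")
```

In this toy table every combination of age range and postal prefix is shared by at least two records, and every such group contains at least two distinct diagnoses. Coarser generalization would raise k but reduce the usefulness of the data, which is the trade-off named in the last objective above.
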
- By the end of the lecture, students should be able to:
  - Identify and explain functional dependencies in a set of database tables. Explain how to mitigate risks associated with them.
  - Argue for why good security is not a product, but rather a process and a mindset.
  - Provide a good working definition of "privacy".
- By the end of the lecture, students are expected to be able to:
  - Explain -- and provide examples that clearly describe -- security concepts such as: access control, authorization, availability, confidentiality, data provenance, denial-of-service attack, eavesdropping, encryption, entropy, hash functions (e.g., SHA-256), man-in-the-middle attack, integrity, message authentication code (MAC), metadata, one-time pad, passphrase and password, password salt, pseudonymity, symmetric cryptosystem (i.e., private but shared key, including Advanced Encryption Standard (AES)), token (e.g., magnetic stripe card, smart card), and two-factor authentication.
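
Several of the concepts listed above (hash functions, password salt, and message authentication codes) can be illustrated with Python's standard library. The following is a minimal sketch, not a recommended production configuration: the function names, the iteration count, and the sample key and message are assumptions made for this example.

```python
import hashlib
import hmac
import secrets

def hash_password(password, salt=None, iterations=200_000):
    """Derive a salted hash with PBKDF2-HMAC-SHA256; a fresh random salt is used if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected, iterations=200_000):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)

def make_mac(key, message):
    """Compute an HMAC-SHA256 tag over the message using a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()

# Hypothetical usage: store (salt, digest) rather than the password itself.
salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
print(verify_password("wrong guess", salt, digest))

# A MAC detects tampering: changing the message changes the tag.
key = secrets.token_bytes(32)
tag = make_mac(key, b"transfer 10 dollars")
print(hmac.compare_digest(tag, make_mac(key, b"transfer 1000 dollars")))
```

Because each password gets its own random salt, two users with the same password end up with different stored digests, which is what makes precomputed lookup tables far less useful to an attacker.
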
- By the end of the lecture, students are expected to be able to:
  - Explain -- and provide examples that clearly describe -- cryptographic concepts such as: asymmetric (i.e., public-key) cryptography, certification authority, certificate revocation list, digital certificate, key management, non-repudiation, public-key infrastructure (PKI), and random number generators.
  - Explain the respective strengths of asymmetric and symmetric cryptography. Explain why they are frequently used together.
- By the end of the lecture, students are expected to be able to:
  - Justify the need for different types of backup and recovery (e.g., full, incremental, point-in-time) for databases, and for individual files and folders.
  - Explain how an SQL injection attack works.
  - Explain how views and database permissions (via SQL) provide security in a database.
  - Explain -- and provide examples that clearly describe -- security concepts such as: base rate fallacy, biometrics, botnets, buffer overflow, challenge response, cookies, firewall, HTTPS, intrusion detection, IPsec, logic bomb, packet sniffing, phishing, proxy, ransomware, secure shell (SSH), spoofing, traffic analysis, Trojan Horse, tunneling, URL obfuscation, virus, and virtual private network (VPN).
  - Explain -- and provide examples that clearly describe -- security concepts such as: CAPTCHA, CERT advisory, penetration testing, physical security, social engineering, vulnerability, and zero-day attack.
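
As an illustration of the SQL injection objective above, the following minimal sketch uses Python's built-in `sqlite3` module with an in-memory database. The table, its contents, and the function names are hypothetical; the point is only to contrast a query built by string concatenation with one that uses a bound parameter.

```python
import sqlite3

# Hypothetical in-memory table for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

def lookup_unsafe(conn, name):
    """Vulnerable: the input is pasted into the SQL text and can change the query's logic."""
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def lookup_safe(conn, name):
    """Safer: the input is passed as a bound parameter and is treated purely as data."""
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

# The attacker supplies a "name" that closes the quote and adds an always-true condition.
malicious = "nobody' OR '1'='1"
leaked = lookup_unsafe(conn, malicious)   # every row matches the rewritten WHERE clause
blocked = lookup_safe(conn, malicious)    # no user has this literal name
print(len(leaked), len(blocked))
```

With concatenation, the database sees `WHERE name = 'nobody' OR '1'='1'` and returns every secret. With the placeholder, the whole string is compared against the `name` column and nothing matches.
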
- By the end of the lecture, students are expected to be able to:
  - Identify the qualities of a good argument.
  - Define "information ethics" and its relevance to a data scientist.
  - Explain how individuals can be harmed by data collection, analysis, and dissemination. Suggest good practices.
  - Justify the role of human dignity and the public good when making ethical decisions in data science. Provide examples.
- By the end of the lecture, students are expected to be able to:
  - Analyze codes of ethics with respect to the kinds of problems they try to proactively address. Identify reasons for possible non-compliance.
  - Given a domain-specific situation (e.g., health care, education, government) involving the public release of data, suggest an appropriate balance between privacy and "freedom of information". Provide examples of policies in which "freedom of information" may harm individuals.
  - Provide moral arguments for and against whistle-blowing.
  - Evaluate security standards documents, privacy policies, and terms of use for an institution such as a university or a hospital.
- By the end of the lecture, students are expected to be able to:
  - Suggest workable ways to mitigate risk associated with the growing problems of phishing and ransomware.
  - Argue for why social engineering attacks will continue to be a problem for securing systems.
  - Explain how probabilities can be used effectively in risk management.
  - Explain why trust and human factors play a big part in security.
  - Define the term "usable security".
  - Provide examples of security policies that amount to "security theater" rather than effective protection. Suggest reasons for why the public buys into such policies.
  - Suggest ways in which security, privacy, and ethics interact. Discuss the responsibilities of data scientists in these areas.
  - Identify many of the "best practices" in security.
Secrets & Lies by Bruce Schneier, Wiley, 2015 (15th anniversary edition) or the 2001 edition (about 97-98% the same, including the same page numbers, sections, and chapters). We will read about 60% of this book during the course. It has stood the test of time, and it is a relatively easy read: partly technical, but having plenty of real-world examples. You can also borrow it online, from the UBC Library: (http://gw2jh3xr2c.search.serialssolutions.com/?sid=sersol&SS_jc=TC0001554461&title=Secrets%20and%20lies%20%3A%20digital%20security%20in%20a%20networked%20world)
Introduction to Computer Security by Michael Goodrich & Roberto Tamassia, Addison-Wesley, 2011 (2nd edition forthcoming)
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil, Crown (Penguin Random House), 2016.
- Abelson, Hal; Ledeen, Ken; and Lewis, Harry. Blown to Bits: Your Life, Liberty, and Happiness after the Digital Explosion. Addison-Wesley, 2008. http://www.bitsbook.com/excerpts/.
- ACM Code of Ethics, 2018. https://www.acm.org/code-of-ethics
- Baase, Sara. A Gift of Fire: Social, Legal, and Ethical Issues in Computer Technology, 4th edition. Pearson, 2012.
- Berman, Jules J. Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information. Morgan-Kaufmann (Elsevier), 2013.
- Electronic Frontier Foundation: Tips, Tools, and How-Tos for Safer Online Communications. https://ssd.eff.org/en
- Floridi, Luciano. The Ethics of Information. Oxford University Press, 2015.
- Geist, Michael. http://www.michaelgeist.ca/blog/ (blog, privacy, ethics, Canadian focus)
- Hamidi, Foad; Scheuerman, Morgan Klaus; and Branham, Stacy M. "Gender Recognition or Gender Reductionism? The Social Implications of Automatic Gender Recognition Systems", Proc. ACM SIGCHI 2018, Best Paper Award. (Automatic gender recognition, gender identity, transgender individuals - related to Lab 2 case study) https://dl-acm-org.ezproxy.library.ubc.ca/citation.cfm?doid=3173574.3173582
- Krause, Heather. Feminist Quantitative Data Analysis. https://app.ruzuku.com/courses/25230/about (This is a short series of thought-provoking videos on diversity, bias, and data analysis.)
- Krebs, Brian. https://krebsonsecurity.com/ (blog on security)
- Quinn, Michael J. Ethics for the Information Age, 7th edition, Pearson, 2017.
- Princeton Dialogues on AI and Ethics. https://aiethics.princeton.edu/case-studies/
- Schneier, Bruce. Crypto-Gram Monthly Blog and Archives. https://www.schneier.com/crypto-gram/ (blog on security, privacy, ethics)
- Schneier, Bruce. Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World. Norton, 2015.
- Schneier, Bruce. Interview in the Harvard Gazette. https://news.harvard.edu/gazette/story/2017/08/when-it-comes-to-internet-privacy-be-very-afraid-analyst-suggests/
- Sweeney, Latanya. Harvard Data Privacy Lab. https://dataprivacylab.org/people/sweeney/
- Topol, Eric. The Patient Will See You Now: The Future of Medicine is in Your Hands. Basic Books, 2015.
- Vallor, Shannon and Narayanan, Arvind. An Introduction to Software Engineering Ethics. Markkula Center for Applied Ethics. Santa Clara University. https://www.scu.edu/ethics/focus-areas/more/engineering-ethics/an-introduction-to-software-engineering-ethics/
- Wachter, Robert. The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine's Computer Age. McGraw-Hill, 2015.
Even More Resources on Privacy, Ethics, and Security for Data Scientists New to Privacy, Ethics, and Security:
- Privacy's not dead—it's just not evenly distributed by Alex Pasternack
- We're building a dystopia just to make people click on ads by Zeynep Tufekci
- Privacy at the Margins, a special section of the International Journal of Communication edited by Alice E. Marwick and danah boyd (who both also lead/advise at Data & Society).
- Your Browsing History Alone Can Give Away Your Identity by Kaveh Waddell, covering a paper by Sharad Goel, Arvind Narayanan, Jessica Su, and Ansh Shukla.
- Canada's Quiet History Of Weakening Communications Encryption by Chris Parsons and Tamir Israel
- Data Violence and How Bad Engineering Choices Can Damage Society by Anna Lauren Hoffmann
- Machine Bias by Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner, ProPublica. An accompanying Jupyter Notebook and a related Python package, FairML, used to audit ML for bias, are also available.
- Preventing Big Data Discrimination in Canada: Addressing Design, Consent, and Sovereignty Challenges by Jonathan Obar (UoT) and Brenda McPhail (CCLA)
- The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition by Os Keyes
- After a Year of Tech Scandals, Our 10 Recommendations for AI by AI Now Institute, co-founded by Kate Crawford and Meredith Whittaker.
- What Happens When An Algorithm Cuts Your Health Care by Colin Lecher
- Bots at the Gate: A Human Rights Analysis of Automated Decision Making in Canada's Immigration and Refugee System by Miles Kenyon, Citizen Lab
- A Framework for Understanding Unintended Consequences of Machine Learning by Harini Suresh and John V. Guttag
- Moritz Hardt's CS 294: Fairness in Machine Learning syllabus from UC Berkeley
- ACM Conference on Fairness, Accountability, and Transparency in ML and beyond
- Fairness, Accountability, and Transparency in ML Annual Conference