readings

List of (mostly) data-related books & papers/articles that I've read.

The PDFs I uploaded here are all annotated. However, these are my personal annotations, so they might be hard to read. I'll always link to the original post/paper in case you want to check it out yourself (you should!).

books

Artificial Intelligence: A Guide for Thinking Humans: how did AI come about in the first place? Where are we now with AI? How far away are we from AGI? I read this in the middle of the GPT-3 hype & it was truly a sobering read.
Build a Career in Data Science: a recommended read if you're looking to break into the data science field, or if you're already in the field and wondering what the possible career paths look like.
Feature Engineering and Selection: A Practical Approach for Predictive Models: covers most of the major feature engineering and selection methods you need to know, all in one place. I often find learning resources on these topics scattered all over the internet, so it's helpful to have them collected together. Each method is presented with its strengths and drawbacks, as well as a case study (with bonus R code!) illustrating how it works.
Statistics Done Wrong: The Woefully Complete Guide: a book of common statistics gotchas.
The Effective Engineer: not about data, so probably not all of the examples are relatable, but this book really helps me think about how I can be an effective data scientist.
Thinking with Data: this book covers everything you need to know about working on data projects that has nothing to do with algorithms. If you're not yet comfortable with topics like scoping, project management, and dealing with stakeholders, this book is for you. I think anyone who wants to be a good data practitioner beyond fitting models should read this book cover-to-cover. You can also read my reading notes here.

articles/papers

general

150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (annotated pdf) (paper): Lessons learned from ML models deployed at Booking.com. Some of the most memorable examples (to me) are around monitoring & experiment design. The examples are mostly skewed towards features you can find on Booking.com, e.g. hotel/destination recommendations.
CS 329S: Machine Learning Systems Design (annotated pdf) (paper): Lecture notes on machine learning systems design. So far the notes have covered various data formats, data engineering 101, and building training data.
Challenges in Deploying Machine Learning: a Survey of Case Studies (annotated pdf) (paper): This paper covers many aspects of deploying machine learning; a must-read for those who want to learn what it takes to deploy ML in the real world (besides the training part!). Some of my favorite takeaways: 1) continuous delivery for ML models is complicated because ML has 3 moving parts: code, data, and model (as opposed to just code). 2) When it comes to end users' trust, there are limits to ML explainability. Most of the time more effort is required, e.g. engaging with domain experts from very early on and establishing mechanisms for accountability. 3) Data Oriented Approach (DOA) aims to replace microservice architecture with streaming-based architecture. Personally, I've heard about this idea in passing but never really read up on it.
Classifier calibration (annotated pdf) (article): Classifier calibration 101: what it is, why we calibrate models, and when to use it.
On Being a Data Skeptic (annotated pdf) (article): a timeless read. This is from 2013, but many, if not all, of its points are still relevant today. Mandatory reading for all data practitioners IMO.
From shallow to deep learning in fraud (annotated pdf) (article): How & why Lyft moved from logistic regression to gradient-boosted decision tree (GBDT) ensemble models, and then to deep learning, for their fraud use case.
Software 2.0 (annotated pdf) (article): This essay explores what Karpathy calls "Software 2.0". In contrast to Software 1.0, where humans write explicit instructions for the computer, in Software 2.0 we "specify some goal on the behavior of a desirable program". By this definition, things like neural networks count as Software 2.0, not just as another tool. The essay also explores the limitations of Software 2.0, including interpretability, bias, & adversarial attacks.
The End of Moore's Law, CPUs (as we know them), and the Rise of Domain Specific Architectures (annotated pdf) (slides): I read this to better understand the paper The Hardware Lottery. This slide deck explains why we're moving towards Domain Specific Architectures, with the end of Moore's Law being one of the reasons.
The Hardware Lottery (annotated pdf) (paper): this paper argues that, throughout history, hardware & software have frequently determined which research ideas succeed. This is even more worrisome given that we're moving towards Domain Specific Architectures (DSAs): research directions that existing DSAs support (or that are commercially viable to build DSAs for) may prevail, while promising directions that existing DSAs don't support, which might even turn out better given the chance, may be left behind.
What’s Keeping Women Out of Data Science? (annotated pdf) (article): findings include a) women seek clear information about what the day-to-day job is really like; b) women also seek more opportunities to interact with actual practitioners, because they're looking for a direct feel for the ways of working, degree of collaboration, and personalities, which cannot be conveyed through corporate slideshows. I think this further emphasizes why representation is so important.
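To make "calibrated" concrete for the calibration article above: a classifier is calibrated when, among predictions of about 0.9, roughly 90% turn out to be positive. One common way to measure the gap is expected calibration error (ECE). The sketch below is my own illustration, not taken from the article; the equal-width binning, the default of 10 bins, and the function name are assumptions.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average gap between the mean predicted probability and the
    observed positive rate, using equal-width probability bins."""
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # min() keeps p == 1.0 in the last bin instead of one past the end.
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(p for p, _ in bucket) / len(bucket)
        pos_rate = sum(y for _, y in bucket) / len(bucket)
        # Each bin's gap is weighted by the fraction of predictions it holds.
        ece += len(bucket) / len(probs) * abs(mean_conf - pos_rate)
    return ece


# Says 0.9 four times, but only half are positive: overconfident.
print(round(expected_calibration_error([0.9] * 4, [1, 1, 0, 0]), 2))  # 0.4
# Says 0.25 four times and one in four is positive: calibrated in this bin.
print(expected_calibration_error([0.25] * 4, [1, 0, 0, 0]))  # 0.0
```

Because the gap is only measured per bin, a model can score well here while still being miscalibrated inside a wide bin, so the bin count is a real design choice.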

automl

A critical overview of AutoML solutions (annotated pdf) (article): a quick overview of various AutoML solutions, including Google AutoML, auto-sklearn, Amazon SageMaker, etc.
Leveraging Automated Machine Learning for Text Classification: Evaluation of AutoML Tools and Comparison with Human Performance (annotated pdf) (article): comparison of various AutoML tools on 13 different popular datasets for text classification tasks. AutoGluon Text and auto-sklearn outperform TPOT and H2O most of the time.

metrics

Categorizing Variants of Goodhart's Law (annotated pdf) (paper): one of the papers/works I could find that touch on the relationship between Goodhart's Law & ML/AI. The paper explores 4 types of Goodhart's Law: Regressional Goodhart, Extremal Goodhart, Causal Goodhart, & Adversarial Goodhart.
Designing and evaluating metrics (annotated pdf) (article): this article explores five properties to keep in mind when designing a metric (Cost, Simplicity, Faithfulness, Precision, & Causal Proximity) & the lifecycle of a metric.
How do you set metrics? (annotated pdf) (article): pretty self-explanatory. One of my key takeaways: "start with a plain-language statement about what a successful outcome would look like in human terms" & go from there. Other takeaways: track counter-metrics, avoid having too many metrics, understand the dynamics between metrics, and avoid vanity metrics.
Metrics Versus Experience (annotated pdf) (article): some points overlap with those discussed in How do you set metrics? The main takeaway is that it's not metrics vs. experience: metrics are helpful, and the real question is how to design good metrics.

NLP

Abusive Language Detection in Online User Content (annotated pdf) (paper): tackles abusive language detection as a classification problem. The best result uses character n-grams (and also beats a deep learning approach).
Deep Learning Based Text Classification: A Comprehensive Review (annotated pdf) (paper): A survey of more than 150 deep learning models & more than 40 text classification datasets. There's also a quantitative analysis of these models' performance on several datasets.
The Unreasonable Effectiveness of Recurrent Neural Networks (annotated pdf) (article): A primer on RNNs & how they work. Recommended reading if you haven't learned about RNNs before but want to delve into recent NLP surveys; most of them talk about RNNs, so it's helpful to know how RNNs work & what their shortcomings are.
Multilingual Multi-class Sentiment Classification Using Convolutional Neural Networks (annotated pdf) (paper): language-independent deep learning model for multi-class sentiment analysis. Evaluated on English, German, & Arabic datasets. Uses oversampling to handle the class imbalance.
NLP for supervised learning (annotated pdf) (article): Recommended if you need to catch up on the progress in NLP for supervised learning so far.
The Illustrated Transformer (annotated pdf) (article): Have trouble understanding how transformers work? This article (& especially its illustrations) will make them much easier to understand.
Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) (annotated pdf) (article): I found attention quite difficult to visualize when just reading words on paper. Recommended if you're a visual learner & want to dig into how attention works.
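The abusive language paper above reports its best result with character n-grams. As a rough sketch of what such features look like (my own illustration, not the paper's feature pipeline), here is a minimal character n-gram counter in plain Python; the lowercasing, the single-space padding, the 3-to-4 length range, and the function name are assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n_min: int = 3, n_max: int = 4) -> Counter:
    """Count overlapping character n-grams with lengths n_min..n_max.

    Lowercasing and padding with one space on each side are illustrative
    choices, not taken from the paper."""
    padded = f" {text.lower()} "
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i : i + n]] += 1
    return counts


# Compare an insult with a lightly obfuscated spelling of it.
plain = char_ngrams("idiot")
obfuscated = char_ngrams("idi0t")
shared = set(plain) & set(obfuscated)
print(sorted(shared))  # [' id', ' idi', 'idi']
```

As whole-word tokens, "idiot" and "idi0t" share nothing, while their character n-grams still overlap; that is one common intuition for why character n-grams hold up on noisy user text.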

videos

I've been watching a lot of videos instead of reading articles/papers lately, so I thought I'd list them here as well.

Anomaly Detection: Algorithms, Explanations, Applications (video): Lecture on anomaly detection, specifically the outlier detection approach, which is suggested when anomalies make up only a small percentage of the data (< 5%). Highlights for me: 1) based on their benchmarking, isolation forest is pretty robust; 2) anomaly explanation with Sequential Feature Explanations; 3) application to open category detection.
Train, Evaluate, Repeat: Building a Credit Card Fraud Detection System (video): Challenges that Stripe faced when building a fraud detection system & how they overcame them. Three challenges highlighted in the talk: a) defining the label, b) handling class imbalance (fraudulent transactions make up <1% of all transactions) by fixing the target rate for the positive class (as opposed to fixing the subsampling rate for the negative class), c) model evaluation. Interestingly, the last part of the talk, on model evaluation, is an update to the talk Counterfactual evaluation of machine learning models, which I watched a while ago. It highlights some issues with the initial attempt at counterfactual evaluation, namely a) a single misclassified high-weight transaction can shift the estimates significantly, and b) the initial propensity function assumes that transactions are independent, but there's no guarantee that they are (e.g. repeated attempts of the same transaction).
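Point b) of the Stripe talk is easier to see with numbers. With a fixed subsampling rate for negatives, the positive share of the training set moves whenever the underlying fraud rate moves; fixing a target rate for the positive class instead means deriving how many negatives to keep from the data. The sketch below is my own reading of that idea, plus a toy version of the high-weight transaction problem from the counterfactual evaluation part; it is not Stripe's code. All function names, the 10% target, the toy data, and the use of a self-normalized inverse propensity weighted (IPW) estimate are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[int, int]  # hypothetical (transaction_id, label), label 1 = fraud


def make_toy_data(n: int, fraud_rate: float) -> List[Example]:
    """Hypothetical toy dataset: the first round(n * fraud_rate) ids are fraud."""
    n_pos = round(n * fraud_rate)
    return [(i, 1 if i < n_pos else 0) for i in range(n)]


def subsample_to_target_rate(
    examples: Sequence[Example], target_pos_rate: float, seed: int = 0
) -> List[Example]:
    """Keep every positive, then sample just enough negatives that positives
    make up `target_pos_rate` of the result. The negative keep-rate is
    derived from the data instead of being fixed up front."""
    positives = [ex for ex in examples if ex[1] == 1]
    negatives = [ex for ex in examples if ex[1] == 0]
    # Solve n_pos / (n_pos + n_neg_kept) = target for n_neg_kept.
    n_neg_kept = round(len(positives) * (1 - target_pos_rate) / target_pos_rate)
    n_neg_kept = min(n_neg_kept, len(negatives))
    rng = random.Random(seed)
    return positives + rng.sample(negatives, n_neg_kept)


def positive_share(sample: Sequence[Example]) -> float:
    return sum(label for _, label in sample) / len(sample)


def ipw_rate(outcomes: Sequence[int], propensities: Sequence[float]) -> float:
    """Self-normalized IPW mean: each observed outcome is weighted by 1/p,
    where p is the probability the transaction was let through."""
    weights = [1.0 / p for p in propensities]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


if __name__ == "__main__":
    # The positive share stays at the target even if the base fraud rate halves.
    for base_rate in (0.01, 0.005):
        sample = subsample_to_target_rate(make_toy_data(10_000, base_rate), 0.1)
        print(base_rate, len(sample), positive_share(sample))

    # 99 routine transactions (p = 1.0), all legitimate...
    outcomes, props = [0] * 99, [1.0] * 99
    print(ipw_rate(outcomes, props))  # 0.0
    # ...plus one fraud that was let through with p = 0.01 (weight 100).
    print(round(ipw_rate(outcomes + [1], props + [0.01]), 3))  # 0.503
```

On this toy data, a fixed 10% keep-rate for negatives would instead give positive shares of about 0.092 and 0.048 for the two base rates. In the IPW toy, one low-propensity transaction moves the estimate from 0 to about 0.5, which is the kind of sensitivity the talk describes.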

galuhsahid/readings