ML System Learning Resources

This is a curated list of resources about Machine Learning Systems. Please feel free to contribute any items that should be included.

Contents

Machine Learning Handbook Links

Problem Definition

During the Modeling phase, the primary objective is to create a Machine Learning model tailored to a specific business case. This process entails deriving a well-defined modeling strategy. While some parameters, such as where the inference pipeline comes into play, are usually given, critical elements such as the target variable and the observation space require meticulous design.

Observation Space & Target Definition

Offline Model Evaluation

  • Research Papers
    • The Relationship Between Precision-Recall and ROC Curves by Jesse Davis et al. The paper delves into the relationship between ROC and PR curves in machine learning, highlighting their deep connection, their differences, and the implications for algorithm design, while introducing an efficient method to compute the achievable PR curve. A minimal sketch contrasting the two curves on an imbalanced dataset follows this entry.
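
To make the contrast concrete, here is a minimal sketch that computes both curves and their scalar summaries with scikit-learn. The synthetic imbalanced dataset and the logistic-regression scorer are illustrative assumptions, not taken from the paper:

```python
# Minimal sketch contrasting ROC and PR curves under class imbalance.
# Dataset and model are illustrative stand-ins.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (average_precision_score, precision_recall_curve,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, _ = roc_curve(y_te, scores)                # ROC: TPR vs. FPR
prec, rec, _ = precision_recall_curve(y_te, scores)  # PR: precision vs. recall

# With a rare positive class, ROC AUC can look flattering while the
# PR summary (average precision) stays modest.
print(f"ROC AUC:           {roc_auc_score(y_te, scores):.3f}")
print(f"Average precision: {average_precision_score(y_te, scores):.3f}")
```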

Monitoring

Data Distribution Shifts

  • Books
  • Research Papers
    • An introduction to domain adaptation and transfer learning by Wouter M. Kouw et al. In this paper, the significance of unbiased training samples in machine learning for accurate predictions is emphasized. The challenges arising from differences in training and test data distributions are highlighted, and the concepts of domain adaptation and transfer learning as solutions are introduced. The paper explores the conditions under which classifiers can effectively generalize across different domains, discusses risk minimization, examines types of data set shifts, and presents various strategies to handle complex domain shifts.
    • Covariate Shift by Kernel Mean Matching by Arthur Gretton et al. In this paper, they introduce a method to adjust training data so its distribution aligns more closely with test data by matching covariate distributions in a high-dimensional feature space. This technique bypasses the need for distribution estimation, utilizing a straightforward quadratic programming process to determine sample weights; a minimal sketch of this quadratic program appears after this list.
    • Rethinking Importance Weighting for Deep Learning under Distribution Shift by Tongtong Fang et al. In this paper, the authors address the challenges of using importance weighting (IW) under distribution shift in complex data, particularly its incompatibility with deep learning. The circular dependency between weight estimation (WE) and weighted classification (WC) is highlighted. To resolve these challenges, the paper introduces "dynamic IW," an end-to-end solution that iteratively combines WE and WC; a simplified caricature of this loop is also sketched after this list.
    • Learning under Concept Drift: an Overview by Indrė Žliobaitė. This paper provides a comprehensive survey of the concept drift area. The paper not only offers a taxonomy of concept drift types but also discusses methods to handle them.
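
The kernel mean matching method from the Gretton et al. entry reduces to a small quadratic program: minimize (1/2) βᵀKβ − κᵀβ over per-sample weights β, subject to box and normalization constraints. The sketch below is a minimal version, assuming a Gaussian kernel and using SciPy's general-purpose SLSQP routine rather than a dedicated QP solver; the bandwidth sigma, box bound B, and tolerance eps are illustrative values:

```python
# Minimal kernel mean matching (KMM) sketch: reweight training points so
# their kernel mean matches the test set's. Parameter values are assumed.
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kmm_weights(X_train, X_test, sigma=1.0, B=10.0, eps=0.01):
    n, m = len(X_train), len(X_test)
    K = gaussian_kernel(X_train, X_train, sigma)                           # (n, n)
    kappa = (n / m) * gaussian_kernel(X_train, X_test, sigma).sum(axis=1)  # (n,)

    objective = lambda b: 0.5 * b @ K @ b - kappa @ b
    constraints = [  # keep the average weight near 1 so the result stays a distribution
        {"type": "ineq", "fun": lambda b: n * (1 + eps) - b.sum()},
        {"type": "ineq", "fun": lambda b: b.sum() - n * (1 - eps)},
    ]
    res = minimize(objective, x0=np.ones(n), bounds=[(0, B)] * n,
                   constraints=constraints, method="SLSQP")
    return res.x  # importance weight for each training point

# Toy usage: training distribution shifted away from the test distribution.
rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, size=(100, 2))
X_te = rng.normal(0.5, 1.0, size=(80, 2))
w = kmm_weights(X_tr, X_te)
print(f"mean={w.mean():.2f} min={w.min():.2f} max={w.max():.2f}")
```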
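
And a deliberately simplified caricature of the dynamic-IW loop from the Fang et al. entry. The paper alternates weight estimation and weighted classification end-to-end on deep features; this sketch substitutes logistic-regression models and uses the task model's predicted probabilities as the "representation", purely to illustrate the circular WE/WC dependency:

```python
# Caricature of dynamic importance weighting: alternate (1) weight
# estimation (WE) on the current model's representation and (2) weighted
# classification (WC). Everything here is a simplified stand-in for the
# paper's end-to-end deep-learning method.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_tr = rng.normal(0.0, 1.0, (500, 2))
y_tr = (X_tr[:, 0] > 0).astype(int)
X_te = rng.normal(0.7, 1.0, (500, 2))       # covariate-shifted test inputs

clf = LogisticRegression().fit(X_tr, y_tr)  # initial unweighted classifier
for step in range(3):
    # (1) WE: discriminate train vs. test in the model's representation,
    #     then convert the discriminator's output into a density ratio.
    f_tr, f_te = clf.predict_proba(X_tr), clf.predict_proba(X_te)
    dom = LogisticRegression().fit(
        np.vstack([f_tr, f_te]),
        np.r_[np.zeros(len(f_tr)), np.ones(len(f_te))])
    p = dom.predict_proba(f_tr)[:, 1]
    w = np.clip(p / (1 - p), 0.1, 10.0)     # w(x) ~ p_test(x) / p_train(x)
    # (2) WC: refit the task classifier under the new weights.
    clf = LogisticRegression().fit(X_tr, y_tr, sample_weight=w)
    print(f"step {step}: mean weight {w.mean():.2f}")
```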

Machine Learning Systems Design

Machine Learning System Use Cases

  • Research Papers
  • Online Resources
    • Artwork Personalization at Netflix by Ashok Chandrashekar. Netflix leverages machine learning and contextual bandits to personalize artwork for titles, enhancing user engagement and offering a tailored viewing experience by understanding individual preferences and viewing histories. A toy contextual-bandit sketch follows this list.
    • Pareto-Based Multiobjective Machine Learning: An Overview and Case Studies by Yaochu Jin et al. A research paper on applying Pareto-based multiobjective optimization to machine learning, in which the authors argue that “machine learning is inherently a multiobjective task”.
    • On Learning Invariant Representations for Domain Adaptation by Han Zhao et al. In this paper, the authors challenge the prevailing belief in unsupervised domain adaptation that using deep neural nets to learn domain-invariant features ensures successful adaptation. Through a counterexample, they highlight issues related to conditional shift in class-conditional distributions. They introduce a generalization upper bound that accounts for this shift and provide an information-theoretic lower bound related to learning invariant representations. The findings highlight a key tradeoff in domain adaptation when label distributions vary between source and target domains.
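
To illustrate the contextual-bandit pattern behind the Netflix post, here is a toy epsilon-greedy sketch. The context buckets, artwork names, click model, and epsilon value are all invented for illustration and bear no relation to Netflix's production system:

```python
# Toy epsilon-greedy contextual bandit: pick one of several artworks per
# user context, then update a running click-through estimate. Illustrative
# only; all names and the reward simulation are assumptions.
import random
from collections import defaultdict

ARTWORKS = ["romance_art", "action_art", "comedy_art"]
EPSILON = 0.1

counts = defaultdict(int)     # pulls per (context, artwork) pair
values = defaultdict(float)   # running click-through estimate per pair

def choose_artwork(context):
    if random.random() < EPSILON:                                 # explore
        return random.choice(ARTWORKS)
    return max(ARTWORKS, key=lambda a: values[(context, a)])      # exploit

def update(context, artwork, reward):
    key = (context, artwork)
    counts[key] += 1
    values[key] += (reward - values[key]) / counts[key]           # incremental mean

# Simulated interactions: users click matching artwork more often.
random.seed(0)
for _ in range(10_000):
    context = random.choice(["likes_romance", "likes_action"])
    art = choose_artwork(context)
    p_click = 0.3 if art.split("_")[0] in context else 0.05
    update(context, art, reward=1 if random.random() < p_click else 0)

print({k: round(v, 3) for k, v in values.items()})
```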

Causal Inference

  • Books
    • Causal Inference in Python by Matheus Facure. This book explains the largely untapped potential of causal inference for estimating impacts and effects.
  • Research Papers
    • Democratizing online controlled experiments at Booking.com by Raphael Lopez Kaufman et al. In this paper, the authors describe how they democratized experimentation at Booking.com, including building a central repository of successes and failures to allow for knowledge sharing, and maintaining a generic and extensible code library that enforces loose coupling between experimentation and business logic.
    • A Randomized Assessment of Online Learning by William T. Alpert. In this paper, the authors present a comprehensive randomized study on the effectiveness of online learning compared to traditional classroom instruction. Utilizing robust statistical methodologies, they assess student outcomes, engagement levels, and retention rates between the two modes of instruction. Key findings suggest that while online platforms offer greater flexibility and can achieve similar academic results, certain nuances, like student-teacher interaction and peer collaboration, differ significantly from in-person settings. The research uses advanced analytical techniques to control for confounding variables and biases, ensuring the results are both reliable and generalizable. This study provides valuable insights not just for educators, but also for data scientists interested in the complexities of educational data and the challenges of conducting randomized trials in real-world settings.
    • A/B Testing Intuition Busters: Common Misunderstandings in Online Controlled Experiments by Ron Kohavi et al. This paper highlights the misleading concepts often promoted in the industry regarding A/B testing, debunks these misconceptions with statistical reasoning, and offers suggestions to platform designers to prevent such intuitive errors.
    • Making Sense of Sensitivity: Extending Omitted Variable Bias by Carlos Cinelli et al. In this paper, the authors extend the omitted variable bias framework with a suite of tools for sensitivity analysis in regression models that: (i) does not require assumptions on the functional form of the treatment assignment mechanism nor on the distribution of the unobserved confounders; (ii) naturally handles multiple confounders, possibly acting non-linearly; (iii) exploits expert knowledge to bound sensitivity parameters; and, (iv) can be easily computed using only standard regression results.
    • A Crash Course in Good and Bad Controls by Carlos Cinelli et al. In this paper, the authors address the recurrent issue encountered by students and professionals in statistics, econometrics, and empirical social sciences concerning "bad controls" in regression models. Bad controls refer to variables that, when added to a regression equation, might create unintended discrepancies between the regression coefficient and its anticipated effect. Historically, mainstream literature has lacked comprehensive guidance on distinguishing between "good controls" (or confounders) – variables that reduce biases when included – and "bad controls", potentially exacerbating biases. Although some resources touch upon this topic, they often address only specific facets, leaving a gap in holistic understanding. In contrast to the prevailing belief that more controls invariably enhance regression models, this paper aims to elucidate recent advancements in graphical models that help discern between good and bad controls. By utilizing illustrative examples, the paper offers a concise and visual guide for practitioners to navigate the challenges surrounding the inclusion of controls in regression models, ultimately aiming to aid in the causal interpretation of these models.
    • Difference-in-Differences with Variation in Treatment Timing by Andrew Goodman-Bacon. In this paper, the author investigates the intricacies of the two-way fixed effects difference-in-differences (TWFEDD) estimator, often used when treatment timings vary across units. The paper reveals that TWFEDD is a weighted average of all two-period/two-group difference-in-differences estimators, where weights are influenced by the timing group sizes and treatment variance. The analysis highlights that TWFEDD can yield a variance-weighted average of treatment effects if these effects are consistent over time; however, varying effects result in "negative weights" and could lead to misleading estimates. The author introduces a new perspective on the common trends assumption, tailored for TWFEDD, and provides tools for dissecting and understanding changes in estimates across different specifications. The methods are applied to a case study on unilateral divorce laws' impact on female suicide rates, indicating potential biases in TWFEDD estimates due to treatment effect evolution over time. The findings emphasize caution in using TWFEDD with varied treatment timings and point towards more flexible estimators that can better handle such variations.
    • Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data by Alex Deng et al. In this paper, the authors introduced a technique called Controlled-experiment Using Pre-Experiment Data (CUPED), a method used to improve the efficiency of online controlled experiments. It is handy in the context of causal inference, as it aims to reduce variance in the experiment outcome metrics without introducing bias, thereby enhancing the experiment's sensitivity and potentially reducing the duration or sample size required. A minimal sketch of the adjustment appears after this list.
    • Estimating Treatment Effects with Causal Forests: An Application by Susan Athey et al. In this paper, the authors develop and apply causal forests to a dataset from the National Study of Learning Mindsets. They address the challenges of using causal forests, an adaptation of the random forest algorithm by Breiman (2001), for estimating heterogeneous treatment effects. The paper focuses on the use of estimated propensity scores by causal forests to enhance robustness against confounding factors and to handle data with clustered errors. This non-parametric approach extends the application of the random forest algorithm to the realm of causal inference, dealing with issues such as selection bias and clustered observations.
    • Why Propensity Scores Should Not Be Used for Matching by Gary King and Richard Nielsen. In this paper, the authors show that propensity score matching often accomplishes the opposite of its intended goal, increasing imbalance, inefficiency, model dependence, and bias. The weakness of PSM comes from its attempts to approximate a completely randomized experiment, rather than, as with other matching methods, a more efficient fully blocked randomized experiment.
    • Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data by Joseph D. Y. Kang et al. In this article, the authors examine doubly robust (DR) procedures, which are designed to address selection bias in data with uncontrolled nonresponse and attrition. DR methods rely on two models: one for the response population and another for data selection. The key feature of DR estimates is their consistency even if one of the models is misspecified. The authors review different incomplete-data estimation strategies, linking DR methods to survey and causal inference techniques. They also assess the effectiveness of DR estimators in simulated scenarios where both models are moderately incorrect. Their findings indicate that while DR methods can outperform simple inverse-probability weighting, they do not always surpass simple regression-based prediction of missing values. This underscores the limitations of DR methods in certain contexts.
    • Causal Inference and Uplift Modeling: A Review of the Literature by Pierre Gutierrez et al. In this article, the authors present a comprehensive review of uplift modeling within the context of causal inference and machine learning. They focus on three main approaches to uplift modeling: the Two-Model approach, the Class Transformation approach, and direct modeling of uplift. The authors also discuss the challenges in evaluating uplift models due to the unobservable nature of simultaneous control and treatment outcomes, proposing a transformed outcome variable for model evaluation. A sketch of the Two-Model approach appears after this list.
  • Online Resources
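
The CUPED adjustment from the Deng et al. entry above is simple enough to show directly: with X the same metric measured pre-experiment and Y the in-experiment metric, compute theta = cov(X, Y) / var(X) and use Y_cuped = Y − theta · (X − mean(X)). A minimal sketch on simulated data (the data-generating process is an illustrative assumption):

```python
# Minimal CUPED sketch: adjust the experiment metric Y with a
# pre-experiment covariate X. Simulated data is illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(10, 3, n)                   # pre-experiment metric
treat = rng.integers(0, 2, n) == 1         # random assignment
y = x + rng.normal(0, 1, n) + 0.1 * treat  # in-experiment metric, small lift

cov = np.cov(x, y)
theta = cov[0, 1] / cov[0, 0]
y_cuped = y - theta * (x - x.mean())       # same mean, much lower variance

for name, m in (("raw", y), ("CUPED", y_cuped)):
    lift = m[treat].mean() - m[~treat].mean()
    print(f"{name:5s}  estimated lift = {lift:.4f}  metric variance = {m.var():.3f}")
```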
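
Likewise, the Two-Model approach from the Gutierrez et al. review can be sketched in a few lines: fit one outcome model on the treated group and one on the control group, and score uplift as the difference of their predicted probabilities. The synthetic data and gradient-boosting models below are illustrative choices:

```python
# Minimal Two-Model uplift sketch:
#   uplift(x) = P(y=1 | x, treated) - P(y=1 | x, control)
# Synthetic data and model choice are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
t = rng.integers(0, 2, n)                  # random treatment assignment
base = 1 / (1 + np.exp(-X[:, 0]))          # baseline response probability
lift = 0.2 * (X[:, 1] > 0)                 # treatment only helps when x1 > 0
y = (rng.random(n) < np.clip(base + t * lift, 0, 1)).astype(int)

model_t = GradientBoostingClassifier().fit(X[t == 1], y[t == 1])
model_c = GradientBoostingClassifier().fit(X[t == 0], y[t == 0])

uplift = model_t.predict_proba(X)[:, 1] - model_c.predict_proba(X)[:, 1]
print(f"mean predicted uplift, x1 > 0:  {uplift[X[:, 1] > 0].mean():.3f}")
print(f"mean predicted uplift, x1 <= 0: {uplift[X[:, 1] <= 0].mean():.3f}")
```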

Machine Learning in Organizations

  • Books
    • Reliable Machine Learning by Cathy Chen et al. This practical book shows data scientists, software and site reliability engineers, product managers, and business owners how to establish and run ML reliably, effectively, and accountably within their organizations.
  • Online Resources