Curated papers, articles, and blogs on data science & machine learning in production.
Figuring out how to implement your ML project? Learn from how other organizations have done it:
- How the problem is framed (e.g., personalization as recsys vs. search vs. sequences)
- What machine learning techniques worked (and sometimes, what didn't)
- Why it works, the science behind it with research, literature, and references
- What real-world results were achieved (so you can better assess ROI)
Table of Contents
- Data Quality
- Data Engineering
- Data Discovery
- Classification
- Regression
- Recommendation
- Search & Ranking
- Natural Language Processing
- Sequence Modelling
- Forecasting
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Weak Supervision
- Generation
- Validation and A/B Testing
- Practices
- Failures
- Monitoring Data Quality at Scale with Statistical Modeling
Uber
- An Approach to Data Quality for Netflix Personalization Systems
Netflix
- Automating Large-Scale Data Quality Verification (Paper)
Amazon
- Meet Hodor - Gojek's Upstream Data Quality Tool
Gojek
- Reliable and Scalable Data Ingestion at Airbnb
Airbnb
- Data Management Challenges in Production Machine Learning (Paper)
Google
- Zipline: Airbnb's Machine Learning Data Management Platform
Airbnb
- Sputnik: Airbnb's Apache Spark Framework for Data Engineering
Airbnb
- Introducing Feast: an open source feature store for machine learning (Code)
Gojek
- Feast: Bridging ML Models and Data
Gojek
- Amundsen - Lyft's Data Discovery & Metadata Engine
Lyft
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code)
Lyft
- Democratizing Data at Airbnb
Airbnb
- Databook: Turning Big Data into Knowledge with Metadata at Uber
Uber
- Metacat: Making Big Data Discoverable and Meaningful at Netflix
Netflix
- DataHub: A Generalized Metadata Search & Discovery Tool
LinkedIn
- How We Improved Data Discovery for Data Scientists at Spotify
Spotify
- High-Precision Phrase-Based Document Classification on a Modern Scale (Paper)
LinkedIn
- Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper)
WalmartLabs
- Large-scale Item Categorization for e-Commerce (Paper)
DianPing, eBay
- Large-Scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper)
NAVER
- Categorizing Products at Scale
Shopify
- Learning to Diagnose with LSTM Recurrent Neural Networks (Paper)
Google
- Discovering and Classifying In-app Message Intent at Airbnb
Airbnb
- How We Built the Good First Issues Feature
GitHub
- Teaching Machines to Triage Firefox Bugs
Mozilla
- Testing Firefox More Efficiently with Machine Learning
Mozilla
- Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper)
Microsoft
- Prediction of Advertiser Churn for Google AdWords (Paper)
Google
- Using Machine Learning to Predict Value of Homes On Airbnb
Airbnb
- Using Machine Learning to Predict the Value of Ad Requests
Twitter
- Open-Sourcing Riskquant, a Library for Quantifying Risk (Code)
Netflix
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper)
Amazon
- Recommending Complementary Products in E-Commerce Push Notifications with Mixture Models (Paper)
Alibaba
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper)
Alibaba
- Session-based Recommendations with Recurrent Neural Networks (Paper)
Telefonica
- How 20th Century Fox uses ML to predict a movie audience (Paper)
20th Century Fox
- Deep Neural Networks for YouTube Recommendations
YouTube
- Personalized Recommendations for Experiences Using Deep Learning
TripAdvisor
- E-commerce in Your Inbox: Product Recommendations at Scale
Yahoo
- Product Recommendations at Scale (Paper)
Yahoo
- Powered by AI: Instagram's Explore recommender system
Facebook
- Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2)
Netflix
- Learning a Personalized Homepage
Netflix
- Artwork Personalization at Netflix
Netflix
- To Be Continued: Helping you find shows to continue watching on Netflix
Netflix
- Calibrated Recommendations (Paper)
Netflix
- Food Discovery with Uber Eats: Recommending for the Marketplace
Uber
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber
- How Music Recommendation Works - And Doesn't Work
Spotify
- Music recommendation at Spotify
Spotify
- Recommending Music on Spotify with Deep Learning
Spotify
- For Your Ears Only: Personalizing Spotify Home with Machine Learning
Spotify
- Reach for the Top: How Spotify Built Shortcuts in Just Six Months
Spotify
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper)
Spotify
- The Evolution of Kit: Automating Marketing Using Machine Learning
Shopify
- Using Machine Learning to Predict what File you Need Next (Part 1)
Dropbox
- Using Machine Learning to Predict what File you Need Next (Part 2)
Dropbox
- Personalized Recommendations in LinkedIn Learning
LinkedIn
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1)
LinkedIn
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2)
LinkedIn
- Learning to be Relevant: Evolution of a Course Recommendation System (PAPER NEEDED)
LinkedIn
- How TikTok recommends videos #ForYou
ByteDance
- Amazon Search: The Joy of Ranking Products (Paper)
Amazon
- Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper)
Amazon
- How Lazada Ranks Products to Improve Customer Experience and Conversion
Lazada
- Using Deep Learning at Scale in Twitter's Timelines
Twitter
- Machine Learning-Powered Search Ranking of Airbnb Experiences
Airbnb
- Applying Deep Learning To Airbnb Search (Paper)
Airbnb
- Managing Diversity in Airbnb Search (Paper)
Airbnb
- Ranking Relevance in Yahoo Search (Paper)
Yahoo
- An Ensemble-Based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper)
Etsy
- Learning to Rank Personalized Search Results in Professional Networks (Paper)
LinkedIn
- Entity Personalized Talent Search Models with Tree Interaction Features (Paper)
LinkedIn
- In-Session Personalization for Talent Search (Paper)
LinkedIn
- The AI Behind LinkedIn Recruiter search and recommendation systems
LinkedIn
- AI at Scale in Bing
Microsoft
- Query Understanding Engine in Traveloka Universal Search
Traveloka
- The Secret Sauce Behind Search Personalisation
Gojek
- Food Discovery with Uber Eats: Building a Query Understanding Engine
Uber
- Neural Code Search: ML-based code search using natural language queries
Facebook
- Bayesian Product Ranking at Wayfair
Wayfair
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper)
Alibaba
- Embeddings@Twitter
Twitter
- Listing Embeddings in Search Ranking (Paper)
Airbnb
- Understanding Latent Style
Stitch Fix
- Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper)
LinkedIn
- Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper)
Sears
- Machine Learning for a Better Developer Experience
Netflix
- Abusive Language Detection in Online User Content (Paper)
Yahoo
- How Natural Language Processing Helps LinkedIn Members Get Support Easily
LinkedIn
- Building Smart Replies for Member Messages
LinkedIn
- Smart Reply: Automated Response Suggestion for Email (Paper)
Google
- SmartReply for YouTube Creators
Google
- Using Neural Networks to Find Answers in Tables (Paper)
Google
- A Scalable Approach to Reducing Gender Bias in Google Translate
Google
- Assistive AI Makes Replying Easier
Microsoft
- AI Advances to Better Detect Hate Speech
Facebook
- A State-of-the-Art Open Source Chatbot (Paper)
Facebook
- A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs
Facebook
- Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper)
Amazon
- How Gojek Uses NLP to Name Pickup Locations at Scale
Gojek
- Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want
Stitch Fix
- The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper)
Baidu
- Deep Learning to Translate Between Programming Languages (Paper)
Facebook
- PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper) (Code)
Google
- Recommending Complementary Products in E-Commerce Push Notifications with Mixture Models (Paper)
Alibaba
- Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper)
Alibaba
- Search-based User Interest Modeling with Lifelong Sequential Behavior Data for CTR Prediction (Paper)
Alibaba
- Learning to Diagnose with LSTM Recurrent Neural Networks (Paper)
Google
- Deep Learning for Understanding Consumer Histories (Paper)
Zalando
- Continual Prediction of Notification Attendance with Classical and Deep Network Approaches (Paper)
Telefonica
- Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper)
Sutter Health
- Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper)
Sutter Health
- Forecasting at Uber: An Introduction
Uber
- Engineering Extreme Event Forecasting at Uber with RNN
Uber
- Transforming Financial Forecasting with Data Science and Machine Learning at Uber
Uber
- Under the Hood of Gojek's Automated Forecasting Tool
Gojek
- Categorizing Listing Photos at Airbnb
Airbnb
- Amenity Detection and Beyond - New Frontiers of Computer Vision at Airbnb
Airbnb
- Powered by AI: Advancing product understanding and building new shopping experiences
Facebook
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning
Dropbox
- How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors
Deepomatic
- A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper)
Google
- Machine Learning-based Damage Assessment for Disaster Relief (Paper)
Google
- RepNet: Counting Repetitions in Videos (Paper)
Google
- Converting Text to Images for Product Discovery (Paper)
Amazon
- How Disney uses PyTorch for Animated Character Recognition
Disney
- Image Captioning as an Assistive Technology (Video)
IBM
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper)
Alibaba
- Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper)
Alibaba
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper)
Alibaba
- Productionizing Deep Reinforcement Learning with Spark and MLflow
Zynga
- Deep Reinforcement Learning in Production
Zynga
- Building AI Trading Systems
Denny Britz
- Detecting Performance Anomalies in External Firmware Deployments
Netflix
- Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code)
LinkedIn
- Uncovering Insurance Fraud Conspiracy with Network Learning (Paper)
Ant Financial
- How does Spam Protection Work on Stack Exchange?
Stack Exchange
- Auto Content Moderation in C2C e-Commerce
Mercari
- Building The LinkedIn Knowledge Graph
LinkedIn
- Retail Graph - Walmart's Product Knowledge Graph
Walmart
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Uber
- AliGraph: A Comprehensive Graph Neural Network Platform (Paper)
Alibaba
- Scaling Knowledge Access and Retrieval at Airbnb
Airbnb
- How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats
Uber
- Next-Generation Optimization for Dasher Dispatch at DoorDash
DoorDash
- Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3)
Lyft
- The Data and Science behind GrabShare Carpooling (PAPER NEEDED)
Grab
- Unsupervised Extraction of Attributes and Their Values from Product Description (Paper)
Rakuten
- Information Extraction from Receipts with Graph Convolutional Networks
Nanonets
- Using Machine Learning to Index Text from Billions of Images
Dropbox
- Extracting Structured Data from Templatic Documents (Paper)
Google
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper)
Google
- Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper)
Intel
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper)
Apple
- Bootstrapping Conversational Agents with Weak Supervision (Paper)
IBM
- Better Language Models and Their Implications (Paper)
OpenAI
- Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post)
OpenAI
- Image GPT (Paper) (Code)
OpenAI
- Deep Learned Super Resolution for Feature Film Production (Paper)
Pixar
- The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper)
Google
- Detecting Interference: An A/B Test of A/B Tests
LinkedIn
- Building Inclusive Products Through A/B Testing (Paper)
LinkedIn
- Experimenting to Solve Cramming
Twitter
- Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper)
Uber
- Enabling 10x More Experiments with Traveloka Experiment Platform
Traveloka
- Large scale experimentation at StitchFix (Paper)
Stitch Fix
- Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code)
Better
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
Yoshua Bengio
- Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper)
Google
- Rules of Machine Learning: Best Practices for ML Engineering
Google
- On Challenges in Machine Learning Model Management
Amazon
- Machine Learning in production: the Booking.com approach
Booking
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
Booking
- Engineers Shouldn't Write ETL: A Guide to Building a High Functioning Data Science Department
Stitch Fix
- Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist
Stitch Fix
- 160k+ High School Students Will Graduate Only If a Model Allows Them to
International Baccalaureate
- When It Comes to Gorillas, Google Photos Remains Blind
Google
- An Algorithm That 'Predicts' Criminality Based on a Face Sparks a Furor
Harrisburg University
Some people collect stamps. I collect these.