my-reading-list: A repository from bjamil

My Reading List

Publications, books, and web pages I've been reading or am planning on reading.

Why

I've been trying to level up recently on ML, LLMs, NLU, etc., and whenever I read a paper, I feel there are ten others I should read as well :). This repo is to better track what I've read and what I want to read, and to jot down some learnings along the way.

I also want to give this Learning in Public thing a shot. Let's see how it goes!

ML Reading List

General

Paper	Read Date	Last Revised	Notes
Evaluating Large Language Models Trained on Code	2023-03-12
Understanding HTML with Large Language Models	2023-03-12	2022-10-08	Notes
Multi-Task Sequence to Sequence Learning		2016-03-01
Emergent Abilities of Large Language Models		2022-10-06
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model		2022-12-11
Finetuned Language Models Are Zero-Shot Learners		2022-02-08
LLaMA: Open and Efficient Foundation Language Models		2023-03-27
Training language models to follow instructions with human feedback		2022-03-04
HTLM: Hyper-Text Pre-Training and Prompting of Language Models		2021-07-14
Environment Generation for Zero-Shot Compositional Reinforcement Learning		2022-01-21
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
LaMDA: Language Models for Dialog Applications

Training Speedups/Scaling

Paper	Read Date	Last Revised	Notes
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
PaLM: Scaling Language Modeling with Pathways
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Training Compute-Optimal Large Language Models		2022-03-29
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts		2021-12-13

Non-LLMs

Paper	Read Date	Last Revised	Notes
World of Bits: An Open-Domain Platform for Web-Based Agents		2017
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
User-Driven Automation of Web Form Filling		2013
Learning Transferable Visual Models from Natural Language Supervision		2021-02-26
Learning to Generate Reviews and Discovering Sentiment		2017-04-06
WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset		2021-07-20
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Extracting Structured Data from Templatic Documents		2020-06-12

Bloom Filters

Read Date	Resource	Notes
2024-01-30	Bloom Filters by ByteByteGo	Gives a decent intuition
2024-01-30	What are Bloom Filters?	Not the best example. The previous video was better.
2024-01-30	Advancing Spark - Bloom Filter Indexes in Databricks Delta	Interesting, but more about Delta than Spark, as the title implies.
	The Case for Learned Index Structures
	Optimizing Learned Bloom Filters by Sandwiching
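To pin down the intuition from the videos above, here is a minimal Bloom filter sketch in Python. It's my own illustration, not code from any of the resources listed. Assumptions: the class and helper names (`BloomFilter`, `optimal_params`) are hypothetical, and deriving the k bit positions by double hashing a single SHA-256 digest is just one common choice. The sizing helper uses the standard formulas m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions for n expected items and target false-positive rate p.

```python
import hashlib
import math


def optimal_params(n: int, p: float) -> tuple[int, int]:
    """Hypothetical helper: bit-array size m and hash count k for n items at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


class BloomFilter:
    """Minimal sketch: a bit array of m bits probed by k hash positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m  # number of bits
        self.k = k  # number of hash positions per item
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive k positions via double hashing (h1 + i * h2) from one SHA-256 digest
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Set every bit this item hashes to
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "probably added" (false positives possible)
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    m, k = optimal_params(1000, 0.01)  # roughly 9.6 bits per item, 7 hashes
    bf = BloomFilter(m, k)
    for word in ["spark", "delta", "bloom"]:
        bf.add(word)
    print(bf.might_contain("spark"))  # always True for an added item
```

The key property is one-sided error: an added item is never reported missing, while an item that was never added can occasionally collide on all k bits. That trade-off is what makes it useful for skipping work, e.g. pruning join keys or files that definitely can't match.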

Quantization, Model Compression & Optimization

Read Date	Resource	Notes
	Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks
2024-01-30	How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs (Roblox)	Interesting. Always nice to read actual case studies. I'd like to see how ONNX compares to their benchmarks.

Blog Posts I Liked

Read Date	Post	Notes
2024-01-30	How we reduced our text similarity runtime by 99.96% (Microsoft)	I skimmed through it. Seems interesting and worth a reread.
2024-01-30	How Roblox Reduces Spark Join Query Costs With Machine Learning Optimized Bloom Filters	I wonder if this can be applied to other use cases too and not just fact/dim tables. Interesting read.

Blog Posts to Read

Post	Notes
Using machine learning to index text from billions of images (Dropbox)	Curious about the OCR/PDF text extraction part here. Need some caffeine in me to read this.
