🤖⚡ Daily scikit-learn tips

New tips are posted on LinkedIn, Twitter, and Facebook every weekday!

👉 Sign up to receive 5 tips by email every week 👈

List of all tips

Click to view the Jupyter notebook for a tip, or click to discuss the tip on LinkedIn:

#	Description	Links
1	Use `ColumnTransformer` to apply different preprocessing to different columns
2	Seven ways to select columns using `ColumnTransformer`
3	What is the difference between "fit" and "transform"?
4	Use "fit_transform" on training data, but "transform" (only) on testing/new data
5	Four reasons to use scikit-learn (not pandas) for ML preprocessing
6	Encode categorical features using `OneHotEncoder` or `OrdinalEncoder`
7	Handle unknown categories with `OneHotEncoder` by encoding them as zeros
8	Use `Pipeline` to chain together multiple steps
9	Add a missing indicator to encode "missingness" as a feature
10	Set a "random_state" to make your code reproducible
11	Impute missing values using `KNNImputer` or `IterativeImputer`
12	What is the difference between `Pipeline` and `make_pipeline`?
13	Examine the intermediate steps in a `Pipeline`
14	`HistGradientBoostingClassifier` natively supports missing values
15	Three reasons not to use drop='first' with `OneHotEncoder`
16	Use `cross_val_score` and `GridSearchCV` on a `Pipeline`
17	Try `RandomizedSearchCV` if `GridSearchCV` is taking too long
18	Display `GridSearchCV` or `RandomizedSearchCV` results in a DataFrame
19	Important tuning parameters for `LogisticRegression`
20	Plot a confusion matrix
21	Compare multiple ROC curves in a single plot
22	Use the correct methods for each type of `Pipeline`
23	Display the intercept and coefficients for a linear model
24	Visualize a decision tree two different ways
25	Prune a decision tree to avoid overfitting
26	Use stratified sampling with `train_test_split`
27	Two ways to impute missing values for a categorical feature
28	Save a model or `Pipeline` using joblib

You can interact with all of these notebooks online using Binder.

Note: Some of the tips do not include any code, and can only be viewed on LinkedIn.
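
As a rough illustration of how a few of the tips above fit together (tips 1, 7, 8, and 16), here is a minimal sketch. The dataset, column names, and values are made up for this example and are not taken from any of the notebooks, so treat it as one possible arrangement rather than the approach used in the tips themselves.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny made-up dataset (hypothetical values, for illustration only)
df = pd.DataFrame({
    "age": [22, 38, 26, 35, None, 54, 2, 27, 14, 58],
    "embarked": ["S", "C", "S", "S", "Q", "S", "S", "C", "C", "S"],
    "survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 0],
})
X = df[["age", "embarked"]]
y = df["survived"]

# Tip 1: apply different preprocessing to different columns
# Tip 7: handle_unknown="ignore" encodes unseen categories as all zeros
ct = make_column_transformer(
    (SimpleImputer(strategy="median"), ["age"]),
    (OneHotEncoder(handle_unknown="ignore"), ["embarked"]),
)

# Tip 8: chain preprocessing and the model into a single Pipeline
pipe = make_pipeline(ct, LogisticRegression())

# Tip 16: cross-validate the whole Pipeline, so preprocessing is
# re-fit on the training portion of each fold
scores = cross_val_score(pipe, X, y, cv=2, scoring="accuracy")
print(scores.mean())

# Fit on all of the data, then predict for a new row whose category
# ("X") was never seen during fitting
pipe.fit(X, y)
X_new = pd.DataFrame({"age": [30], "embarked": ["X"]})
print(pipe.predict(X_new))
```

Because the preprocessing lives inside the `Pipeline`, the same fitted transformations are applied to new data at prediction time without any extra code.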

Who creates these tips?

Hi! I'm Kevin Markham, the founder of Data School. I've been teaching data science in Python since 2014. I create these tips because I love using scikit-learn and I want to help others use it more effectively.

How can I learn scikit-learn from scratch?

Watch my free video series, Introduction to Machine Learning in Python with scikit-learn. There are 10 videos totaling 4.5 hours, and each video has a corresponding Jupyter notebook. Here's the detailed list of topics that I cover in the series.

Due to changes in the scikit-learn API, a small percentage of the code shown in the videos is out of date. However, the code in the Jupyter notebooks is all up to date.

How can I get better at scikit-learn?

Take my online course, Machine Learning with Text in Python. It includes 14 hours of video lessons, detailed lesson notebooks, homework assignments with included solutions, access to a Slack team, and more. Here's the detailed list of topics that I cover in the course.

The course is not free, but you can preview a small portion of the course by watching my PyCon 2016 tutorial.

Do you have any other tips?