Three reasons not to use drop='first' with OneHotEncoder
16. Use cross_val_score and GridSearchCV on a Pipeline
17. Try RandomizedSearchCV if GridSearchCV is taking too long
18. Display GridSearchCV or RandomizedSearchCV results in a DataFrame
19. Important tuning parameters for LogisticRegression
20. Plot a confusion matrix
21. Compare multiple ROC curves in a single plot
22. Use the correct methods for each type of Pipeline
23. Display the intercept and coefficients for a linear model
24. Visualize a decision tree two different ways
25. Prune a decision tree to avoid overfitting
26. Use stratified sampling with train_test_split
27. Two ways to impute missing values for a categorical feature
28. Save a model or Pipeline using joblib
29. Vectorize two text columns in a ColumnTransformer
30. Four ways to examine the steps of a Pipeline
31. Shuffle your dataset when using cross_val_score
32. Use AUC to evaluate multiclass problems
33. Use FunctionTransformer to convert functions into transformers
34. Add feature selection to a Pipeline
35. Don't use .values when passing a pandas object to scikit-learn
You can interact with all of these notebooks online using Binder.
Note: Some of the tips do not include any code, and can only be viewed on LinkedIn.
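To give a flavor of what the tips cover, here is a minimal sketch that touches on tips 16, 18, and 26. It is not taken from the notebooks: the iris dataset, the scaler-plus-LogisticRegression pipeline, and the values tried for C are all assumptions chosen only for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative dataset (an assumption, not the data used in the notebooks)
X, y = load_iris(return_X_y=True)

# Tip 26: stratify=y keeps the class proportions the same in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Tip 16: cross-validate the whole Pipeline so that scaling is re-fit
# inside each fold rather than on the full training set
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X_train, y_train, cv=5, scoring="accuracy")
print(scores.mean())

# Tip 16 (continued): tune a step's parameter using the
# "<step name>__<parameter>" naming convention
param_grid = {"logisticregression__C": [0.1, 1, 10]}  # illustrative values
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

# Tip 18: load the search results into a DataFrame for easier inspection
results = pd.DataFrame(grid.cv_results_)[
    ["params", "mean_test_score", "rank_test_score"]
]
print(results.sort_values("rank_test_score"))
```

make_pipeline names each step after its lowercased class name, which is why the grid key starts with logisticregression__.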
Who creates these tips?
Hi! I'm Kevin Markham, the founder of Data School. I've been teaching data science in Python since 2014. I create these tips because I love using scikit-learn and I want to help others use it more effectively.
Due to changes in the scikit-learn API, a small percentage of the code shown in the videos is out of date. However, the code in the Jupyter notebooks is all up to date.
How can I get better at scikit-learn?
Take my online course, Machine Learning with Text in Python. It includes 14 hours of video lessons, detailed lesson notebooks, homework assignments with included solutions, access to a Slack team, and more. Here's the detailed list of topics that I cover in the course.
The course is not free, but you can preview a small portion of it by watching my PyCon 2016 tutorial.