Machine Learning Problem Bible (MLPB)

MLPB is meant to become an organized collection of machine learning problems and solutions. In practice, machine learning often goes like this:

I have this problem... I need to classify something as A, B or C using a combination of numeric and categorical features. If I could find a similar problem, maybe I could modify the solution to work for my needs.

This is where MLPB steps in. Want to see ML problems with sparse data? Got it. Want to compare Scikit-learn’s RandomForestRegressor with R’s randomForest? No problem. Need an example of predicting a ranked target variable? You’ve come to the right place.

How It Works

MLPB contains a directory of Projects. Within each project is a designated Data directory and one or more machine learning problem directories. This looks something like:

Projects/

  Iris/
    Data/
      iris.csv
    PredictSpecies/
      predict_species_rf.R
      predict_species_rf.py
      predict_species_xgb.R
    PredictSepalLength/
      iris_sepal_length_xgb.R
      
  JobTitles/
    Data/
      jobtitles.csv
    PredictJobCategory/
      naive_bayes_model.py
      naive_bayes_model.R

Most of these directories should include a README.md file providing details about the data and/or problem. A single dataset can have multiple problems, and a single problem can have multiple example models/solutions.
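The layout above is regular enough to walk programmatically. The following is a minimal sketch, not part of MLPB itself: the function name `list_problems` is hypothetical, and it assumes that every subdirectory of a project other than Data/ is a problem directory and that model files end in .R or .py, as in the example tree.

```python
from pathlib import Path


def list_problems(projects_root):
    """Map each project name to its problem directories and their model files.

    Hypothetical helper: assumes the Projects/<Project>/<Problem>/ layout
    described above, with datasets kept in a directory named Data.
    """
    catalog = {}
    for project in sorted(Path(projects_root).iterdir()):
        if not project.is_dir():
            continue
        problems = {}
        for problem in sorted(project.iterdir()):
            # Data/ holds the datasets; any other directory is treated as a problem.
            if not problem.is_dir() or problem.name == "Data":
                continue
            # Keep only R and Python scripts; README.md and other files are skipped.
            problems[problem.name] = sorted(
                f.name for f in problem.iterdir() if f.suffix in {".R", ".py"}
            )
        catalog[project.name] = problems
    return catalog
```

Calling `list_problems("Projects")` from the repository root would return a nested dictionary keyed by project and then by problem, which makes it easy to see at a glance which problems have both an R and a Python solution.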

You can browse all the datasets and problems in MLPB's wiki. There, you can also search for problems with specific tags like [mult-class classification], [sparse-data], [NLP], etc.

