This repo consists of the following items:
- A dataset, `training_dataset.h5`, located in the `data` folder (a loading sketch follows this list)
- A data dictionary describing the various features and their acceptable values, located in the `reference` folder
- A `models` folder where you will upload your pickled model for assessment
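To get oriented, you can peek at the dataset with pandas. This is only a minimal sketch and it assumes the HDF5 file is pandas-readable; the key name `train` is a placeholder, so list the file's keys (or check the data dictionary) to find the real one.

```python
import pandas as pd

# List the keys stored in the HDF5 file, then load the training data.
# The key name "train" is a placeholder -- check the data dictionary in
# the reference folder for the actual key and feature descriptions.
with pd.HDFStore("data/training_dataset.h5", mode="r") as store:
    print(store.keys())

df = pd.read_hdf("data/training_dataset.h5", key="train")  # placeholder key
print(df.shape)
print(df.head())
```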
InstaFace (IF) is a cutting-edge startup specializing in facial recognition. As a hot tech startup, IF is constantly looking to identify and hire the best talent. Because they are the best at what they do, their applicant pool is massive and growing. In fact, the number of applicants has grown so large, so fast, that Human Resources just can't keep up, so they need your help to build an automated way to identify the most promising candidates. In particular, they have asked you to create a model that takes a number of predefined inputs and outputs the probability that a given candidate will be hired.

The good news is that IF has hired scores of data scientists, so the dataset is relatively rich. One thing to note: IF has automated some of its information collection but relies on human data entry for the remainder, and the latter has been a source of errors in the past.
Your goal is to fork this repo, investigate the provided dataset, build a series of models, and upload a pickled version of your best model for assessment. Your model will be evaluated on a withheld test set, and your score is the log loss on that set. At the end you will receive both your log loss score and a description of how well you did (e.g. needs improvement, satisfactory, proficient).
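As a rough sketch of that workflow (not a prescribed approach), the snippet below fits a simple baseline, scores it with scikit-learn's `log_loss` on a local hold-out split, and pickles the fitted model into the `models` folder. The target column `hired`, the HDF5 key, and the file name `best_model.pkl` are all placeholders, and the sketch assumes numeric, already-cleaned features; your preprocessing will depend on the feature types described in the data dictionary.

```python
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Placeholder key and target column -- check the data dictionary for real names.
df = pd.read_hdf("data/training_dataset.h5", key="train")
X = df.drop(columns=["hired"])  # assumes numeric, already-cleaned features
y = df["hired"]

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Simple baseline; your best model will likely need more feature work.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# The assessment metric is log loss, so score your hold-out set the same way.
val_probs = model.predict_proba(X_val)[:, 1]
print("validation log loss:", log_loss(y_val, val_probs))

# Pickle the fitted model into the models folder for upload.
with open("models/best_model.pkl", "wb") as f:
    pickle.dump(model, f)
```

Scoring a local hold-out split with the same metric gives you a rough sense of how the withheld test set will treat your model before you upload it.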
Python 3 is the wave of the future, and all of your work must be completed in Python 3. If you build and pickle your model in Python 2, you're going to have a bad time.
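One last sanity check, sketched here with the placeholder path from above: confirm you are on Python 3 and that the pickled model loads back cleanly before you upload it.

```python
import pickle
import sys

# Fail fast if this somehow runs under Python 2.
assert sys.version_info.major == 3, "Build and pickle your model under Python 3"

# Reload the pickled model (placeholder path) to confirm it round-trips.
with open("models/best_model.pkl", "rb") as f:
    model = pickle.load(f)

print(type(model))
```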