Take-Home

The take-home problem for data scientist candidates at Pivotal Life Sciences.

These exercises are not meant to take you long; the expected work time is under 4 hours. We are looking for a candidate who can demonstrate their ability to learn new technologies and solve problems. We are not looking for a candidate who can solve these problems in the shortest amount of time.

You are free to present the information in any way you like. Jupyter notebooks are preferred but not required: there is no penalty for using a different format, so use whatever you think you will perform strongest with.

Submission Instructions

All answers should be submitted as a pull request from a fork of this repository (being able to use git basics is a requirement for this position). You do not have permission to push or merge branches directly, so please fork this repository and submit a pull request from your fork (via "contribute").

Choose 1 exercise to complete from the following list.

There is NO preferred exercise!

Exercise 1: Chess data

The file chess_games.csv is a collection of chess games from Lichess and Chess.com along with a collection of metrics of each player's performance during the game (the actual game is not included).

The task is this: Determine the difference in player behavior when they’re winning vs. losing vs. maintaining their Elo score.

Be sure to look at the problem carefully. While we are looking at overall patterns among players, how you break up a player's performance into winning, losing, and maintaining is important.
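To make the segmentation question concrete, here is a minimal sketch of one possible way to label a player's games. It is not the expected solution. It assumes you can build a chronological list of one player's ratings; the function name label_elo_trend, the 3-game window, the 10-point tolerance, and the sample ratings are all hypothetical choices made for illustration and are not taken from chess_games.csv.

```python
from typing import List


def label_elo_trend(ratings: List[int], window: int = 3, tolerance: int = 10) -> List[str]:
    """Label each game by the rating change over the preceding `window` games.

    `window` and `tolerance` are illustrative assumptions, not values
    prescribed by the exercise.
    """
    labels = []
    for i in range(len(ratings)):
        if i < window:
            # Not enough history to measure a trend yet.
            labels.append("unknown")
            continue
        change = ratings[i] - ratings[i - window]
        if change > tolerance:
            labels.append("winning")
        elif change < -tolerance:
            labels.append("losing")
        else:
            labels.append("maintaining")
    return labels


# Made-up rating history for a single player, oldest game first.
ratings = [1500, 1508, 1519, 1530, 1527, 1531, 1522, 1510, 1499]
labels = label_elo_trend(ratings)
print(labels)
```

Changing the window or the tolerance changes which games count as "maintaining", which is why your write-up should justify whatever definition you settle on.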

If you have any questions please reach out to Travis at Travis.barton@pivotallifesciences.com.

Exercise 2: Financial data

This exercise is a BYOD (bring your own data) exercise. You are free to use any data you wish, but you must either include the data as a file in the repository or provide a link to the data in the README.md file. (No matter what you choose, you must explain why you chose this particular data set).

The task is this: Which companies inside the XBI index were hit the hardest by the latest market downturn? Why?

This problem is graded equally on a candidate's ability to communicate their thought process and on their ability to pull and extract meaningful data for the problem.
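"Hit the hardest" can be measured in several ways. As one hedged example, the sketch below ranks tickers by maximum drawdown (the largest peak-to-trough decline in closing price). The tickers AAA and BBB and their prices are invented for illustration and are not XBI constituents; the function name max_drawdown is likewise an assumption, and you may well prefer a different measure.

```python
from typing import Dict, List


def max_drawdown(prices: List[float]) -> float:
    """Return the largest peak-to-trough decline as a negative fraction (0.0 if none)."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)  # highest price seen so far
        worst = min(worst, (price - peak) / peak)  # decline relative to that peak
    return worst


# Hypothetical closing prices, oldest first. These are made-up numbers.
closes: Dict[str, List[float]] = {
    "AAA": [100.0, 120.0, 90.0, 110.0],
    "BBB": [50.0, 55.0, 33.0, 40.0],
}

# Most negative drawdown first, i.e. hardest hit first.
ranked = sorted(closes, key=lambda ticker: max_drawdown(closes[ticker]))
print(ranked)
```

Whatever measure you choose, state the date range you treat as the downturn, since the ranking depends on it.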

If you have any questions please reach out to Travis at Travis.barton@pivotallifesciences.com.

Exercise 3: Reddit Data

The file askscience_data.csv is a collection of posts from the subreddit r/askscience. The task comes in two parts:

Determine the attributes of a successful post on r/askscience.
Build a model that can predict the score of a post on r/askscience given at least the title and body of the post (there is no need to limit it to just the title and body, but you must explain why you chose the features you did).
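As a starting point for the second part, the sketch below turns a title and body into a few numeric features that a model could consume. The function name text_features and the four features are hypothetical choices made for illustration; they are not a recommended or required feature set, and the sample post is made up rather than taken from askscience_data.csv.

```python
import re
from typing import Dict


def text_features(title: str, body: str) -> Dict[str, float]:
    """Build a few simple, illustrative features from a post's title and body."""
    title_words = re.findall(r"\w+", title)
    body_words = re.findall(r"\w+", body)
    return {
        "title_word_count": float(len(title_words)),
        "body_word_count": float(len(body_words)),
        # 1.0 when the title is phrased as a question.
        "title_is_question": 1.0 if title.strip().endswith("?") else 0.0,
        # 1.0 when the post has no body text at all.
        "body_is_empty": 1.0 if not body.strip() else 0.0,
    }


# Made-up example post.
features = text_features("Why is the sky blue?", "")
print(features)
```

Each feature you keep should come with a sentence explaining why you expect it to relate to a post's score.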