Live Coding Assessment for Data Science Roles at Random Walk
-
Please ensure that you are taking the assessment alone in an empty room with good internet connectivity.
Internet Speedtest Link: https://www.speedtest.net/
-
Fork the GitHub repo into your personal GitHub account and clone it to your local system.
Guide to forking a GitHub repo: https://docs.github.com/en/github-ae@latest/get-started/quickstart/fork-a-repo
Guide to cloning a GitHub repo: https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository
-
Start a Jupyter Notebook instance in the local working directory and create a notebook that answers the questions listed below.
Guide to installing Jupyter Notebook on your local system: https://test-jupyter.readthedocs.io/en/latest/install.html
-
Save the notebook and push it to your forked GitHub repo.
Guide to pushing code to a GitHub repo: https://docs.github.com/en/migrations/importing-source-code/using-the-command-line-to-import-source-code/adding-locally-hosted-code-to-github
-
Share the repository link in the Google Form: https://forms.gle/xvvXnZdR8SQAGeQA9
Three CSV files are provided for the assessment: books.csv, book_tags.csv and ratings.csv.
Candidates must create a Jupyter notebook that processes the data in these CSV files to answer the following questions within the time limit provided.
-
How many books do not have an original title [books.csv]?
-
How many unique books are present in the dataset? Evaluate based on 'book_id' after removing records with null values in the original_title column of [books.csv], along with the corresponding records in [book_tags.csv] and [ratings.csv].
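The filtering described above can be sketched with pandas on toy data. This is only an illustration under assumptions: pandas is installed, book_tags.csv links to books.csv through goodreads_book_id, ratings.csv links through book_id, and the user_id and rating columns are hypothetical stand-ins. In the notebook the frames would come from `pd.read_csv` instead of being built inline.

```python
import pandas as pd

# Toy stand-ins for the three CSV files. Column names other than book_id,
# original_title, tag_id and goodreads_book_id are assumptions.
books = pd.DataFrame({
    "book_id": [1, 2, 3],
    "goodreads_book_id": [101, 102, 103],
    "original_title": ["Dune", None, "Emma"],
})
book_tags = pd.DataFrame({
    "goodreads_book_id": [101, 102, 103, 103],
    "tag_id": [7, 7, 8, 9],
})
ratings = pd.DataFrame({
    "user_id": [10, 11, 12],
    "book_id": [1, 2, 3],
    "rating": [5, 3, 4],
})

# Count books whose original_title is missing.
missing_titles = int(books["original_title"].isna().sum())

# Keep only books that have an original title.
books_clean = books.dropna(subset=["original_title"])

# Drop the rows in the other two tables that refer to removed books
# (join keys are assumed, see the note above).
book_tags_clean = book_tags[
    book_tags["goodreads_book_id"].isin(books_clean["goodreads_book_id"])
]
ratings_clean = ratings[ratings["book_id"].isin(books_clean["book_id"])]

# Unique books remaining after the cleanup.
unique_books = books_clean["book_id"].nunique()
```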
-
How many unique users are present in the dataset [ratings.csv]?
-
How many unique tags are there in the dataset [book_tags.csv]?
-
Which tag_id is the most frequently used, i.e. mapped to the highest number of books [book_tags.csv]? (If more than one tag qualifies, give the tag_id with the lowest numerical value.)
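The tie-breaking rule is easy to get wrong, so here is a minimal sketch of one way to apply it with pandas. The data is made up, and it assumes each row of book_tags.csv pairs one goodreads_book_id with one tag_id.

```python
import pandas as pd

# Toy data: tags 4 and 9 are each mapped to two books, tag 5 to one.
book_tags = pd.DataFrame({
    "goodreads_book_id": [101, 102, 103, 101, 102],
    "tag_id": [9, 9, 4, 4, 5],
})

# Number of distinct books each tag is mapped to.
books_per_tag = book_tags.groupby("tag_id")["goodreads_book_id"].nunique()

# Keep every tag that reaches the maximum, then take the smallest tag_id.
top_tags = books_per_tag[books_per_tag == books_per_tag.max()]
most_frequent_tag = int(top_tags.index.min())
```

Selecting all tags at the maximum before taking the minimum makes the tie-break explicit instead of relying on the sort order of the grouped result.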
-
Which book (title) has the highest total count of tags given by users [book_tags.csv, books.csv]?
-
Plot a bar chart of the top 20 unique tags in descending order of 'user records' (the number of users who tagged the given goodreads_book_id with the given tag_id) [book_tags.csv].
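The aggregation behind such a chart might look like the sketch below. It assumes the 'user records' figure is stored in a column named count, which is a guess about the file layout, and it uses invented numbers.

```python
import pandas as pd

# Toy data; the "count" column name is an assumption.
book_tags = pd.DataFrame({
    "goodreads_book_id": [101, 101, 102, 102, 103],
    "tag_id": [7, 8, 7, 9, 8],
    "count": [50, 20, 30, 5, 40],
})

# Total user records per tag, largest first, limited to 20 tags.
top_tags = (
    book_tags.groupby("tag_id")["count"]
    .sum()
    .sort_values(ascending=False)
    .head(20)
)
```

In a notebook with matplotlib installed, `top_tags.plot.bar()` would draw the bars in this order.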