
📚 Cross-referencing, merging, and transforming data to answer questions about the Book Depository dataset

Primary language: Jupyter Notebook. License: MIT.

Book Depository


This repository is a challenge for a Junior Data Engineer position. To complete it, the data is cross-referenced, merged, and transformed in order to answer a set of questions and extract insights from the Book Depository dataset.

Data source: https://www.kaggle.com/sp1thas/book-depository-dataset

Questions to be answered

  • What is the total number of books in the dataset?
  • How many books have only 1 author?
  • Which are the 5 authors with the most books?
  • How many books are there per category?
  • What are the 5 categories with the most books?
  • Which format has the most books?
  • Considering the bestsellers-rank column, what are the 10 best ranked books?
  • Considering the rating-avg column, what are the 10 best ranked books?
  • How many books have rating-avg greater than 3.5?
  • How many books have a publication date (publication-date) later than 01-01-2020?
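
As a rough illustration of how these questions could be approached, here is a minimal pandas sketch. It is not the notebook's actual code. It runs on a tiny invented table rather than the real dataset, and it assumes several things the text above does not state: that pandas is used, that the `authors` and `categories` columns hold lists of ids stored as strings (for example `"[1, 2]"`), that `format` holds a format id, that a `title` column exists, and that a lower `bestsellers-rank` means a better rank. Only the column names `bestsellers-rank`, `rating-avg`, and `publication-date` come from the questions themselves.

```python
import ast

import pandas as pd

# Toy stand-in for the books table. All values are invented for illustration;
# the column layout (other than the three names used in the questions) is assumed.
books = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "authors": ["[1]", "[1, 2]", "[2]", "[1]"],
    "categories": ["[10]", "[10, 11]", "[11]", "[10]"],
    "format": [1, 1, 2, 1],
    "bestsellers-rank": [3.0, 1.0, None, 2.0],
    "rating-avg": [4.2, 3.1, 3.9, None],
    "publication-date": ["2019-05-01", "2020-03-15", "2021-01-01", "2018-07-30"],
})


def parse_ids(cell):
    """Turn a string such as "[1, 2]" into a Python list; non-strings become []."""
    return ast.literal_eval(cell) if isinstance(cell, str) else []


# Transformations: parse the id lists and convert dates to real timestamps.
books["authors"] = books["authors"].apply(parse_ids)
books["categories"] = books["categories"].apply(parse_ids)
books["publication-date"] = pd.to_datetime(books["publication-date"], errors="coerce")

# Total number of books.
total_books = len(books)

# Books with exactly one author.
single_author = int((books["authors"].apply(len) == 1).sum())

# Top 5 authors by number of books (one row per book/author pair after explode).
top_authors = books.explode("authors")["authors"].value_counts().head(5)

# Books per category, and the 5 largest categories.
books_per_category = books.explode("categories")["categories"].value_counts()
top_categories = books_per_category.head(5)

# Format with the most books.
top_format = books["format"].value_counts().idxmax()

# 10 best-ranked books, assuming rank 1 is the best; unranked books are dropped.
best_ranked = (
    books.dropna(subset=["bestsellers-rank"])
    .sort_values("bestsellers-rank")
    .head(10)
)

# 10 highest-rated books; missing ratings sort last by default.
best_rated = books.sort_values("rating-avg", ascending=False).head(10)

# Books with rating-avg greater than 3.5 (missing ratings do not count).
rated_above = int((books["rating-avg"] > 3.5).sum())

# Books published after 2020-01-01.
published_after = int((books["publication-date"] > pd.Timestamp("2020-01-01")).sum())
```

With the real data, the toy `books` frame would be replaced by the table loaded from the Kaggle files, and the assumptions about column formats should be checked against the actual columns before relying on any of these results.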