This repository is a challenge for the Junior Data Engineer position. It joins, merges, and transforms the data to answer the questions below and extract insights from the Book Depository dataset.
Data source: https://www.kaggle.com/sp1thas/book-depository-dataset
- What is the total number of books in the dataset?
- How many books have only 1 author?
- Which are the 5 authors with the most books?
- How many books per category?
- What are the 5 categories with the most books?
- Which format has the most books?
- Considering the `bestsellers-rank` column, what are the 10 best ranked books?
- Considering the `rating-avg` column, what are the 10 best ranked books?
- How many books have `rating-avg` greater than 3.5?
- How many books have a publication date (`publication-date`) later than 01-01-2020?
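
To make the questions above concrete, here is a minimal sketch of how each one could be computed. It uses plain Python on a handful of made-up rows so it runs without dependencies; it is not the repository's actual solution. Only the column names `bestsellers-rank`, `rating-avg` and `publication-date` come from the questions; the `title`, `authors`, `categories` and `format` fields, their list-based shape, and all sample values are assumptions for illustration. It also assumes that a lower `bestsellers-rank` means a better rank.

```python
from collections import Counter
from datetime import date

# Made-up sample rows (hypothetical); the real dataset's schema may differ.
books = [
    {"title": "A", "authors": ["Ann"], "categories": ["Fiction"],
     "format": "Paperback", "bestsellers-rank": 120, "rating-avg": 4.2,
     "publication-date": date(2019, 5, 1)},
    {"title": "B", "authors": ["Ann", "Bob"], "categories": ["Fiction", "Crime"],
     "format": "Hardback", "bestsellers-rank": 5, "rating-avg": 3.1,
     "publication-date": date(2020, 3, 15)},
    {"title": "C", "authors": ["Bob"], "categories": ["Science"],
     "format": "Paperback", "bestsellers-rank": 48, "rating-avg": 3.9,
     "publication-date": date(2021, 7, 30)},
    {"title": "D", "authors": ["Ann"], "categories": ["Crime"],
     "format": "Paperback", "bestsellers-rank": 900, "rating-avg": 4.8,
     "publication-date": date(2018, 11, 11)},
]

# Total number of books and books with exactly one author.
total_books = len(books)
single_author_books = sum(1 for b in books if len(b["authors"]) == 1)

# Books per author and per category; a book counts once for each entry it lists.
books_per_author = Counter(a for b in books for a in b["authors"])
books_per_category = Counter(c for b in books for c in b["categories"])
top_authors = books_per_author.most_common(5)
top_categories = books_per_category.most_common(5)

# Format with the most books.
top_format = Counter(b["format"] for b in books).most_common(1)[0][0]

# Top 10 by rank (assumption: lower rank is better) and by average rating.
best_by_rank = sorted(books, key=lambda b: b["bestsellers-rank"])[:10]
best_by_rating = sorted(books, key=lambda b: b["rating-avg"], reverse=True)[:10]

# Threshold filters on rating and publication date.
rated_above_3_5 = sum(1 for b in books if b["rating-avg"] > 3.5)
published_after_2020 = sum(
    1 for b in books if b["publication-date"] > date(2020, 1, 1)
)
```

On the real data, the same counting, sorting and filtering steps would typically be expressed with a dataframe library, after parsing dates and handling missing values.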