
Repo to explore and learn methods to analyse and visualise the Harry Potter book series

Primary language: Python | License: MIT

Harry Potter Book Analysis

JK Rowling wrote a series of 7 fantasy novels which took the world by storm 🌪️

I have finished reading all 7 books 📚 and watching all 8 movies 🎥

This repo's goal is to learn some common useful NLP (Natural Language Processing) methods, graph theory concepts and cool visualisations

Another side goal is to keep the code as close to PEP 8 as possible, with docstrings following PEP 257


Word clouds

Each word cloud should reflect the name or central theme of its book
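A word cloud is driven by word frequencies, so a minimal sketch of that first step might look like the following. The stop-word list, the `top_words` helper and the sample sentence are all hypothetical, and the resulting counts would still need to be handed to a plotting library of your choice.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was",
              "he", "she", "it", "his", "her", "said"}


def top_words(text, n=5, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-words in text as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    # Skip stop words and very short tokens, which rarely carry a theme.
    counts = Counter(w for w in words if w not in stop_words and len(w) > 2)
    return counts.most_common(n)


# Made-up sample text, not a quote from the books.
sample = "The stone was hidden. Harry found the stone, and the stone was safe."
print(top_words(sample, n=2))
```

If the most frequent words match the book title (here "stone"), the cloud is doing its job.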

Combined gif

Philosopher's Stone (1997) | Chamber of Secrets (1998) | Prisoner of Azkaban (1999)

Goblet of Fire (2000) | Order of the Phoenix (2003) | Half-Blood Prince (2005) | Deathly Hallows (2007)

Glossary of 7 books

Name of each book and cover image made of word cloud

Cover

How big is the Harry Potter universe?

Number of chapters in each book and total words in each book
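As a rough sketch of how these two numbers could be computed, the snippet below counts chapter headings and whitespace-separated words in one book's plain text. It assumes each chapter heading sits on its own line and starts with "CHAPTER" in capitals; that layout, the `book_stats` helper and the sample text are assumptions, not necessarily how this repo does it.

```python
import re

# Assumption: chapter headings sit on their own line and start with "CHAPTER".
CHAPTER_HEADING = re.compile(r"^\s*CHAPTER\b.*$", re.MULTILINE)


def book_stats(text):
    """Return (number_of_chapters, total_words) for one book's text."""
    chapters = len(CHAPTER_HEADING.findall(text))
    # Whitespace-split word count; note this also counts the heading words.
    words = len(text.split())
    return chapters, words


# Made-up miniature "book" used only to exercise the function.
sample_book = """CHAPTER ONE
The boy who lived.
CHAPTER TWO
The vanishing glass.
"""
print(book_stats(sample_book))
```

Running this once per book gives the two columns needed for the chart.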

CoverImage

Code can be seen here and PDF image here

Note: add one small image representing each book along with the book name, and update the font to Lumos or an HP font

Also add annotations for the smallest chapter, largest chapter, smallest book and largest book

Spell analysis in HP series

This was motivated by the amazing visualisation in Tableau by Skyler Johnson

Data was exported from Tableau and cleaned to tidy format

Yet to incorporate the interactive bits!

CoverImagefromR

  • The x-axis is the location within the series (all books were merged into a single text)
  • The y-axis shows the spell, and each dot is coloured by the book in which it appears
  • Failed to colour the y-axis text by type of spell (Charm, Curse or Spell). I still have to figure out a way to assign axis text colour based on a variable and work around the error it currently raises
  • Learned ggplot objects in some detail. Thanks to George Karamanis
  • Twitter folks suggested hiding the y-axis text and using ggtext to control the aesthetics/colour of the y-axis text
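
The x-axis idea above (position within the merged text) can be sketched in Python as follows. The plot itself was built in R from the Tableau export, so this is only an illustration of the concept; `spell_positions`, the book titles and the sample sentences are hypothetical.

```python
import bisect
import re


def spell_positions(books, spells):
    """Return (spell, book_title, position) rows, position in [0, 1) over the merged text."""
    # Merge all books into one string, remembering where each book starts.
    texts, titles, offsets, offset = [], [], [], 0
    for title, text in books:
        titles.append(title)
        offsets.append(offset)
        texts.append(text)
        offset += len(text) + 1  # +1 for the newline used to join books
    full = "\n".join(texts)

    rows = []
    for spell in spells:
        for match in re.finditer(re.escape(spell), full, flags=re.IGNORECASE):
            # The book containing this match is the last one starting at or before it.
            idx = bisect.bisect_right(offsets, match.start()) - 1
            rows.append((spell, titles[idx], match.start() / len(full)))
    return sorted(rows, key=lambda row: row[2])


# Made-up sample sentences, not quotes from the books.
sample_books = [
    ("Book 1", "He cried Lumos at the door."),
    ("Book 2", "Accio broom! Then Lumos again."),
]
for row in spell_positions(sample_books, ["Lumos", "Accio"]):
    print(row)
```

Each row maps directly to one dot: position on the x-axis, spell on the y-axis, book as the colour.
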
Number of unique words in each book
Number of unique characters in each book
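
For the unique-word count, a minimal sketch could lowercase the text, pull out word tokens and count the distinct ones. The `unique_word_count` helper and the two sample "books" are hypothetical; no stemming is applied, so "owl" and "owls" count as different words.

```python
import re


def unique_word_count(text):
    """Count distinct words in text, ignoring case and punctuation."""
    return len(set(re.findall(r"[a-z']+", text.lower())))


# Made-up sample texts standing in for full books.
sample_texts = {
    "Book A": "The owl saw the cat.",
    "Book B": "Owls and cats and owls.",
}
for title, text in sample_texts.items():
    print(title, unique_word_count(text))
```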

Harry potter colour palette

A colour palette that captures the HP movies, the four houses, etc.: see the harrypotter package in R and its documentation here

Trivia

  1. The University of Waterloo offers a course called Popular Potter (ENGL 108P). Check out their syllabus. I have never read a syllabus this cool before!
  2. ACCIO data is an infographic book by Olivia Rouse. This 80-page visual guide is a feast for any visual data story enthusiast

References