
Let's dig into every season of the sitcom Friends and see what insights we can get from it 🕵

License: GNU General Public License v3.0 (GPL-3.0)

PyFriends

This project is an attempt to improve on the data currently available about Friends. Here you'll find a way to explore all of that data, using either Pandas or plain SQL.

Getting started

Run docker-compose up app and then open JupyterLab through the link shown in your command-line interface. With this approach, you can use Pandas however you like.

An animation that shows someone starting the project with JupyterLab

If you only want to run SQL, just run docker-compose up builder and wait until it finishes. Then open your favorite SQL browser and connect to the PostgreSQL database with the following settings:

  • URL: jdbc:postgresql://localhost:5432/postgres
  • User: postgres
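If you would rather connect from Python than from a SQL browser, the settings above can be turned into a connection URL. The snippet below is a minimal sketch using only the standard library. The helper name jdbc_to_sqlalchemy_url and the environment variable PYFRIENDS_DB_PASSWORD are hypothetical and not part of this project, and the password is deliberately left unset because it is not listed above.

```python
import os
from urllib.parse import quote, urlsplit

# Connection settings copied from the list above.
JDBC_URL = "jdbc:postgresql://localhost:5432/postgres"
USER = "postgres"


def jdbc_to_sqlalchemy_url(jdbc_url, user, password=None):
    """Turn a JDBC PostgreSQL URL into a URL of the form scheme://user[:password]@host:port/db.

    Hypothetical helper for illustration; it is not part of PyFriends.
    """
    if not jdbc_url.startswith("jdbc:"):
        raise ValueError("expected a URL starting with 'jdbc:'")
    # Drop the 'jdbc:' prefix so the standard URL parser can read the rest.
    parts = urlsplit(jdbc_url[len("jdbc:"):])
    # Percent-encode the credentials so special characters do not break the URL.
    credentials = quote(user, safe="")
    if password:
        credentials += ":" + quote(password, safe="")
    return f"{parts.scheme}://{credentials}@{parts.hostname}:{parts.port}{parts.path}"


# Assumption: the password, if one is needed, comes from a hypothetical environment variable.
url = jdbc_to_sqlalchemy_url(JDBC_URL, USER, os.environ.get("PYFRIENDS_DB_PASSWORD"))
print(url)
```

Assuming SQLAlchemy and a PostgreSQL driver such as psycopg2 are installed, a URL in this form can be passed to sqlalchemy.create_engine, and the resulting engine can be handed to pandas.read_sql_query.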

About the entities:

The database is modelled with 5 tables.

Delta architecture

I'm following the Delta Architecture design pattern but I changed it a bit to fit this small project. So here you'll find the following layers:

  • Raw layer: As the name suggests, this holds the raw data without any processing. In real projects, though, it may also include profiling of every attribute, scoring of the data for its typing and its adherence to the business domain, governance (such as a data catalog and much more), and security.
  • Integration layer: The data is organized, and a clear pattern can be noticed. In other words, you can query the data through well-organized tables. The tables may have relationships that reflect how things are in the real world, but no KPIs are created from them. This means the data is queryable and ready for insights, but carries no business rules. As always, governance and security play a role here too.
  • Business layer: The KPIs can be found here, and it's the layer where a user without query expertise can understand the data easily. Again, governance and security are involved to guarantee many aspects of each domain.
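To make the three layers concrete, here is a minimal sketch of how data could flow through them. It uses an in-memory SQLite database instead of the project's PostgreSQL so that it runs on its own, and every table name, column name, and row in it is a hypothetical placeholder rather than something taken from the real schema.

```python
import sqlite3

# Hypothetical raw records: untyped strings, exactly as they might arrive.
RAW_ROWS = [
    ("1", "1", "Character A", " first placeholder line "),
    ("1", "1", "Character B", "second placeholder line"),
    ("1", "2", "Character A", "third placeholder line"),
]


def build_layers(conn):
    # Raw layer: everything is stored as text, with no cleaning at all.
    conn.execute(
        "CREATE TABLE raw_lines (season TEXT, episode TEXT, speaker TEXT, line TEXT)"
    )
    conn.executemany("INSERT INTO raw_lines VALUES (?, ?, ?, ?)", RAW_ROWS)

    # Integration layer: typed and trimmed, queryable, but still without KPIs.
    conn.execute(
        """
        CREATE TABLE int_lines AS
        SELECT CAST(season AS INTEGER) AS season,
               CAST(episode AS INTEGER) AS episode,
               TRIM(speaker) AS speaker,
               TRIM(line) AS line
        FROM raw_lines
        """
    )

    # Business layer: a ready-made KPI (number of lines per speaker).
    conn.execute(
        """
        CREATE VIEW biz_lines_per_speaker AS
        SELECT speaker, COUNT(*) AS line_count
        FROM int_lines
        GROUP BY speaker
        """
    )


conn = sqlite3.connect(":memory:")
build_layers(conn)
kpis = conn.execute(
    "SELECT speaker, line_count FROM biz_lines_per_speaker ORDER BY speaker"
).fetchall()
print(kpis)
```

The point of the split is that someone reading biz_lines_per_speaker never has to know how the raw text was cleaned or typed; they only see the KPI.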

Tutorials I used to understand Pandas

Querying and plotting:

Credits