Film's Performance

A program that computes the average rating of films per country.

Pre-requisites

Tutorial

First, download the title.akas.tsv.gz and title.ratings.tsv.gz files. (Renamed, pre-downloaded copies of both files, together with the DB file, are also provided; see the file links further down in this document.)
After decompressing them, place the data.tsv files in the same directory as the Python program.

curl -O https://datasets.imdbws.com/title.akas.tsv.gz && gzip -d title.akas.tsv.gz && mv title.akas.tsv title.tsv

curl -O https://datasets.imdbws.com/title.ratings.tsv.gz && gzip -d title.ratings.tsv.gz && mv title.ratings.tsv ratings.tsv

Important: Rename the data.tsv files so you can tell them apart: one becomes title.tsv and the other ratings.tsv.
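Before starting a long run, it can help to confirm that title.tsv and ratings.tsv are in place and look like the right files. The following is a minimal sketch, not part of movies.py: the function names are hypothetical, and the required column names (titleId and region for the titles file, tconst and averageRating for the ratings file) are taken from the IMDb dataset documentation rather than from the program itself.

```python
import csv
from pathlib import Path

# Assumption: column names as published in the IMDb dataset documentation.
EXPECTED_COLUMNS = {
    "title.tsv": {"titleId", "region"},
    "ratings.tsv": {"tconst", "averageRating"},
}


def missing_columns(path, required):
    """Return the required column names absent from the header row of a TSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, [])
    return set(required) - set(header)


def check_inputs(directory="."):
    """Map each problematic file name to a short description; an empty dict means all good."""
    problems = {}
    for name, required in EXPECTED_COLUMNS.items():
        path = Path(directory) / name
        if not path.is_file():
            problems[name] = "file not found"
            continue
        missing = missing_columns(path, required)
        if missing:
            problems[name] = "missing columns: " + ", ".join(sorted(missing))
    return problems


if __name__ == "__main__":
    # Checks the current directory, where the program expects the files.
    for name, problem in check_inputs().items():
        print(f"{name}: {problem}")
```

If the script prints nothing, both files were found and their headers contain the expected columns. A "missing columns" message usually means the two files were swapped when renaming.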

Then, run the program.

python3 movies.py

Suggestion: Use the screen command to run the program in the background. More info here

sudo apt-get install screen

screen -S <session_name>

Tip: Press Ctrl+A, then D, to detach from a screen.
Tip 2: Run screen -ls to see the screen list.
Tip 3: Run screen -r <name> to resume a screen.

And then run the program.

If you don't know how the program works, don't skip any step, and be patient.
For any help, contact me: GitHub | Contact Page.

Pre Work and Database

Pre-created/downloaded files

movies.sqlite
title.tsv
ratings.tsv

My Story 'n Problems while coding

I had to discard the films that:

Were duplicated
Didn't have a country (\N)
Didn't have a rating (\NA)
Had "fake" country names (with slashes or dashes)

The code initially failed to skip the fake countries (last item above); this was fixed in commit 867afed386064ff74eb80f25f46021a3a454239e.
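The discard rules above can be sketched as a single filtering pass. This is an illustration under stated assumptions, not the code in movies.py: the names clean_rows and is_fake_country are hypothetical, each input row is assumed to be a (film_id, country, rating) tuple of strings, and a "duplicate" is assumed to mean a repeated (film_id, country) pair.

```python
# Markers treated as "no value": \N and \NA as described above, plus empty strings.
MISSING_MARKERS = {"", "\\N", "\\NA"}


def is_fake_country(country):
    """A country value containing a slash or a dash is treated as fake."""
    return "/" in country or "-" in country


def clean_rows(rows):
    """Yield (film_id, country, rating) tuples that pass every discard rule.

    Assumption: a duplicate is a (film_id, country) pair that was already seen.
    """
    seen = set()
    for film_id, country, rating in rows:
        if country in MISSING_MARKERS or rating in MISSING_MARKERS:
            continue  # no country or no rating
        if is_fake_country(country):
            continue  # "fake" country name
        if (film_id, country) in seen:
            continue  # duplicated film
        seen.add((film_id, country))
        yield film_id, country, float(rating)


sample = [
    ("tt1", "ES", "7.0"),
    ("tt1", "ES", "7.0"),    # duplicate
    ("tt2", "\\N", "5.0"),   # no country
    ("tt3", "US", "\\NA"),   # no rating
    ("tt4", "US/UK", "6.0"), # fake country
    ("tt5", "US", "8.5"),
]
print(list(clean_rows(sample)))
```

Because clean_rows is a generator, it keeps only the set of seen pairs in memory, which matters when the input has millions of rows.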

So, after discarding them, the data went from 10 M rows (dirty data) to 1 M rows (clean, complete data).

Finally, the program computes the average rating per country and displays it. It also writes the results to a file called results.txt.
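The averaging and reporting step could look roughly like the sketch below. It is not the actual implementation: the function names, the (country, rating) input shape, and the tab-separated, two-decimal output format are all assumptions made for illustration.

```python
from collections import defaultdict


def average_per_country(records):
    """Return {country: mean rating} for an iterable of (country, rating) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # country -> [sum of ratings, count]
    for country, rating in records:
        totals[country][0] += rating
        totals[country][1] += 1
    return {country: total / count for country, (total, count) in totals.items()}


def report(averages, path="results.txt"):
    """Print one line per country and write the same lines to a file."""
    lines = [f"{country}\t{avg:.2f}" for country, avg in sorted(averages.items())]
    # Mode "w" truncates the file, so earlier contents are replaced.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    print("\n".join(lines))
    return lines
```

Keeping a running sum and count per country avoids holding every rating in memory, so the sketch works in one pass over the cleaned rows.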

Attention: Every time you run the program, the previous results in results.txt are overwritten.
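If you want to keep earlier results, one option is to copy results.txt aside before each run. The helper below is a hypothetical sketch and is not something the program does for you; the timestamped naming scheme is an assumption.

```python
import shutil
import time
from pathlib import Path


def backup_results(path="results.txt"):
    """Copy the results file to a timestamped sibling and return the new path.

    Returns None when there is nothing to back up.
    """
    source = Path(path)
    if not source.is_file():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Example target name: results-20240131-235959.txt
    target = source.with_name(f"{source.stem}-{stamp}{source.suffix}")
    shutil.copy2(source, target)
    return target


if __name__ == "__main__":
    print(backup_results())
```

Run it before python3 movies.py. Two backups taken within the same second share a name, so the later one replaces the earlier.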

Sources

IMDb Documentation
title.akas.tsv.gz
title.ratings.tsv.gz