A program that computes the average of the films per country.
-
First, download the
title.akas.tsv.gz
file and thetitle.ratings.tsv.gz
file too. ( I'll upload the files renamed and pre-downloaded, also, the DB file too. See section called Fonts) -
After decompressing them, place the data.tsv files at the same directory where the python program it is.
curl -O https://datasets.imdbws.com/title.akas.tsv.gz && gzip -d title.akas.tsv.gz && mv title.akas.tsv title.tsv
curl -O https://datasets.imdbws.com/title.ratings.tsv.gz && gzip -d title.ratings.tsv.gz && mv title.ratings.tsv ratings.tsv
Important: Rename the data.tsv files to distinguish which one is the title.tsv and ratings.tsv.
- Then, run the program.
python3 movies.py
Suggestion: Use screen
command to run the program in the background. More info here
sudo apt-get install screen
screen -S <session_name>
Tip: Ctrl A + D
To detach from a screen
Tip2: screen -ls
to see screen list
Tip3: screen -r <name>
to resume a screen
And then run the program.
If you don't know how the program works, don't skip any step. AND BE PATIENT For any help, contact me GitHub | Contact Page.
movies.sqlite
title.tsv
ratings.tsv
I had to discard the films that:
Were Duplicated Didn't have a country (\N)
Films that didn't have a rating (\NA) Films with "fake" country names: (with slashes or dashes.)
Fix code to avoid parsing fake countries (last line) solved with (867afed386064ff74eb80f25f46021a3a454239e)
So, after discarding them, I changed from 10 M (dirty data) rows to 1 M clean (full data) rows.
Finally, the program computes the average per country and displays it out, it also records it into a file called: results.txt
Attention: Every time that you run the program, you'll rewrite the past results at the results.txt