Finds the actors with the highest accumulated box office profit using the TMDB API.
First, I crawl the Box Office Mojo website to gather a list of the best-selling movies.
Then, for each movie, I use TMDB's API to get the movie_id by the movie's name and release year, and finally I get the list of actors in that movie.
The profit for each actor is accumulated across their movies and displayed.
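The accumulation step can be sketched as follows. This is a minimal illustration, not the project's actual code: the `accumulate_profit` helper and the sample records are hypothetical, and the shapes of `actors_map` (actor to total profit) and `actors_movies_map` (actor to list of movie titles) are assumed from their names.

```python
from collections import defaultdict


def accumulate_profit(movies):
    """Sum each movie's profit into every actor credited on it.

    `movies` is an iterable of dicts with the hypothetical keys
    "title", "profit", and "actors".
    """
    actors_map = defaultdict(int)          # actor -> accumulated profit
    actors_movies_map = defaultdict(list)  # actor -> titles they appear in
    for movie in movies:
        for actor in movie["actors"]:
            actors_map[actor] += movie["profit"]
            actors_movies_map[actor].append(movie["title"])
    return dict(actors_map), dict(actors_movies_map)


# Made-up sample data, only to show the shape of the computation.
sample_movies = [
    {"title": "Movie A", "profit": 100, "actors": ["Actor X", "Actor Y"]},
    {"title": "Movie B", "profit": 250, "actors": ["Actor Y"]},
]

actors_map, actors_movies_map = accumulate_profit(sample_movies)

# Rank actors by accumulated profit, highest first.
ranking = sorted(actors_map.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

An actor who appears in several best-selling movies collects the profit of each one, which is why the ranking favors actors with many entries in the list.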
df = print_actors_map_pd(actors_map, actors_movies_map, limit=50, movies_limit=1)
Results with the movie count limited to 1 (movies_limit=1).
df = print_actors_map_pd(actors_map, actors_movies_map, limit=50, movies_limit=5)
Results with the movie count limited to 5 (movies_limit=5).
- Crawl Box Office Mojo and get a list of the best-selling movies
- Use multiprocessing to make the crawling faster
- [] Export the final data as JSON
- [] Display as a web app
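The idea of speeding up many independent requests with a worker pool can be sketched like this. The sketch makes two assumptions: `fetch_movie` is a hypothetical stand-in for one network request, and it uses `multiprocessing.pool.ThreadPool` (a thread-backed pool with the same `map` interface as `multiprocessing.Pool`) so that it runs without a `__main__` guard. The project's scripts may well use process pools instead.

```python
from multiprocessing.pool import ThreadPool


def fetch_movie(title):
    """Hypothetical stand-in for one network request (scrape or API call)."""
    # A real worker would download and parse a page here.
    return {"title": title, "title_length": len(title)}


def fetch_all(titles, workers=4):
    """Run fetch_movie over all titles concurrently, preserving input order."""
    with ThreadPool(workers) as pool:
        return pool.map(fetch_movie, titles)


results = fetch_all(["Movie A", "Movie BB", "Movie CCC"])
print(results)
```

Because each request spends most of its time waiting on the network, running several at once shortens the total crawl time even though the work per movie is unchanged.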
First, install these modules:
- requests
- beautifulsoup4
- python-dotenv
- pandas
If you want, you can use the provided data and go straight to the notebook called most_profitable_actors_multiprocessing.ipynb.
But if you want to update the data, run these files in order:
- boxoffice_scraper.py, which uses requests and bs4 to extract the best-selling movies.
- gather_data_multiprocessing.py, which uses TMDB's API to get the list of actors for each movie extracted in the previous step.
- most_profitable_actors_multiprocessing.ipynb: open the notebook and read the comments. Make sure to set is_data_available to False if you want to recalculate everything.
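The TMDB lookup (movie name and release year to movie_id, then movie_id to actors) could look roughly like the sketch below. It is not taken from gather_data_multiprocessing.py: the helper names are hypothetical, and the endpoint paths and response fields (`/search/movie`, `/movie/{id}/credits`, `results`, `cast`, `name`) reflect my understanding of the TMDB v3 API and should be checked against its documentation. No request is sent here; canned dictionaries stand in for the JSON responses.

```python
from urllib.parse import urlencode

TMDB_BASE = "https://api.themoviedb.org/3"  # assumed v3 base URL


def build_search_url(title, year, api_key):
    """URL for looking up a movie by name and release year (assumed endpoint)."""
    params = {"api_key": api_key, "query": title, "year": year}
    return f"{TMDB_BASE}/search/movie?{urlencode(params)}"


def build_credits_url(movie_id, api_key):
    """URL for fetching the cast of one movie (assumed endpoint)."""
    return f"{TMDB_BASE}/movie/{movie_id}/credits?{urlencode({'api_key': api_key})}"


def first_movie_id(search_json):
    """Return the id of the first search hit, or None when nothing matched."""
    results = search_json.get("results", [])
    return results[0]["id"] if results else None


def cast_names(credits_json):
    """Return the actor names from a credits response."""
    return [member["name"] for member in credits_json.get("cast", [])]


# Canned responses with made-up values, shaped like the assumed API output.
fake_search = {"results": [{"id": 123, "title": "Movie A"}]}
fake_credits = {"cast": [{"name": "Actor X"}, {"name": "Actor Y"}]}

movie_id = first_movie_id(fake_search)
print(build_search_url("Movie A", 2020, "YOUR_KEY"))
print(build_credits_url(movie_id, "YOUR_KEY"))
print(cast_names(fake_credits))
```

In a real run the two URLs would be fetched with requests, and the API key would typically be read from a .env file via python-dotenv rather than hard-coded.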
By Gholamreza Dar - Fall 2022