A curated dataset of Academy Award nominations with IMDb unique identifiers.
The unique identifiers are key to disambiguating people and films with similar names.
- The Kenneth Branagh who was nominated for directing Henry V in 1989 is the same person who was nominated for acting in a supporting role in My Week with Marilyn in 2011 and the same as the person who won for writing an original screenplay for Belfast in 2021 (as well as 5 other nominations).
- The Steve McQueen nominated for acting in The Sand Pebbles in 1966 is not the same as the Steve McQueen who won Best Picture for 12 Years a Slave in 2013.
- The most complex case is "Robert Benton", a name that could refer to either of two different people, each of whom was nominated in multiple categories.
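The disambiguation described above can be checked mechanically. The following is a minimal sketch, assuming rows shaped like the Nominees and NomineeIds fields described below; the sample rows are simplified for illustration, and the IDs (`nm_actor`, `nm_producer`, `nm_branagh`) are hypothetical placeholders, not real IMDb Name IDs.

```python
from collections import defaultdict

# Simplified, hypothetical sample rows. The IDs are placeholders,
# not real IMDb Name IDs.
rows = [
    {"Nominees": "Steve McQueen", "NomineeIds": "nm_actor", "Film": "The Sand Pebbles"},
    {"Nominees": "Steve McQueen", "NomineeIds": "nm_producer", "Film": "12 Years a Slave"},
    {"Nominees": "Kenneth Branagh", "NomineeIds": "nm_branagh", "Film": "Henry V"},
    {"Nominees": "Kenneth Branagh", "NomineeIds": "nm_branagh", "Film": "Belfast"},
]


def ids_by_name(rows):
    """Map each nominee name to the set of distinct IDs it appears with."""
    seen = defaultdict(set)
    for row in rows:
        # Both fields are comma separated and assumed to be in the same order.
        names = [n.strip() for n in row["Nominees"].split(",")]
        ids = [i.strip() for i in row["NomineeIds"].split(",")]
        for name, ident in zip(names, ids):
            seen[name].add(ident)
    return dict(seen)


def ambiguous_names(rows):
    """Return the names that refer to more than one person, sorted."""
    return sorted(name for name, ids in ids_by_name(rows).items() if len(ids) > 1)


print(ambiguous_names(rows))  # ['Steve McQueen']
```

A name that maps to a single ID (Kenneth Branagh here) is one person across categories, while a name that maps to several IDs needs the identifier to tell the people apart.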
- Ceremony (int) - Ordinal of the ceremony the nomination was for (starting at 1).
- Year (string) - Year(s) from which the films are honored.
- Class (string) - A custom broad grouping for categories. Values include:
  - Title (e.g. Best Picture)
  - Acting
  - Directing
  - Writing
  - Music
  - Production
  - SciTech
  - Special
- CanonicalCategory (string) - The category name with variations in its exact wording over the years removed.
- Category (string) - The precise category name according to Oscars.org.
- NomId (uuid) - Unique string representing the IMDb Nomination ID.
- Film (string) - The title of the film (optional).
- FilmId (uuid) - Unique string representing the IMDb Title ID.
- Name (string) - The precise text used for who is being nominated.
- Nominees (comma separated strings) - The names of who is nominated in a comma separated list (without any extra text like "Written by").
- NomineeIds (comma separated uuids) - Unique strings (or question marks) representing the IMDb Name ID.
- Winner (bool) - True if the award was won.
- Detail (string) - Detail about the nomination, which could be the character name, song title, etc.
- Note (string) - Additional information provided about the award/nomination.
- Citation (string) - Official text of the award statement, for Scientific/Technical/Honorary awards.
- MultifilmNomination (bool) - Generally the data is one nomination per row, but for certain early nominations (Ceremonies 1, 2, 3 & 8), people were nominated for multiple films, and so one nomination could be spread over multiple rows.
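Because a multi-film nomination can span several rows, a consumer of the data may want to group rows by NomId before counting nominations. The sketch below is one way to do that with the standard library. It makes several assumptions that are not stated in this document: that the data is tab separated, that booleans are stored as the text `True` or left empty, and that the sample rows and IDs (`an_1`, `nm_1`, and so on) are hypothetical placeholders.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated sample. The real file's delimiter and boolean
# encoding are assumptions, and the IDs are placeholders.
SAMPLE = (
    "Ceremony\tNomId\tFilm\tNominees\tNomineeIds\tWinner\tMultifilmNomination\n"
    "1\tan_1\tFilm A\tPerson One\tnm_1\tTrue\tTrue\n"
    "1\tan_1\tFilm B\tPerson One\tnm_1\tTrue\tTrue\n"
    "1\tan_2\tFilm C\tPerson Two, Person Three\tnm_2,?\t\t\n"
)


def split_list(value):
    """Split a comma separated field into stripped, non-empty items."""
    return [part.strip() for part in value.split(",") if part.strip()]


def load_nominations(handle):
    """Group rows by NomId so a multi-film nomination becomes one record."""
    grouped = defaultdict(lambda: {"films": [], "nominee_ids": [], "winner": False})
    for row in csv.DictReader(handle, delimiter="\t"):
        record = grouped[row["NomId"]]
        record["ceremony"] = int(row["Ceremony"])
        if row["Film"]:  # Film is optional
            record["films"].append(row["Film"])
        # A "?" entry marks a nominee without a known ID and is kept as-is.
        record["nominee_ids"] = split_list(row["NomineeIds"])
        record["winner"] = row["Winner"] == "True"
    return dict(grouped)


nominations = load_nominations(io.StringIO(SAMPLE))
print(len(nominations))              # 2
print(nominations["an_1"]["films"])  # ['Film A', 'Film B']
```

Grouping on NomId rather than on the nominee name keeps the count correct for the early ceremonies, where three rows may describe a single nomination.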
- Manually Download HTML
  - Visit The Awards Database @ Oscars.org
  - Set the Award Years to the maximum possible range and Search. (Display Results by should already be set to Category (chron).)
  - Save the results to oscars_html/search_results.html.
  - If the nominations have been announced but NOT awarded, download the nominations by saving The Ceremonies Page as oscars_html/nominations.html
- Prepare the Raw Oscars Data
  - Parse the HTML you just downloaded
    - If you downloaded nomination data, run ./parse_oscars_html.py -n
    - Otherwise, run ./parse_oscars_html.py
  - Run ./add_fields_to_csv.py
  - Run ./parse_citations.py
  - Manually update any of the citations in citations.yaml, and run parse_citations.py again as needed.
- Obtain Lots of IMDb Data
  - Run ./scrape_imdb_html.py
  - Run
- Merge in IMDb Data
  - Run ./merge.py -w
  - Run
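The scripted part of the steps above could be chained from a small driver. This is only a sketch, not part of the project: the function names `pipeline_commands` and `run_pipeline` are hypothetical, it covers only the commands that are named in the steps (the steps whose command is not given are left out), and the manual steps (downloading the HTML, editing citations.yaml) still have to be done by hand.

```python
import subprocess


def pipeline_commands(nominations_only):
    """Return the named commands in order.

    nominations_only should be True when oscars_html/nominations.html was
    downloaded, i.e. nominations are announced but not yet awarded.
    """
    parse = ["./parse_oscars_html.py"]
    if nominations_only:
        parse.append("-n")
    return [
        parse,
        ["./add_fields_to_csv.py"],
        ["./parse_citations.py"],
        ["./scrape_imdb_html.py"],
        ["./merge.py", "-w"],
    ]


def run_pipeline(nominations_only, dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for command in pipeline_commands(nominations_only):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(command, check=True)


# Dry run: only prints the commands, nothing is executed.
run_pipeline(nominations_only=False, dry_run=True)
```

Keeping the default at `dry_run=True` means the order can be inspected before anything runs, which matters because the scraping step may take a long time.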