IMDb_xref
Quickly search IMDb for principal cast members of TV shows or movies, characters they portray, other shows they are in, and whether multiple shows have cast members in common.
Create comprehensive lists and spreadsheets about your favorite shows. They're useful as an overview or for researching details on shows and cast members. For example, shows with episode titles, credits sorted by show/episode, and credits sorted by cast member.
Table of Contents
- Motivation
- Download IMDb_xref
- Automated quickstart
- Manual installation
- Limitations
- Compatibility
- Suggestions
- Performance
- Contributing
- License
Motivation
When watching a TV show or movie, have you ever spotted a familiar face but couldn't remember the actor's name or what other shows you'd seen them in?
To solve this I used to go to the IMDb website; find the show; click on "See full cast & crew"; find the character; click on the actor's name; then scroll through their "Filmography" to see if I recognized any other shows I'd watched. This was both time-consuming and difficult -- even more so if I wanted to know if two shows had actors in common.
I wrote IMDb_xref to answer such questions simply and quickly. Now I have even more fun learning about actors and shows.
A simpler solution
Suppose you're a fan of the Netflix series "The Crown". You start watching "The Night Manager". You recognize the actress who played Princess Diana in "The Crown" but aren't sure of her name.
Run start.command, select "Find the principal cast & crew members of one or more shows". Enter The Crown, enter The Night Manager, then enter a blank line. It will find 4 shows titled The Crown; select #4, the tvSeries. It will display the cast of "The Crown", the cast of "The Night Manager", and finally, the principal cast members who appear in more than one show. You can easily see that the actress you were looking for is Elizabeth Debicki.
If you like those shows, save them to your favorites. It will enable some advanced features we'll cover later.
Then select "Find all shows listing a person as a principal cast or crew member". Enter Elizabeth Debicki. It will find 17 titles listing Elizabeth Debicki as: actress.
Repeat with any cast members you want to know more about, such as Olivia Colman. You'll discover she is in 62 shows, including "Broadchurch".
Look up the cast of "Broadchurch" to find more actors, then find more shows and even more actors. Enjoy exploring! Each query result includes handy links to imdb.com in case you only want to use IMDb_xref as a less cumbersome IMDb search tool.
Download IMDb_xref
Either download an IMDb_xref release or type these commands into a terminal window:
git clone https://github.com/Monty/IMDb_xref.git
cd IMDb_xref
If you get a pop-up saying: 'The "git" command requires the command line developer tools. Would you like to install the tools now?', click the "Install" button, not the "Get Xcode" button.
Automated quickstart
In a terminal window, type ./start.command. In macOS, you can simply double-click the start.command icon. (The first time, control-click or right-click instead. Then select Open from the pop-up menu and click Open in the dialog box.)
This will set up your preferences, install prerequisites, download the compressed IMDb data files, and open the top-level menu.
Select #1 "Find the principal cast & crew members of one or more shows". Enter the title of a movie or TV show you like. If you know another show starring some of the same actors, enter that on the next line. Then enter a blank line.
Understanding query results
The "Searching for" section lists the search terms used, one per line. If you get unexpected results in a complex query, check it to see if you mistyped a search term.
The "Principal cast & crew" section contains all rows with a match for any term. It can be quite long for complex queries.
The "... listed in more than one" section contains only rows with names found in more than one show. It can be empty.
Selecting #2 "Find any principal cast & crew members listed in more than one show" will hide the "Principal cast & crew" section. Running identical queries using #1 and #2 will give you an understanding of when each is useful.
Menu selections #3 and #4 search for principal cast and crew members instead of show titles. Results should be self-explanatory.
Cross-reference saved shows
When prompted in #1 or #2, add some shows to your favorites, and update your data files. That will create lists and spreadsheets that combine data for cross-referencing. Those files are much smaller, enabling faster queries.
Select #5 "Run a cross-reference of your saved shows" to enter search terms a line at a time.
You can mix and match shows, cast or crew members, and characters portrayed in a single search, e.g. The Crown, Olivia Colman, and Queen Elizabeth. Search for two or more actors to see if they appear in any shows together. Search for two or more shows to see which actors, if any, appear in more than one.
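The "appear in more than one show" idea can be illustrated with a small sketch. It is not the project's actual implementation: the three-column layout (show, person, role) and the function name people_in_multiple_shows are assumptions made up for this example, and the sample rows are illustrative only.

```shell
# Illustrative credits, tab-separated: show, person, role.
# This layout is an assumption, not the real IMDb_xref file format.
credits=$(printf '%s\t%s\t%s\n' \
    "The Crown" "Olivia Colman" "actress" \
    "The Crown" "Elizabeth Debicki" "actress" \
    "The Night Manager" "Olivia Colman" "actress" \
    "The Night Manager" "Elizabeth Debicki" "actress" \
    "The Night Manager" "Tom Hiddleston" "actor" \
    "Broadchurch" "Olivia Colman" "actress")

# Hypothetical helper: print each person credited in more than one
# distinct show, followed by the number of shows.
people_in_multiple_shows() {
    awk -F '\t' '
        !seen[$1 FS $2]++ { shows[$2]++ }   # count each show/person pair once
        END { for (p in shows) if (shows[p] > 1) print p FS shows[p] }
    ' | sort
}

printf '%s\n' "$credits" | people_in_multiple_shows
```

With this sample data, only the two people credited in more than one show are listed; anyone credited in a single show is dropped, which is why such a section can be empty.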
Select #6 "Run a guided cross-reference of your saved shows" to predict and fill in search terms with minimal typing. This is particularly useful on a tablet running a terminal emulator. I use the free version of Termius on an iPad, but others should work also.
You can use #7, "Show me a list of my saved shows", to make sure you have saved the necessary shows before cross-referencing.
Search term hints
You don't need to quote a search term or escape spaces and other special characters. The Crown or Schitt's Creek will both be handled correctly.
Shows with non-English titles such as Jo Nesbø's Headhunters or cast member names like Rolf Lassgård must be entered exactly. You can copy/paste such search terms, or use a tconst/nconst found in their IMDb URL, e.g. https://imdb.com/title/tt1614989/ and https://www.imdb.com/name/nm0489858/
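If you would rather not pick the identifier out of a URL by eye, a one-line helper can do it. This is a minimal sketch, assuming identifiers always look like tt or nm followed by digits; the function name imdb_id_from_url is made up for this example and is not part of IMDb_xref.

```shell
# Hypothetical helper: extract the first tconst (tt...) or nconst (nm...)
# found in an IMDb URL.
imdb_id_from_url() {
    printf '%s\n' "$1" | grep -oE '(tt|nm)[0-9]+' | head -n 1
}

imdb_id_from_url "https://imdb.com/title/tt1614989/"     # prints tt1614989
imdb_id_from_url "https://www.imdb.com/name/nm0489858/"  # prints nm0489858
```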
Searches use "smart case". If there are no uppercase letters in any search term, searches will match both uppercase and lowercase letters. However, you may get more results than if your search terms were exact.
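The smart case rule can be sketched with plain grep. This is only an illustration of the idea, not the project's code: the function name smart_case_grep is hypothetical, and it decides per pattern, whereas the description above looks at all search terms together. (ripgrep also offers a --smart-case option that applies a similar rule.)

```shell
# Hypothetical helper: case-insensitive when the pattern has no
# uppercase letters, case-sensitive otherwise.
smart_case_grep() {
    local pattern=$1
    shift
    case $pattern in
        *[[:upper:]]*) grep -e "$pattern" "$@" ;;     # has uppercase: match case exactly
        *)             grep -i -e "$pattern" "$@" ;;  # all lowercase: ignore case
    esac
}

# "crown" matches both lines; "Crown" matches only the first.
printf '%s\n' "The Crown" "the crown jewels" | smart_case_grep "crown"
printf '%s\n' "The Crown" "the crown jewels" | smart_case_grep "Crown"
```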
Manual installation
If you are comfortable typing commands into a terminal window, you may prefer using the following steps to set things up yourself.
Install prerequisites
Install ripgrep to get acceptable performance. Searching 700 MB of compressed data with zgrep is 15x slower. See https://crates.io/crates/ripgrep. (If anyone wants to rewrite this to use zgrep or another search engine, be my guest.)
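To get a feel for searching compressed data directly, here is a minimal sketch. It assumes ripgrep's -z (--search-zip) option for reading gzipped files and falls back to zgrep when rg is not installed. The function name search_gz, the file name titles.tsv.gz, and the IDs in the sample rows are all made up for illustration; this is not how the project's scripts are necessarily written.

```shell
# Hypothetical helper: search a gzipped file for a fixed string,
# preferring ripgrep and falling back to zgrep.
search_gz() {
    if command -v rg >/dev/null 2>&1; then
        rg -z -F -e "$1" "$2"      # -z searches compressed files, -F is a literal match
    else
        zgrep -F -e "$1" "$2"
    fi
}

# Build a tiny gzipped sample (made-up IDs) and search it.
tmp=$(mktemp -d)
printf 'tt0000001\tThe Crown\ntt0000002\tBroadchurch\n' | gzip > "$tmp/titles.tsv.gz"
search_gz "Broadchurch" "$tmp/titles.tsv.gz"
rm -r "$tmp"
```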
While it's not required, xsv improves table layout, especially for non-English names, by using "elastic tabs". See https://crates.io/crates/xsv.
Generate sample data
Run ./generateXrefData.sh to download the IMDb data files and generate lists and spreadsheets containing principal cast members, characters portrayed, alternate titles, and other details from IMDb. This takes 40 seconds on my 2014 iMac. (Note: Longer if you have a slow internet connection.)
Re-running ./generateXrefData.sh doesn't download the IMDb data files again. This reduces the run time to 20 seconds. It will overwrite any previously generated files.
Run sample queries
Run ./xrefCast.sh -h (help) to see some example queries that can be typed into a terminal window.
Run ./demo.command to see the types of information returned from those queries and more.
Generate additional data
Since ./generateXrefData.sh displays statistics as it runs, you probably noticed that it only produced data on 3 shows with 92 episodes -- crediting 87 people with 758 lines of credits. It did so by selecting three PBS shows from example.tconst and creating the example files PBS.tconst and PBS.xlate.
If you run ./generateXrefData.sh -t, it will load all the shows in tconst.example. You'll now have data on 98 shows with 2205 episodes -- crediting 3769 people with 19002 lines of credits. Running this takes about 45 seconds. However, queries should still take less than one second.
You can clean up any data you don't want by running cleanupEverything.sh. I suggest you don't delete anything until you've run through the entire list of choices it offers.
Explore other commands
All the commands in the top-level menu invoke shell scripts that can be run in a terminal window, supplying options and parameters on the command line.
To learn more, run ./explain_scripts.sh or examine the included shell scripts.
If you run commands as shell scripts, you'll need to be careful to quote and escape spaces and other special characters.
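The difference quoting makes is easy to see with a throwaway function that reports how many arguments it received. The function count_terms is purely illustrative and not part of IMDb_xref; the same word-splitting rules apply to arguments passed to any shell script.

```shell
# Illustrative helper: print the number of arguments received.
count_terms() { echo "$#"; }

count_terms The Crown           # prints 2: the shell splits on the space
count_terms "The Crown"         # prints 1: quotes keep the title together
count_terms "Schitt's Creek"    # prints 1: double quotes protect the apostrophe
count_terms Schitt\'s\ Creek    # prints 1: backslashes escape each special character
```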
If you run one of the commands in the top-level menu as a shell script, it will still open the top-level menu when it exits. I find this convenient, but if you would prefer that it exit, simply set a NO_MENUS environment variable, i.e. export NO_MENUS="yes".
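A script could honor such a variable along these lines. This is a hedged sketch of the general pattern only: the function name next_step is hypothetical, and treating any non-empty value as "set" is an assumption, not a description of the project's actual check.

```shell
# Hypothetical helper: decide what to do when a command finishes.
# Assumes any non-empty NO_MENUS value means "do not reopen the menu".
next_step() {
    if [ -n "${NO_MENUS:-}" ]; then
        echo "exit"
    else
        echo "menu"
    fi
}

next_step    # prints "menu" unless NO_MENUS is set in your environment
```

To apply the setting to a single command instead of the whole session, standard shell syntax lets you prefix the variable to the command, for example NO_MENUS="yes" followed by the script name on the same line.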
Limitations
Data downloaded from IMDb often has errors or omissions. It has less information on cast and crew than is available on the IMDb website.
Data on shows only includes "Principal cast & crew members", which is limited to 10 persons per show. Queries for movies only return those 10. Queries for TV shows can return more than 10 because each episode has its own credits -- which is why you can see 56 "Principal cast & crew members" for "The Crown".
IMDb prohibits scraping their website, but you can use the imdb.com links we provide to access the "Full Cast & Crew" data online.
Downloading IMDb data frequently is not as beneficial as you might think. While the data is updated daily, those updates are usually minor changes, like changing the type of a show from tvSeries to tvEpisode, or changing the titles a person is most known for.
Queries for principal cast & crew members can include results you might not expect, e.g. cinematographers and editors. However, updating your data files only saves actors, actresses, writers, directors, and producers. To save all types, run generateXrefData.sh -a at any time. You may want to also use the -d or -f options to prevent the larger results from being overwritten.
Queries for all shows listing a person as a principal cast or crew member can include results you might not expect, e.g. videoGame or radioSeries. For each type, you will be asked if you want to display those results.
Compatibility
Tested on macOS and Linux. It may work in Windows 10 if Windows Subsystem for Linux is installed.
Suggestions
Start your own lists: broad genres such as Comedies, Sci-Fi, Musicals, Historical Dramas -- or more specific ones like "All Alfred Hitchcock movies", "TV shows with Robots", "Shows with Salsa music", or "Shows for Trivia questions".
Until I find time to produce more documentation, you can learn a lot from the descriptive comments in the shell scripts, .example, and Contrib files.
Performance
Even complex queries on 14MB of saved shows run in less than 100ms on my 2014 iMac and less than 25ms on my 2019 MacBook Pro with an internal SSD. There is almost no difference between using gzipped data and non-gzipped data.
Show comparative benchmarks
Timing results for running 5 queries on gzipped and non-gzipped files. Both contain 219510 rows. The gzipped file is 3.0MB, the non-gzipped file is 14MB. The times are nearly identical, with a very slight edge to the gzipped version.
On a 2014 iMac with internal hard drive:
$ hyperfine -w 5 './xrefTest.sh -f ZipTest.csv' './xrefTest.sh -f ZipTest.csv.gz'
Benchmark #1: ./xrefTest.sh -f ZipTest.csv
Time (mean ± σ): 95.2 ms ± 0.9 ms [User: 28.3 ms, System: 46.2 ms]
Range (min … max): 92.9 ms … 97.2 ms 30 runs
Benchmark #2: ./xrefTest.sh -f ZipTest.csv.gz
Time (mean ± σ): 94.9 ms ± 1.0 ms [User: 28.4 ms, System: 45.7 ms]
Range (min … max): 92.9 ms … 97.9 ms 30 runs
Summary
'./xrefTest.sh -f ZipTest.csv.gz' ran
1.00 ± 0.01 times faster than './xrefTest.sh -f ZipTest.csv'
On a 2019 MacBook Pro with an internal SSD:
$ hyperfine -w 5 './xrefTest.sh -f ZipTest.csv' './xrefTest.sh -f ZipTest.csv.gz'
Benchmark #1: ./xrefTest.sh -f ZipTest.csv
Time (mean ± σ): 17.0 ms ± 1.0 ms [User: 6.0 ms, System: 8.8 ms]
Range (min … max): 16.1 ms … 23.0 ms 155 runs
Benchmark #2: ./xrefTest.sh -f ZipTest.csv.gz
Time (mean ± σ): 16.8 ms ± 0.7 ms [User: 5.9 ms, System: 8.7 ms]
Range (min … max): 16.1 ms … 20.7 ms 155 runs
Summary
'./xrefTest.sh -f ZipTest.csv.gz' ran
1.01 ± 0.07 times faster than './xrefTest.sh -f ZipTest.csv'
Contributing
Feel free to dive in! Contribute an interesting tconst list, submit additional scripts, open an issue, or submit PRs.
License
MIT © Monty Williams