solr-tmdb

TheMovieDB in Solr

Solr index for The Movie Database (TMDB).

This repository is part of the Think Like a Relevancy Engineer training provided by OpenSource Connections.

Steps to get up and running:

  • Download this repo
  • Install the Solr search engine and its configuration (using Docker or a manual install)
  • Index the TMDB movie data
  • Confirm Solr has the data
  • Install Postman (optional)

Download this repo

Download the zip from https://github.com/o19s/solr-tmdb/archive/master.zip to get the file solr-tmdb-master.zip. Unzip this file to create the directory solr-tmdb-master.

Once the archive is unzipped, change into the newly created directory.
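
If you would rather script the download, the two steps above can be sketched in Python with only the standard library. The helper names `unzip` and `download_and_unzip` and the default destination are illustrative choices, not part of this repo.

```python
import urllib.request
import zipfile
from pathlib import Path

REPO_ZIP_URL = "https://github.com/o19s/solr-tmdb/archive/master.zip"


def unzip(zip_path, dest_dir="."):
    """Extract zip_path into dest_dir and return the top-level names it contains."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # First path component of every entry, e.g. "solr-tmdb-master".
        return sorted({name.split("/")[0] for name in archive.namelist()})


def download_and_unzip(url=REPO_ZIP_URL, zip_name="solr-tmdb-master.zip", dest_dir="."):
    """Download the repo archive into dest_dir, then extract it there."""
    zip_path = Path(dest_dir) / zip_name
    urllib.request.urlretrieve(url, zip_path)
    return unzip(zip_path, dest_dir)
```

Calling `download_and_unzip()` from the directory where you want the repo should leave a solr-tmdb-master directory there, which you can then change into.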

Install Solr

There are two options for running Solr locally. If neither works for you, don't fret: we also host a public version of this dataset at http://quepid-solr.dev.o19s.com:8985/solr/ that you can use during the class, even if your environment won't let you set up Solr.

Docker option (recommended)

If you have Docker installed and running, start Solr with the script for your platform.

Linux/OSX:

./docker.sh

Windows:

powershell docker.ps1

Local option

  1. Download and unpack Solr 8.11.1

  2. Navigate into the newly unzipped directory.

  3. Open /path/to/solr-tmdb-master/solr_home/tmdb/conf/solrconfig.xml and change the library path so that it points to the extra libraries located in /path/to/solr-tmdb-master/docker/lib.

  4. Run Solr pointing at the TMDB Solr Home directory included in this repo.

Linux/OSX:

bin/solr start -f -s /path/to/solr-tmdb-master/solr_home/

Windows:

bin\solr start -f -s \path\to\solr-tmdb-master\solr_home\

Regardless of the option you choose, navigate to http://localhost:8983/solr/ to confirm Solr is running.
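
The same check can be done from a script. The sketch below assumes Solr's standard system info endpoint (`admin/info/system`) is reachable under the base URL; `system_info_url` and `solr_is_up` are illustrative helper names, not something shipped in this repo.

```python
import json
import urllib.error
import urllib.request


def system_info_url(base_url="http://localhost:8983/solr"):
    """Build the URL of Solr's system info endpoint under base_url."""
    return base_url.rstrip("/") + "/admin/info/system?wt=json"


def solr_is_up(base_url="http://localhost:8983/solr", timeout=5):
    """Return True if Solr answers the system info request with status 0."""
    try:
        with urllib.request.urlopen(system_info_url(base_url), timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False
    return body.get("responseHeader", {}).get("status") == 0
```

`solr_is_up()` checks the local instance. Passing the public URL mentioned earlier as `base_url` should work the same way, assuming that host is reachable from your network.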

Index TMDB movies

We have a corpus of movie data sourced from The Movie Database (TMDB), which holds data similar to IMDb (Internet Movie Database). Index it with the script for your platform.

Linux/OSX:

./index.sh

Windows:

powershell index.ps1

If you get a permissions error, open the index.ps1 file and copy and paste its contents into your PowerShell console.

You are indexing a 12 MB JSON file, so this will take a minute!

Confirm Solr has TMDB movies

Navigate here and confirm you get results.

If you don't see any results, trigger a manual commit.
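
Both checks can also be scripted. This is a minimal sketch that assumes the core is named `tmdb` (inferred from the solr_home/tmdb directory; adjust it if yours differs) and uses Solr's standard `select` and `update` handlers. The helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr"
CORE = "tmdb"  # assumption: inferred from solr_home/tmdb, adjust if yours differs


def count_url(base_url=SOLR, core=CORE):
    """URL for a match-all query that returns only the hit count."""
    params = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url}/{core}/select?{params}"


def commit_url(base_url=SOLR, core=CORE):
    """URL that asks Solr to commit pending updates for the core."""
    return f"{base_url}/{core}/update?commit=true&wt=json"


def fetch_json(url, timeout=30):
    """GET url and parse the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def count_docs(base_url=SOLR, core=CORE):
    """Return numFound, committing once first if the index looks empty."""
    found = fetch_json(count_url(base_url, core))["response"]["numFound"]
    if found == 0:
        fetch_json(commit_url(base_url, core))
        found = fetch_json(count_url(base_url, core))["response"]["numFound"]
    return found
```

With Solr running and the data indexed, `count_docs()` should return a number greater than zero. If it still returns 0 after the commit, the indexing step most likely did not complete.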

Postman

Postman is an API development tool that helps you build, run, and manage API requests. The examples from the TLRE slides are also available here as a Postman Collection (solr-postman_collection.json). We like Postman because it makes tinkering with query parameters easier, and we think it is a useful way to follow along as you learn about tuning search relevance.

If you want to use Postman during the TLRE class:

  1. Download Postman for your OS
  2. Open Postman and Import (top-menu >> File) solr-postman-collection.json
  3. Define a global variable (grey eye icon in the upper-right) solr_host to point to your running Solr instance (default is localhost:8983)
  4. Tinker with the base URL, Params or JSON Body (optional)
  5. Press 'Send' (blue rectangle button right of URL bar)