
Knowledge Graph Demo

Primary language: TypeScript. License: Apache License 2.0 (Apache-2.0).

title: Movie Graph
description: Demo Graph DB
tags: Concerto, Neo4J, Concerto Graph

Movie Graph

Create your own personal Movie Knowledge Graph using data from IMDB!

This project uses Concerto Graph to load data about movies, actors, and plot summaries into a Neo4J graph database. It then presents a command-line interface for querying the data in natural language.

Demo

Enter command (add,search,query,delete,quit) or a natural language query: what are the names of the 3 highest rated films with Marilyn Monroe that include plot summaries that contain the word "marry"?

Calling tool: get_embeddings
Converting query with embeddings to Cypher...

Generated Cypher: MATCH (p:Person {name: 'Marilyn Monroe'})-[:RELATED_TO]->(m:Movie)
CALL db.index.fulltext.queryNodes('movie_fulltext', 'marry')
YIELD node AS movie, score
WHERE movie.identifier = m.identifier
RETURN movie.title AS title, movie.averageRating AS rating
ORDER BY rating DESC
LIMIT 3

[
  {
    "title": "Some Like It Hot",
    "rating": 8.2
  },
  {
    "title": "Gentlemen Prefer Blondes",
    "rating": 7.1
  },
  {
    "title": "Clash by Night",
    "rating": 7
  }
]
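
In the session above, the Cypher is generated from the natural language question, with the person name and search term embedded as literals. To illustrate the shape of that query, here is a minimal TypeScript sketch that builds the same statement with Cypher parameters (`$person`, `$term`) instead of literals. The function name `buildTopRatedQuery` and the `CypherQuery` type are hypothetical and not part of this project; the labels, relationship type, and index name are copied from the generated query.

```typescript
// Hypothetical helper for illustration only; the project generates its
// Cypher from natural language rather than from a template like this.
interface CypherQuery {
  text: string;
  params: Record<string, string>;
}

function buildTopRatedQuery(person: string, term: string, limit: number): CypherQuery {
  // The limit is validated and inlined so only string parameters are passed.
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const text = [
    // Movies connected to the named person.
    "MATCH (p:Person {name: $person})-[:RELATED_TO]->(m:Movie)",
    // Full-text hits for the search term in the movie index.
    "CALL db.index.fulltext.queryNodes('movie_fulltext', $term)",
    "YIELD node AS movie, score",
    // Keep only full-text hits that are also one of the person's movies.
    "WHERE movie.identifier = m.identifier",
    "RETURN movie.title AS title, movie.averageRating AS rating",
    "ORDER BY rating DESC",
    `LIMIT ${limit}`,
  ].join("\n");
  return { text, params: { person, term } };
}

const example = buildTopRatedQuery("Marilyn Monroe", "marry", 3);
console.log(example.text);
console.log(example.params);
```

Passing values as parameters avoids quoting problems when a name or search term contains an apostrophe.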

Install

Download IMDB Data

Download the following data sets (free for non-commercial use) from IMDB:

  • title.basics
  • name.basics
  • title.principals
  • title.ratings

Each file is a gzip-compressed TSV (tab-separated values) file. Save the files in the ./imdb folder and decompress them, because the import commands below read the uncompressed .tsv files.

Download Wiki Movie Plots CSV

Plot summaries are not part of the public IMDB data sets. Summaries for a selection of movies (not all) must be downloaded separately from Kaggle.

A (free) Kaggle account is required:

https://www.kaggle.com/datasets/jrobischon/wikipedia-movie-plots

Save the downloaded csv file in the ./imdb folder.

Load Data Into SQLite

Note that SQLite is installed by default on macOS. On other platforms you may have to install it manually.

Launch SQLite:

sqlite3 im.db

Then in the SQLite shell, run the following commands. Run each command separately; some of the commands may take several minutes to complete:

.mode ascii
.separator "\t" "\n"
.import ./imdb/title.basics.tsv titles
.import ./imdb/name.basics.tsv names
.import ./imdb/title.principals.tsv principals
.import ./imdb/title.ratings.tsv ratings
.mode csv
.import ./imdb/wiki_movie_plots_deduped.csv plots

create index titles_id on titles(tconst);
create index names_id on names(nconst);
create index principals_id on principals(tconst);
create index ratings_id on ratings(tconst);
create index names_primaryName on names(primaryName);
create index principals_name on principals(nconst);

You should now have a SQLite database of roughly 8 GB containing most of the IMDB data, indexed for retrieval and ready to be inserted into your Knowledge Graph.
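
The IMDB files are plain tab-separated text with a header row, and IMDB writes `\N` where a value is missing. As a rough illustration of the row format that the .import commands above consume, the following TypeScript sketch parses TSV text into records. The `parseTsv` function and the sample row are made up for this example and are not part of the project.

```typescript
// Hypothetical helper, not part of this project: parses tab-separated text
// with a header row into an array of records keyed by column name.
type Row = Record<string, string | null>;

function parseTsv(text: string): Row[] {
  const lines = text.split("\n").filter((line) => line.length > 0);
  const [headerLine, ...dataLines] = lines;
  if (headerLine === undefined) return [];
  const header = headerLine.split("\t");
  return dataLines.map((line) => {
    const cells = line.split("\t");
    const row: Row = {};
    header.forEach((name, i) => {
      const cell = cells[i];
      // IMDB writes \N for a missing value; map it (and absent cells) to null.
      row[name] = cell === undefined || cell === "\\N" ? null : cell;
    });
    return row;
  });
}

// Made-up sample in the shape of title.basics (columns abbreviated).
const sample = [
  "tconst\ttitleType\tprimaryTitle\tstartYear\tendYear",
  "ttEXAMPLE\tmovie\tExample Title\t1959\t\\N",
].join("\n");

const rows = parseTsv(sample);
for (const row of rows) {
  console.log(row["primaryTitle"], row["startYear"], row["endYear"]);
}
```

Like the `.mode ascii` import above, this sketch splits on tabs and newlines only and applies no quote handling.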

Set Environment Variables

Export the following environment variables in your shell.

Unix:

export NEO4J_URL=YOUR_URL
export NEO4J_PASS=YOUR_PASS
export OPENAI_API_KEY=YOUR_API_KEY

GraphDB

  • NEO4J_URL: the Neo4j URL, e.g. neo4j+s://<DB_NAME>.databases.neo4j.io if you are using AuraDB.
  • NEO4J_PASS: your Neo4j password.
  • NEO4J_USER: your Neo4j user name; defaults to neo4j.

Text Embeddings & Chat With Data

  • OPENAI_API_KEY: the OpenAI API key. If it is not set, embeddings are not computed or written to the graph, and similarity search is not possible.
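
The variables above could be read along the following lines. This is only a sketch: the `loadConfig` function and `GraphConfig` type are hypothetical, and treating NEO4J_URL and NEO4J_PASS as required is an assumption. The neo4j default for NEO4J_USER and the effect of a missing OPENAI_API_KEY come from the descriptions above.

```typescript
// Hypothetical config loader; the project's actual code may differ.
interface GraphConfig {
  url: string;
  user: string;
  password: string;
  openAiApiKey: string | undefined;
  // False when OPENAI_API_KEY is unset: no embeddings, no similarity search.
  embeddingsEnabled: boolean;
}

function loadConfig(env: Record<string, string | undefined>): GraphConfig {
  const url = env["NEO4J_URL"];
  const password = env["NEO4J_PASS"];
  // Assumption: both connection settings are mandatory.
  if (!url || !password) {
    throw new Error("NEO4J_URL and NEO4J_PASS must be set");
  }
  // Treat an empty string the same as an unset variable.
  const openAiApiKey = env["OPENAI_API_KEY"] || undefined;
  return {
    url,
    user: env["NEO4J_USER"] || "neo4j",
    password,
    openAiApiKey,
    embeddingsEnabled: openAiApiKey !== undefined,
  };
}

// Placeholder values; in a Node.js program you would pass process.env.
const config = loadConfig({
  NEO4J_URL: "neo4j+s://example.databases.neo4j.io",
  NEO4J_PASS: "YOUR_PASS",
});
console.log(config.user, config.embeddingsEnabled);
```

Taking the environment as an argument rather than reading it globally keeps the function easy to test.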

Running

npm start

Then use the following commands:

  • add: adds all the movies related to a specific person to the graph database
  • delete: deletes all nodes from the graph database
  • search: runs a full-text search over movie nodes
  • query: runs a similarity (conceptual) search over movie nodes
  • any other input: is treated as a natural language query, converted to a graph query, and run
  • quit: exits the program
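
The dispatch described in the list above can be sketched as follows: input that matches one of the five command names is treated as a command, and anything else as a natural language query. The names `parseInput`, `ParsedInput`, and `KNOWN_COMMANDS` are hypothetical, and the case-insensitive matching is an assumption, not documented behavior of this project.

```typescript
// Hypothetical input classifier; the project's actual CLI code may differ.
const KNOWN_COMMANDS = ["add", "search", "query", "delete", "quit"] as const;
type KnownCommand = (typeof KNOWN_COMMANDS)[number];

type ParsedInput =
  | { kind: "command"; name: KnownCommand }
  | { kind: "natural-language"; text: string };

function parseInput(line: string): ParsedInput {
  const text = line.trim();
  // Assumption: command names are matched case-insensitively.
  const match = KNOWN_COMMANDS.find((name) => name === text.toLowerCase());
  return match !== undefined
    ? { kind: "command", name: match }
    : { kind: "natural-language", text };
}

console.log(parseInput("delete"));
console.log(parseInput("which films star Marilyn Monroe?"));
```

Modeling the result as a discriminated union lets the caller handle both cases in a single `switch` on `kind`.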