/ELK-final_test

Index MovieLens content into Elasticsearch and implement a CLI

Primary language: Python

Task description

- Index MovieLens content into Elasticsearch

Movie content: movies.csv, ratings.csv, tags.csv

You can index the data in any way:

  • Logstash (csv input, ... )
  • Java/Python/C# – Elastic client -> indexing

- Write a console application that searches movies

  • match phrase
  • fuzzy
  • filter/sort by average rating
  • finding top-10 tags for the movie
  • find movies that userX rated 5

NB: Try to implement it using several approaches for working with hierarchical data and explain which one is the best fit here.
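One of the approaches to hierarchical data in Elasticsearch is denormalization: folding the ratings and tags of a movie into the movie document itself (the alternatives include nested objects, a join field, or application-side joins across separate indices). Below is a minimal sketch of that folding step in plain Python. It assumes the standard MovieLens column names (movieId, title, genres, userId, rating, tag); the function name build_movie_docs and the avg_rating field are hypothetical and not taken from this repository.

```python
from collections import defaultdict
from statistics import mean


def build_movie_docs(movies, ratings, tags):
    """Fold ratings and tags into one denormalized document per movie.

    Each argument is an iterable of dicts, e.g. rows from csv.DictReader.
    """
    ratings_by_movie = defaultdict(list)
    for row in ratings:
        ratings_by_movie[row["movieId"]].append(
            {"userId": row["userId"], "rating": float(row["rating"])}
        )

    tags_by_movie = defaultdict(list)
    for row in tags:
        tags_by_movie[row["movieId"]].append(row["tag"])

    docs = []
    for movie in movies:
        movie_ratings = ratings_by_movie.get(movie["movieId"], [])
        values = [r["rating"] for r in movie_ratings]
        docs.append(
            {
                "movieId": movie["movieId"],
                "title": movie["title"],
                # MovieLens separates genres with a pipe character.
                "genres": movie["genres"].split("|"),
                "ratings": movie_ratings,
                "tags": tags_by_movie.get(movie["movieId"], []),
                # Precomputed so that sorting by rating needs no aggregation.
                "avg_rating": mean(values) if values else None,
            }
        )
    return docs
```

The trade-off is that every new rating rewrites the whole movie document, so this shape suits a one-off bulk load better than a continuously updated index.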

The implementation

Tips for speeding up ingestion

Build image for ingestion

sudo docker build -t movies_ingestion -f ingestion/Dockerfile .

Mount the data directory containing the movie content (movies.csv, ratings.csv, tags.csv) as a volume and run the ingestion

NB: ETA 30-40 min

docker run --network=host -it -v "$(pwd)/data":/app/data movies_ingestion
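What the ingestion container does internally is not shown here. As an illustration only, a Python ingestion step could stream CSV rows into bulk actions instead of indexing documents one by one, which is usually the biggest factor in ingestion time. The sketch below assumes the standard ratings.csv header (userId, movieId, rating, timestamp); the function name ratings_actions and the index name "ratings" are hypothetical.

```python
import csv
import io


def ratings_actions(csv_file, index="ratings"):
    """Yield one bulk action per row of a ratings.csv-like file."""
    for row in csv.DictReader(csv_file):
        yield {
            "_index": index,
            "_source": {
                "userId": int(row["userId"]),
                "movieId": int(row["movieId"]),
                "rating": float(row["rating"]),
            },
        }


# A tiny in-memory sample standing in for data/ratings.csv.
sample = io.StringIO("userId,movieId,rating,timestamp\n1,296,5.0,1147880044\n")
actions = list(ratings_actions(sample))
```

Actions in this shape can be passed to the bulk or streaming_bulk helpers in elasticsearch.helpers from the official Python client. Because the function is a generator, the full file never has to sit in memory.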

Build docker image

sudo docker build -t movies_searcher -f docker/Dockerfile .

Run image

docker run --network=host -it movies_searcher

Usage within container

For simplicity, the movie alias is used to run the app inside the container
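The subcommands used in the examples of this README (match-title, fuzzy-title, top-movies, movie-tags, user-top) could be wired up with argparse roughly as follows. This is a sketch of the command-line surface only, not the repository's actual entry point; the argument names title, movie_id and user_id are assumptions.

```python
import argparse


def build_parser():
    """Build a parser exposing the five subcommands of the movie CLI."""
    parser = argparse.ArgumentParser(prog="movie")
    sub = parser.add_subparsers(dest="command", required=True)

    match = sub.add_parser("match-title", help="match-phrase search on titles")
    match.add_argument("title")

    fuzzy = sub.add_parser("fuzzy-title", help="fuzzy search on titles")
    fuzzy.add_argument("title")

    top = sub.add_parser("top-movies", help="movies sorted by average rating")
    top.add_argument("--genre", default=None)

    tags = sub.add_parser("movie-tags", help="top-10 tags for a movie")
    tags.add_argument("movie_id", type=int)

    user = sub.add_parser("user-top", help="movies a user rated 5")
    user.add_argument("user_id", type=int)

    return parser


# Equivalent to running: movie top-movies --genre Adventure
args = build_parser().parse_args(["top-movies", "--genre", "Adventure"])
```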

- match phrase

Example

movie match-title "Toy story"

Output

[image: output of the command above]
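A command like match-title maps naturally onto the Elasticsearch match_phrase query, which matches only when the analyzed terms appear in the given order. A minimal sketch of the request body follows; it assumes the movie title is indexed in a text field called title, and the helper name is hypothetical.

```python
def match_phrase_query(phrase, field="title", size=10):
    """Build a search body that matches the phrase as a whole."""
    return {
        "size": size,
        "query": {"match_phrase": {field: phrase}},
    }


body = match_phrase_query("Toy story")
```

With the standard analyzer the phrase is lowercased at search time, so "Toy story" would also match "Toy Story (1995)".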

- fuzzy

Example

movie fuzzy-title "Golang"

Output

[image: output of the command above]
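Fuzzy title search can be expressed as a match query with the fuzziness parameter, which tolerates a small edit distance between the query terms and the indexed terms. The sketch below is one possible request body, not necessarily the one this tool sends; the title field and the helper name are assumptions.

```python
def fuzzy_title_query(text, field="title", fuzziness="AUTO", size=10):
    """Build a search body that tolerates typos in the query text."""
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": text,
                    # "AUTO" scales the allowed edit distance with term length.
                    "fuzziness": fuzziness,
                }
            }
        },
    }


body = fuzzy_title_query("Golang")
```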

- filter/sort by average rating

Example

movie top-movies --genre Adventure

Output

[image: output of the command above]
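Filtering by genre and sorting by average rating is straightforward when the average is stored on the movie document at index time. The sketch below assumes exactly that: a numeric avg_rating field and a keyword genres field, both hypothetical names, and an optional min_rating filter that the CLI example above does not show.

```python
def top_movies_query(genre=None, min_rating=None, size=10):
    """Build a search body for the best-rated movies, optionally filtered."""
    filters = []
    if genre is not None:
        filters.append({"term": {"genres": genre}})
    if min_rating is not None:
        filters.append({"range": {"avg_rating": {"gte": min_rating}}})

    # Filters run in filter context: no scoring, and results are cacheable.
    query = {"bool": {"filter": filters}} if filters else {"match_all": {}}
    return {
        "size": size,
        "query": query,
        "sort": [{"avg_rating": {"order": "desc"}}],
    }


body = top_movies_query(genre="Adventure")
```

If ratings live in a separate index instead, the average has to come from an aggregation, which is one of the trade-offs between the hierarchical data approaches mentioned in the task description.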

- finding top-10 tags for the movie

Example

movie movie-tags 100

Output

[image: output of the command above]
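The top-10 tags for a movie can be computed with a terms aggregation. The sketch below assumes a separate index holding one document per row of tags.csv, with movieId as a numeric or keyword field and tag as a keyword field; both helper names are hypothetical.

```python
def top_tags_query(movie_id, size=10):
    """Build a search body that counts the most frequent tags of a movie."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"term": {"movieId": movie_id}},
        "aggs": {"top_tags": {"terms": {"field": "tag", "size": size}}},
    }


def parse_top_tags(response):
    """Extract (tag, count) pairs from a search response dict."""
    buckets = response["aggregations"]["top_tags"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


body = top_tags_query(100)
```

Terms aggregation buckets are ordered by descending document count by default, so the parsed list is already ranked.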

- find movies that userX rated 5

Example

movie user-top 1001

Output

[image: output of the command above]
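Finding the movies a user rated 5 reduces to two exact-match filters when each rating is its own document. The sketch below assumes a ratings index with userId, movieId and rating fields; the helper name and the default size are hypothetical.

```python
def user_top_query(user_id, rating=5.0, size=100):
    """Build a search body for the ratings a user gave at a given value."""
    return {
        "size": size,
        "_source": ["movieId"],  # only the movie reference is needed
        "query": {
            "bool": {
                "filter": [
                    {"term": {"userId": user_id}},
                    {"term": {"rating": rating}},
                ]
            }
        },
    }


body = user_top_query(1001)
```

With this layout the returned movie IDs still have to be resolved to titles in a second lookup, which is an application-side join. Storing ratings as nested objects inside the movie document would avoid the second request at the cost of larger documents.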