Adapted from Udemy course: Apache Kafka Series - Kafka Streams for Data Processing
- Kafka
- Kafka Streams
- Elasticsearch
- MongoDB
- Spring Boot
- Spring Retry
- React UI
- Firebase (authentication)
- movie-loader: load movies from the csv file movies_enriched.csv to the kafka topic movies
- movie-mongo: subscribe to the movies topic, save all to mongo db
- movie-streams: subscribe to the movies topic, count by year, publish the counts to the movies-year topic
- movie-es: subscribe to the movies and movies-year topics, save them to es7 with the same index names
- movie-ui: show latest movies and top 250 movies by rating
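The count-by-year step done by movie-streams can be illustrated outside Kafka. The following is a minimal Python sketch of the same aggregation on an in-memory list, not the Kafka Streams topology itself. The record shape (a dict with a `year` key), the sample titles, and the function name `count_by_year` are assumptions made for illustration.

```python
from collections import Counter


def count_by_year(movies):
    """Return a mapping year -> number of movies, skipping records without a year."""
    return Counter(m["year"] for m in movies if m.get("year") is not None)


# Hypothetical sample records; the real records come from the movies topic.
sample = [
    {"title": "A", "year": 1984},
    {"title": "B", "year": 1991},
    {"title": "C", "year": 1984},
    {"title": "D", "year": None},
]

counts = count_by_year(sample)
for year, n in sorted(counts.items()):
    print(year, n)
```

In the streaming version each new record updates the running count for its year, so the output topic receives a changelog of counts rather than a single final table.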
// build project
cd src
mvn clean install
// start docker desktop
// start application
cd infrastructure/docker
docker-compose -f common.yml up --build -d
docker-compose -f common.yml -f movies.yml up --build -d
// docker-compose -f common.yml -f words.yml up --build -d
// load movies to mongo/es7
http://localhost:8040/mongo/movie/load
// go to UI
http://localhost:3000/home
// to check logs
docker logs wordcount
docker logs wordcountinput
docker logs wordcountoutput -f
// to shutdown and cleanup
docker-compose -f common.yml -f word-count.yml down
docker system prune --volumes
to rescan (enrich) movies
- drop the mongo movie collection and the es7 movies index, or run docker system prune --volumes
- reload movie data to mongo and es7: http://localhost:8010/loader/movie/load
check movie duplicates
import pandas as pd
movies = pd.read_csv("data/movies_enriched.csv")
dups = movies[movies.duplicated(['title', 'year'], keep=False)]
dups[['title', 'year', 'imdbid']].to_csv("data/movies_dups.csv", index=False)
movies_unique = movies.drop_duplicates(subset=['title', 'year'], keep='first')
movies_unique.to_csv("data/movies_unique.csv", index=False)
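To see what the two pandas calls above select, here is a small pure-Python sketch of the same logic on in-memory rows: `keep=False` flags every member of a duplicate group, while `keep='first'` retains only the first row per (title, year). The sample rows and the helper names `find_duplicates` and `drop_duplicates` are made up for illustration.

```python
from collections import Counter


def find_duplicates(rows, keys=("title", "year")):
    """Rows whose key tuple occurs more than once (like duplicated(keep=False))."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [r for r in rows if counts[tuple(r[k] for k in keys)] > 1]


def drop_duplicates(rows, keys=("title", "year")):
    """First row per key tuple, order preserved (like drop_duplicates(keep='first'))."""
    seen = set()
    unique = []
    for r in rows:
        key = tuple(r[k] for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Hypothetical rows: two share (title, year), one shares only the title.
rows = [
    {"title": "X", "year": 2000, "imdbid": "id1"},
    {"title": "X", "year": 2000, "imdbid": "id2"},
    {"title": "X", "year": 2010, "imdbid": "id3"},
]

dups = find_duplicates(rows)
unique = drop_duplicates(rows)
print([r["imdbid"] for r in dups])
print([r["imdbid"] for r in unique])
```

Note that a remake with the same title but a different year is not treated as a duplicate, because the key is the (title, year) pair.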
try this in browser: http://localhost:8040/mongo/movie/query?title=Terminator
find top 250 rated movies: http://localhost:8040/mongo/movie/all?size=250&sortField=rating&direction=DESC&page=0
Or query with Mongo Compass connection: mongodb://root:example@localhost:27017/?authSource=admin&readPreference=primary&appname=MongoDB%20Compass%20Community&ssl=false
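The top-250 URL above is a paged, sorted query. As a rough sketch, it can be built in Python from its parameters, which makes it easy to change the page or size. The parameter names (`size`, `sortField`, `direction`, `page`) are taken from the URL above; the helper name `top_rated_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8040/mongo/movie/all"


def top_rated_url(size=250, page=0):
    """Build the paged query URL for movies sorted by rating, highest first."""
    params = {
        "size": size,
        "sortField": "rating",
        "direction": "DESC",
        "page": page,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = top_rated_url()
print(url)
```

Calling `top_rated_url(size=50, page=1)` would request the second page of 50 results, assuming the endpoint pages the way the parameter names suggest.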
use postman
localhost:9200/movies/_search
localhost:9200/movies-year/_search
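As an alternative to Postman, the same search endpoints can be called from Python. This is a minimal sketch using only the standard library; it sends a `match_all` query, which is standard Elasticsearch query DSL. The helper name `build_search_request` and the default size are assumptions, and nothing here depends on the field names stored in the indices.

```python
import json
from urllib.request import Request


def build_search_request(index, size=10, host="http://localhost:9200"):
    """Prepare a POST to /<index>/_search that matches all documents."""
    body = {"query": {"match_all": {}}, "size": size}
    return Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("movies-year", size=5)
print(req.get_method(), req.full_url)

# To actually send it (requires the docker stack to be running):
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     print(json.load(resp)["hits"]["total"])
```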
- Endpoint to drop es7 index and mongo collection
- React app to display movies
- maven build ui, copy build to docker
- query stats
- query stats from es7
- infinite scroll
- filters: genre, years, rating, director?
- fix movie stats filter
- deploy to aws
- avoid scanning all files