This repository contains all the files related to the project's data collection, data normalization and cleansing, and database management.
This project covers the following college subjects: Web Development, IT Design and Management, and Artificial Intelligence.
- Docker, docker-compose.
- Python: sentence-transformers, Bottle Web Framework.
- Scala: Play Web Framework, org.apache.spark.sql, org.apache.spark.launcher.SparkLauncher.
- Apache Spark.
- Apache Hadoop.
- Docker images and docker-compose files (See this folder for more details):
- Vectorize API: returns the embeddings of a user's query. Read this for theoretical context and view the code used here.
- Search algorithm (See this folder for more details):
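
As a rough illustration of the ranking idea behind the search algorithm, the sketch below computes cosine similarity between a query embedding and a few document embeddings in plain Python. It is not the repository's Spark implementation: the function names (`cosine_similarity`, `rank_documents`), the document ids, and the three-dimensional toy vectors are all hypothetical, and real sentence embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption made for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Score every document against the query and keep the top_k best."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings, for illustration only.
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]

ranking = rank_documents(query, docs, top_k=2)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

On a cluster the same per-document scoring would presumably be distributed across partitions, with only the top-scoring rows collected at the end.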
The following is an example of the results obtained after running the cosine-similarity algorithm on an Apache Spark cluster: