Personal Project
Introduction · Algorithm · Setting Up Locally · Tech Stack · Routes
Uncovering the vast ocean of information on the web requires sophisticated tools that can navigate the complexities of content, context, and relevance. My project, which I've playfully nicknamed "Finder", is a powerful crawler that digs through websites and meticulously indexes every word on every page. It doesn’t stop there; it also calculates the frequency of each term and tracks the network of links between pages.
At the heart of "Finder" are two algorithms: TF-IDF (Term Frequency-Inverse Document Frequency) and a simplified version of PageRank. This dual approach allows "Finder" not only to index content, but also to understand its importance, making search results more relevant and useful.
TF = Term frequency in the page / Total number of words in the page
IDF = log(Total number of pages / Number of pages containing the term)
TF-IDF = TF × IDF
Total TF-IDF Score = ∑terms(TF-IDF of the term)
PageRank = Number of incoming links
Final Score = α(Total TF-IDF Score) + β(PageRank)
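To make the scoring concrete, here is a minimal TypeScript sketch of how these formulas combine for a single page. The names (`scorePage`, `alpha`, `beta`) and data shapes are illustrative assumptions, not taken from Finder's actual codebase.

```typescript
interface PageStats {
  termCounts: Map<string, number>; // term -> occurrences on this page
  totalWords: number;              // total number of words on the page
  incomingLinks: number;           // simplified PageRank: count of incoming links
}

// Number of pages (out of `totalPages`) that contain each term.
type DocumentFrequency = Map<string, number>;

function scorePage(
  page: PageStats,
  queryTerms: string[],
  docFreq: DocumentFrequency,
  totalPages: number,
  alpha = 1.0,
  beta = 0.5,
): number {
  // Total TF-IDF Score: sum of TF-IDF over every query term.
  let tfIdfTotal = 0;
  for (const term of queryTerms) {
    const tf = (page.termCounts.get(term) ?? 0) / page.totalWords;
    const pagesWithTerm = docFreq.get(term) ?? 0;
    if (pagesWithTerm === 0) continue; // term was never indexed
    const idf = Math.log(totalPages / pagesWithTerm);
    tfIdfTotal += tf * idf;
  }

  // Final Score = α(Total TF-IDF Score) + β(PageRank)
  return alpha * tfIdfTotal + beta * page.incomingLinks;
}
```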
To set up Finder locally, clone the repository and configure the following environment variables in the .env file:
DB_HOST="database"
DB_PORT="5432"
DB_USER="root"
DB_PASSWORD="root"
DB_NAME="postgres"
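As a rough illustration of how these variables feed a Postgres connection, here is a minimal sketch assuming a Node/TypeScript process with the `pg` and `dotenv` packages; Finder's actual backend may read them differently.

```typescript
import "dotenv/config";
import { Pool } from "pg";

// Illustrative only: assumes the `pg` client; Finder's backend may differ.
const pool = new Pool({
  host: process.env.DB_HOST,                 // "database" is likely the docker-compose service name
  port: Number(process.env.DB_PORT ?? 5432),
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
});

// Quick connectivity check.
pool.query("SELECT 1").then(() => console.log("Database reachable"));
```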
To start the app locally, run the following commands:
docker-compose up -d
cd server
make db-up
Finder is built on the following stack:
Back End:
Front End:
- TypeScript - Programming Language
- React - JavaScript Library
- Vite - Build Tool
- Tanstack Router - Routing
- Tanstack Query - Query Management
- Ky - Fetching Library
- TailwindCSS - CSS Framework
Database:
- PostgreSQL - Relational Database
Infrastructure & Deployment:
- Docker - Containerization
Crawler Routes:
http://localhost:3000/swagger
Displays the crawler's Swagger documentation
Search Routes:
http://localhost:5173/
Displays the search page
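Since the front end fetches results with Ky and Tanstack Query, a typical search hook might look like the sketch below. The `search` endpoint, its `q` parameter, and the response shape are hypothetical placeholders, not Finder's documented API.

```typescript
import ky from "ky";
import { useQuery } from "@tanstack/react-query";

// Hypothetical response shape; the real API is described in the Swagger UI above.
interface SearchResult {
  url: string;
  score: number;
}

// The crawler/search API is assumed to live on port 3000 (see the Swagger route).
const api = ky.create({ prefixUrl: "http://localhost:3000" });

export function useSearch(query: string) {
  return useQuery({
    queryKey: ["search", query],
    // "search" and the "q" parameter are placeholders, not Finder's actual route.
    queryFn: () => api.get("search", { searchParams: { q: query } }).json<SearchResult[]>(),
    enabled: query.length > 0, // skip empty queries
  });
}
```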