This project was created for educational purposes as part of my Information Retrieval course. The primary goal is to implement an indexing engine in Java and use it to process Turkish poems by Nazım Hikmet.
The objective is to learn and implement the fundamental concepts of information retrieval, including text preprocessing, indexing, and potentially search functionality.
- Web scraping to retrieve a Turkish poem from a specific URL.
- Text preprocessing techniques such as tokenization and lowercasing.
- Calculation of the cosine similarity between different poems.
- (Planned) Search functionalities to retrieve information based on user queries.
- Java
- Jsoup (for web scraping)
- (Planned) Apache Lucene (for advanced indexing and search functionalities)
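
As a rough illustration of the preprocessing and cosine-similarity features listed above, here is a minimal sketch that uses only the Java standard library. The class name `PoemSimilarity`, its method names, the use of raw term counts as vector weights, and the sample strings are all assumptions made for this example; they are not taken from the project's actual code, and the sample strings are placeholders rather than lines from a real poem.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PoemSimilarity {

    // Turkish locale, so that 'I' lowercases to the dotless i (U+0131)
    // and the dotted capital I (U+0130) lowercases to a plain 'i'.
    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    /** Lowercases the text and splits it on runs of non-letter characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(TURKISH).split("\\P{L}+")) {
            if (!token.isEmpty()) { // split can yield a leading empty string
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Counts how often each token occurs (raw term frequency). */
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity of two term-frequency vectors; 0.0 if either is empty. */
    static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a term-frequency vector. */
    private static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Placeholder texts, not actual poem lines.
        Map<String, Integer> first = termFrequencies(tokenize("Deniz mavi, deniz derin"));
        Map<String, Integer> second = termFrequencies(tokenize("Deniz derin ve sessiz"));
        System.out.printf(Locale.ROOT, "%.3f%n", cosineSimilarity(first, second));
    }
}
```

Lowercasing with an explicit Turkish locale matters here because the default rules map `I` to `i`, which is the wrong letter in Turkish. Raw counts keep the sketch short; a weighting scheme such as TF-IDF could be swapped in without changing `cosineSimilarity`.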
This project serves as an educational resource to understand and apply information retrieval concepts. To use or contribute to this project, clone the repository and follow the setup instructions.
The initial inspiration for this project came from https://www.cs.rpi.edu/~sibel/poetry/nazim_hikmet.html, which is also the source of the Turkish poem.
This project is currently in its early stages, focusing on text retrieval and preprocessing. Contributions and suggestions for improvement are welcome.
Zainab Lawal