Multithreaded indexing API in Java 19+ that allows a word index to be created from a .txt file.
- Set the input, output, dictionary and forbidden words file paths to get an index and word analysis of the provided .txt file.
- Option 5 - Execute - outputs an index .txt file listing each word with its page indices, number of occurrences, and a dictionary definition.
- Dictionary definitions are included only if a dictionary file is loaded. Please ensure the dictionary file follows the format described in the note below so that it parses correctly.
- It is recommended to load a forbidden/common words file to exclude very common words that would otherwise clutter the index.
- Option 6 - Top 20 Words - get the 20 most frequent words in the text, ranked by number of occurrences. The result is written to a .txt file. A common words file must be loaded to use this feature.
- Option 7 - Word Searcher - search for a word in the text and get its frequency and page index printed to the console.
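
The options above can be illustrated with a minimal sketch. It is not the application's actual code: the class name IndexSketch, the record WordInfo, both method names and the linesPerPage parameter are hypothetical, and the rule that a page is a fixed number of lines is an assumption. A fixed thread pool is swapped in so that the sketch runs without preview flags.

```java
import java.util.*;
import java.util.concurrent.*;

class IndexSketch {
    /** Hypothetical index entry: the pages a word appears on and its total count. */
    record WordInfo(SortedSet<Integer> pages, int count) {}

    /** Builds word -> (pages, count), skipping forbidden words. */
    static Map<String, WordInfo> buildIndex(List<String> lines, Set<String> forbidden, int linesPerPage) {
        Map<String, WordInfo> index = new ConcurrentHashMap<>();
        // Assumption: a fixed pool stands in for the virtual threads the application uses.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < lines.size(); i++) {
            final int page = i / linesPerPage + 1; // pages are numbered from 1
            final String line = lines.get(i);
            pool.execute(() -> {
                for (String raw : line.split("\\s+")) {
                    // Strip punctuation and digits, then normalise case.
                    String word = raw.replaceAll("[^A-Za-z]", "").toLowerCase();
                    if (word.isEmpty() || forbidden.contains(word)) continue;
                    // merge() is atomic on ConcurrentHashMap, so tasks can update concurrently.
                    index.merge(word, new WordInfo(new TreeSet<>(Set.of(page)), 1), (a, b) -> {
                        SortedSet<Integer> pages = new TreeSet<>(a.pages());
                        pages.addAll(b.pages());
                        return new WordInfo(pages, a.count() + b.count());
                    });
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return index;
    }

    /** Returns the n most frequent words; ties are broken alphabetically. */
    static List<String> topWords(Map<String, WordInfo> index, int n) {
        return index.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Integer.compare(b.getValue().count(), a.getValue().count());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(n)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

In this sketch a word search is a single lookup, index.get(word), which yields both the frequency and the pages. The forbidden words set matters for the top words list because common words such as "the" would otherwise take most of the top positions.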
You may need to run the provided indexer.jar file with the --enable-preview
flag, as this application uses virtual threads, which are a preview feature in Java 19:
java --enable-preview -cp ./indexer.jar ie.atu.sw.Runner
Note: the application is designed to work with dictionary definitions in the .csv format provided on Moodle.
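
As a rough illustration of loading such a file, the sketch below assumes each line has the form word,definition, with the first comma separating the two. That layout is an assumption, not the confirmed Moodle format, and the class name DictionarySketch and its parse method are hypothetical.

```java
import java.util.*;

class DictionarySketch {
    /** Parses lines of the assumed form "word,definition" into a lookup map. */
    static Map<String, String> parse(List<String> csvLines) {
        Map<String, String> definitions = new HashMap<>();
        for (String line : csvLines) {
            // Split on the first comma only, so definitions may contain commas.
            int comma = line.indexOf(',');
            if (comma <= 0) continue; // skip lines with no comma or an empty word
            String word = line.substring(0, comma).trim().toLowerCase();
            String definition = line.substring(comma + 1).trim();
            // Keep the first definition seen for a word.
            definitions.putIfAbsent(word, definition);
        }
        return definitions;
    }
}
```

Lines that do not match the assumed layout are skipped rather than reported, which keeps the sketch short. A real loader would probably want to report them.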