General Architecture info:

  • A search engine for a large collection of financial news articles from January to May 2018.

  • The dataset contains more than 300,000 articles.

  • The search engine takes a query and finds the documents that satisfy it.

  • Main components:

    Index Handler

    • Reads from and writes to the main word index
    • Creates an inverted file index that maps each indexed element to the document(s) in which it appears
    • Creates and maintains an index of ORGANIZATION entities and an index of PERSON entities (each stored in an AVL tree)
    • Searches the inverted file index in response to requests from the query processor
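The inverted file index described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `InvertedIndex`, its methods, and the use of string document IDs are all hypothetical, and `std::map` stands in for the AVL tree the project uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: std::map stands in for the project's AVL tree.
class InvertedIndex {
public:
    // Record that `term` occurs in the document identified by `docId`.
    void add(const std::string& term, const std::string& docId) {
        postings_[term].insert(docId);
    }

    // Return the documents containing `term` (empty if the term is unknown).
    std::set<std::string> lookup(const std::string& term) const {
        auto it = postings_.find(term);
        if (it == postings_.end()) {
            return {};
        }
        return it->second;
    }

    // Return the documents that contain every term in `terms` (Boolean AND).
    std::set<std::string> lookupAll(const std::vector<std::string>& terms) const {
        if (terms.empty()) {
            return {};
        }
        std::set<std::string> result = lookup(terms.front());
        for (std::size_t i = 1; i < terms.size() && !result.empty(); ++i) {
            const std::set<std::string> next = lookup(terms[i]);
            std::set<std::string> kept;
            // Both sets are sorted, so set_intersection applies directly.
            std::set_intersection(result.begin(), result.end(),
                                  next.begin(), next.end(),
                                  std::inserter(kept, kept.begin()));
            result = std::move(kept);
        }
        return result;
    }

private:
    std::map<std::string, std::set<std::string>> postings_;
};
```

Storing each posting list as a sorted set keeps the AND intersection linear in the sizes of the two lists. The same structure could back the ORGANIZATION and PERSON indexes, keyed by entity name instead of by word.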

    Document parser/processor (RapidJSON parser)

    • Stems words and removes stop words from the news articles it reads in
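As an illustration of the stemming and stop-word step, here is a minimal sketch. It assumes a tiny hard-coded stop-word list and a toy suffix-stripping stemmer; it is not the Porter algorithm and not necessarily what the project uses. The function names `naiveStem` and `normalize` are hypothetical.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stemmer: strips a few common English suffixes. NOT the Porter algorithm.
std::string naiveStem(const std::string& word) {
    static const std::vector<std::string> suffixes = {"ing", "ed", "es", "s"};
    for (const std::string& suffix : suffixes) {
        // Only strip when at least three characters of stem would remain.
        if (word.size() >= suffix.size() + 3 &&
            word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0) {
            return word.substr(0, word.size() - suffix.size());
        }
    }
    return word;
}

// Lowercase the text, split on non-letters, drop stop words, stem the rest.
std::vector<std::string> normalize(const std::string& text) {
    // Assumed, deliberately tiny stop-word list for illustration only.
    static const std::unordered_set<std::string> stopWords = {
        "a", "an", "and", "in", "of", "the", "to"};
    std::vector<std::string> terms;
    std::string current;
    auto flush = [&]() {
        if (!current.empty() && stopWords.count(current) == 0) {
            terms.push_back(naiveStem(current));
        }
        current.clear();
    };
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            current += static_cast<char>(std::tolower(c));
        } else {
            flush();
        }
    }
    flush();  // emit the final word, if any
    return terms;
}
```

Applying the same `normalize` step to both article text and query terms keeps the two sides comparable, since a stemmed index can only be matched by stemmed queries.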

    Query processor

    • Handles user-entered phrases by delimiting them on Boolean operators and the ORG/PERSON keywords
    • EX: AND year fender (finds documents that contain both year and fender)
    • EX: AND year fender PERSON daniel wallis (finds documents that contain year, fender, and the person daniel wallis)
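The query format in the examples above could be split into its parts along these lines. This is a sketch under assumptions: the `ParsedQuery` struct and `parseQuery` function are hypothetical, `OR` is assumed as a second Boolean operator, and everything after a `PERSON` or `ORG` keyword is assumed to form one multi-word name until the next keyword.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the pieces of a query.
struct ParsedQuery {
    std::string op;                   // "AND" or "OR"; empty if none was given
    std::vector<std::string> terms;   // plain search words
    std::vector<std::string> people;  // full names following PERSON
    std::vector<std::string> orgs;    // full names following ORG
};

ParsedQuery parseQuery(const std::string& query) {
    enum class Mode { Terms, Person, Org };
    ParsedQuery parsed;
    Mode mode = Mode::Terms;
    std::istringstream in(query);
    std::string token;
    while (in >> token) {
        if (token == "AND" || token == "OR") {
            parsed.op = token;
            mode = Mode::Terms;
        } else if (token == "PERSON") {
            parsed.people.emplace_back();  // start a new, empty name
            mode = Mode::Person;
        } else if (token == "ORG") {
            parsed.orgs.emplace_back();
            mode = Mode::Org;
        } else if (mode == Mode::Terms) {
            parsed.terms.push_back(token);
        } else {
            // Append the token to the name currently being built.
            std::string& name =
                (mode == Mode::Person) ? parsed.people.back() : parsed.orgs.back();
            if (!name.empty()) {
                name += ' ';
            }
            name += token;
        }
    }
    return parsed;
}
```

For the query `AND year fender PERSON daniel wallis`, this yields the operator `AND`, the terms `year` and `fender`, and one person, `daniel wallis`.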


/Applications/ --build "./cmake-build-debug" --target 22su_search_engine

./cmake-build-debug/22su_search_engine ./sample_data output.txt

  • worked on program terminal user interface
  • worked on parsing words from documents




  • created templated AVL tree class with reference from


  • created search function for AVL tree templated class
  • worked on stemming words
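For reference, a templated AVL tree with a search function, as mentioned in the notes above, might look roughly like this. It is a minimal sketch and not the project's class: the name `AvlTree`, the key/value layout, and the use of `std::unique_ptr` (which requires C++14 for `std::make_unique`) are assumptions.

```cpp
#include <algorithm>
#include <memory>
#include <utility>

template <typename Key, typename Value>
class AvlTree {
public:
    // Insert `key`, or overwrite its value if the key is already present.
    void insert(const Key& key, const Value& value) {
        root_ = insert(std::move(root_), key, value);
    }

    // Return a pointer to the value stored under `key`, or nullptr if absent.
    const Value* search(const Key& key) const {
        const Node* node = root_.get();
        while (node != nullptr) {
            if (key < node->key) node = node->left.get();
            else if (node->key < key) node = node->right.get();
            else return &node->value;
        }
        return nullptr;
    }

    // Height of the whole tree (0 when empty).
    int height() const { return height(root_); }

private:
    struct Node {
        Key key;
        Value value;
        int height = 1;
        std::unique_ptr<Node> left, right;
        Node(const Key& k, const Value& v) : key(k), value(v) {}
    };
    using NodePtr = std::unique_ptr<Node>;

    static int height(const NodePtr& n) { return n ? n->height : 0; }
    static void update(Node& n) {
        n.height = 1 + std::max(height(n.left), height(n.right));
    }

    static NodePtr rotateRight(NodePtr y) {
        NodePtr x = std::move(y->left);
        y->left = std::move(x->right);
        update(*y);
        x->right = std::move(y);
        update(*x);
        return x;
    }

    static NodePtr rotateLeft(NodePtr x) {
        NodePtr y = std::move(x->right);
        x->right = std::move(y->left);
        update(*x);
        y->left = std::move(x);
        update(*y);
        return y;
    }

    // Restore the AVL invariant (subtree heights differ by at most 1) at `n`.
    static NodePtr rebalance(NodePtr n) {
        update(*n);
        const int balance = height(n->left) - height(n->right);
        if (balance > 1) {
            if (height(n->left->left) < height(n->left->right)) {
                n->left = rotateLeft(std::move(n->left));  // left-right case
            }
            return rotateRight(std::move(n));
        }
        if (balance < -1) {
            if (height(n->right->right) < height(n->right->left)) {
                n->right = rotateRight(std::move(n->right));  // right-left case
            }
            return rotateLeft(std::move(n));
        }
        return n;
    }

    static NodePtr insert(NodePtr n, const Key& key, const Value& value) {
        if (!n) return std::make_unique<Node>(key, value);
        if (key < n->key) n->left = insert(std::move(n->left), key, value);
        else if (n->key < key) n->right = insert(std::move(n->right), key, value);
        else n->value = value;
        return rebalance(std::move(n));
    }

    NodePtr root_;
};
```

Because the tree rebalances on every insert, `search` stays logarithmic in the number of keys even when words arrive in sorted order. Only `operator<` is required of the key type, so the same template can hold words, ORGANIZATION names, or PERSON names.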