nypost
There are 3 repositories under the nypost topic.
mukeshkdangi/nypost_searchengine
Crawled web pages with a multithreaded crawler and stored their metadata. Used a GCP Hadoop cluster to build an inverted index. Developed a custom page rank algorithm and exposed RESTful APIs with spellchecker and autocomplete features.
mukeshkdangi/crawler_nypost
Crawling web pages and indexing them for Solr search
mukeshkdangi/edgeLink_nypost
Generating edges between web pages that reference one another
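
The descriptions above mention generating edges between pages and ranking them. As a rough illustration only, and not the code from any of these repositories, the following sketch builds an edge list from crawled outlinks and runs a standard PageRank-style power iteration over it. The function names, the `outlinks` input shape, the damping factor, and the page identifiers are all assumptions made for the example.

```python
from collections import defaultdict


def build_edges(outlinks):
    """Return sorted (source, target) edges between crawled pages.

    `outlinks` is a hypothetical mapping of page -> list of linked pages.
    Links to pages that were never crawled, and self-links, are dropped.
    """
    crawled = set(outlinks)
    edges = set()
    for source, targets in outlinks.items():
        for target in targets:
            if target in crawled and target != source:
                edges.add((source, target))
    return sorted(edges)


def pagerank(edges, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over an edge list (assumed variant).

    Only pages that appear in at least one edge receive a score.
    """
    nodes = sorted({page for edge in edges for page in edge})
    if not nodes:
        return {}
    targets_of = defaultdict(list)
    for source, target in edges:
        targets_of[source].append(target)

    n = len(nodes)
    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing edges is spread evenly.
        dangling = sum(rank[p] for p in nodes if p not in targets_of)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in nodes}
        for source, targets in targets_of.items():
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical crawl result: "x" was linked to but never crawled.
    sample = {"a": ["b", "c", "x"], "b": ["c"], "c": ["a"]}
    sample_edges = build_edges(sample)
    print(sample_edges)
    print(pagerank(sample_edges))
```

Keeping edge generation separate from ranking lets the edge list be written out once and reused by whichever ranking step consumes it.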