- Developed a web retrieval engine based on the vector space model for the University of Memphis domain (memphis.edu).
- Crawled and preprocessed 10,000 web pages and documents (text, pdf, docx and pptx) from the University of Memphis domain.
- Built the following modules: an incremental web crawler; a text preprocessor (strips markup and metadata, lowercases, removes digits, punctuation, extra whitespace and stop words, then tokenizes and stems the raw HTML/docs); an indexer (doc-url, doc-term, term-doc); a TF-IDF vector generator; a webpage relevance ranker; and a performance evaluator (F1, precision, recall).
- Used the TF-IDF vector space model for web page matching and cosine similarity for web page ranking.
- Go to search_engine/search_engine_website
- Run the inverse_document_indexer_final function in the "search_engine.py" file to collect documents (html/php/txt/doc/docx/ppt/pptx) using the web crawler.
- This builds the vector space model, with the inverse document index and TF-IDF vectors, for all collected documents.
- To crawl a different website, change the url value in "search_engine.py".
- Enter a query term to search within the collected web documents.
- To run the Django server, go to "search_engine/search_engine_website".
- Open a command prompt in the directory that contains manage.py and type manage.py preceded by the python.exe location and python, in the following manner:
- C:\Users\Anjana\Anaconda3\python manage.py runserver (format: location of python.exe + python + manage.py runserver)
- To view the web interface for the search engine, go to http://127.0.0.1:8000/
- Install pip for Python.
- Move to the Python directory or the Scripts directory in Anaconda.
- Run "pip install Django" to install Django.
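
The TF-IDF matching and cosine-similarity ranking described in the module list can be illustrated with a minimal sketch. This is not the code from "search_engine.py": the function names (`preprocess`, `build_tfidf`, `cosine`, `rank`), the tiny stop-word list and the sample pages are all hypothetical, stemming is left out, and the weighting is plain `tf * log(N / df)`, which may differ from what the project uses.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the project's real list is not shown here.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "the", "to"}


def preprocess(text):
    """Strip markup, lowercase, drop digits/punctuation and stop words, tokenize."""
    text = re.sub(r"<[^>]+>", " ", text)            # remove HTML tags
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_tfidf(docs):
    """Return (TF-IDF vectors keyed by url, IDF table) for a {url: text} mapping."""
    tokens = {url: preprocess(text) for url, text in docs.items()}
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    idf = {term: math.log(len(docs) / df) for term, df in doc_freq.items()}
    vectors = {
        url: {term: tf * idf[term] for term, tf in Counter(toks).items()}
        for url, toks in tokens.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, vectors, idf):
    """Score every document against the query, highest similarity first."""
    counts = Counter(preprocess(query))
    # Query terms that never occur in the collection are ignored.
    q_vec = {term: tf * idf[term] for term, tf in counts.items() if term in idf}
    scores = {url: cosine(q_vec, vec) for url, vec in vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample pages, not real crawl output.
docs = {
    "memphis.edu/admissions": "<h1>Graduate Admissions</h1> Apply to graduate programs.",
    "memphis.edu/cs": "<p>Computer Science department courses and research.</p>",
    "memphis.edu/library": "Library hours and research guides.",
}
vectors, idf = build_tfidf(docs)
results = rank("computer science research", vectors, idf)
for url, score in results:
    print(f"{score:.3f}  {url}")
```

In this toy collection the computer science page ranks first because it shares all three query terms, the library page follows because it shares only "research", and the admissions page scores zero.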
Current Version : v1.0.0.0
Last Update : 12.01.2017