Python web crawler with a C++ search engine - Vighnesh Souda.
Be sure to install BeautifulSoup4 for Python 2.7.
- First, specify the number of input pages by setting the variable “NUMPAGES” in webCrawler.py. This value MUST match “NUM_INPUT_PAGES”, defined at the top of the “test.cpp” file.
- Next, choose a topic to seed the web crawler. It must be a valid page in the Wikipedia database, like “Math” or “United_States” or “Cuisine”.
- Next, in the project workspace, run “python webCrawler.py” in a terminal. The script should finish without errors.
- Next, in a terminal in the project workspace, run “g++ test.cpp -o out -std=c++11”. Note: this will most likely not work on Windows. The “-std=c++11” flag is required. Once compilation finishes, you should see a file called “out” in the project directory.
- In the same terminal, run “./out”. The program should start without errors and prompt you to query the trie.
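The first step requires “NUMPAGES” and “NUM_INPUT_PAGES” to hold the same value, and a mismatch is easy to miss. Below is a minimal sketch of a check you could run before compiling. It is not part of the repository: the helper names (`read_int_constant`, `page_counts_match`, `check_files`) are hypothetical, and it assumes the constants are written as plain integer literals, e.g. `NUMPAGES = 100` in Python and `#define NUM_INPUT_PAGES 100` or `const int NUM_INPUT_PAGES = 100;` in C++.

```python
import re


def read_int_constant(source_text, name):
    """Return the first integer literal assigned to `name`, or None.

    Assumed forms: `NAME = 10`, `#define NAME 10`, `const int NAME = 10;`.
    """
    pattern = r"\b" + re.escape(name) + r"\b\s*=?\s*(\d+)"
    match = re.search(pattern, source_text)
    return int(match.group(1)) if match else None


def page_counts_match(crawler_source, cpp_source):
    """True only if both constants are found and hold the same value."""
    py_value = read_int_constant(crawler_source, "NUMPAGES")
    cpp_value = read_int_constant(cpp_source, "NUM_INPUT_PAGES")
    return py_value is not None and py_value == cpp_value


def check_files(crawler_path, cpp_path):
    """Read both source files from disk and compare the two constants."""
    with open(crawler_path) as crawler_file:
        crawler_source = crawler_file.read()
    with open(cpp_path) as cpp_file:
        cpp_source = cpp_file.read()
    return page_counts_match(crawler_source, cpp_source)
```

Calling `check_files("webCrawler.py", "test.cpp")` from the directory that contains both files returns `True` when the two values agree. If the constants are defined in some other way (for example, computed from an expression), the pattern will need adjusting.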
$ git clone https://github.com/vsouda/Search-Engine-With-Web-Crawler.git
$ cd Search-Engine-With-Web-Crawler/src/
$ python webCrawler.py
$ g++ test.cpp -o out -std=c++11
$ ./out
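When “./out” starts, it asks for a query against a trie. This README does not describe how test.cpp builds that trie, so the following is only an illustrative Python sketch of the general idea: words are stored character by character, and each complete word points to the pages it came from. The class and method names (`TrieNode`, `Trie`, `insert`, `query`) and the choice to lowercase words are assumptions made for this example, not the project's actual C++ implementation.

```python
class TrieNode(object):
    def __init__(self):
        self.children = {}   # maps a character to the next TrieNode
        self.pages = set()   # pages containing the word that ends here


class Trie(object):
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, page):
        """Record that `word` appears on `page` (case-insensitive)."""
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.pages.add(page)

    def query(self, word):
        """Return the set of pages on which the exact word was inserted."""
        node = self.root
        for ch in word.lower():
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.pages)
```

A lookup walks one node per character, so its cost depends on the length of the query word rather than on the number of words stored. In this sketch a prefix such as “mat” returns an empty set unless it was inserted as a word in its own right.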