UTEC-CS3102 Project (2020-1)
Google-like search engine for Wikipedia articles.
- ZimManager.h: Read data from a zim file
- htmlParser.h: Parse Wikipedia article HTML
- Preprocess.h: Preprocess text and get a word count
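The preprocessing step (tokenize the parsed article text and count word occurrences) can be sketched as follows. This is an illustrative Python stand-in; the actual Preprocess.h is C++ and its tokenization rules may differ.

```python
import re
from collections import Counter

def preprocess(text: str) -> Counter:
    """Lowercase the text, strip punctuation, and count word occurrences.

    A rough stand-in for the word-count step Preprocess.h performs on
    parsed article text; the real tokenization rules may differ.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(words)

counts = preprocess("Lima is the capital of Peru. Lima is large.")
print(counts["lima"])  # 2
```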
After installing the requirements, run the following from the DataProcessing directory:
make
./processing data/zim/wiki-mini.zim
# word count of each article in wiki-mini.zim
If you get an "error while loading shared libraries" message, run /sbin/ldconfig -v
The examples directory contains an example for each main component.
The command to execute an example can be found on the first line of the file.
- zimlib
wget http://ftp.debian.org/debian/pool/main/z/zimlib/zimlib_1.2.orig.tar.gz -O zimlib-1.2.tar.gz
tar xf zimlib-1.2.tar.gz
cd zimlib-1.2
./configure
make
make install
- myhtml
git clone https://github.com/lexborisov/myhtml.git
cd myhtml
make
make test
sudo make install
- Search through Wikipedia articles indexed in a B+Tree.
- Indexes are generated with this script in DataProcessing.
- Get a small description and keywords of each result.
- Search results are fetched from the Crow search server.
- Powered by a pregenerated trie stored in JSON (the JSON-Trie format).
- JSON Tries are available in assets. More tries can be generated from a list of words with this script, which uses the Trie class.
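The JSON-Trie format is not spelled out here, so the sketch below assumes a simple nested-object layout (one key per character, with a "$" end-of-word marker); the real script and Trie class may encode it differently.

```python
import json

END = "$"  # hypothetical end-of-word marker, assumed for this sketch

def build_trie(words):
    """Build a nested-dict trie from a word list, ready for json.dumps."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True  # mark a complete word
    return root

def suggestions(trie, prefix):
    """Return all words in the trie that start with prefix."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    results = []
    stack = [(node, prefix)]
    while stack:
        node, acc = stack.pop()
        for key, child in node.items():
            if key == END:
                results.append(acc)
            else:
                stack.append((child, acc + key))
    return sorted(results)

trie = build_trie(["peru", "perro", "periodic"])
print(json.dumps(trie))           # serialized form, as an asset file would store it
print(suggestions(trie, "per"))   # ['periodic', 'perro', 'peru']
```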
- A Flask app scrapes Linio and gets the first n products.
- Links and prices to Linio products are displayed in WikiSearch along with Wikipedia results.
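The scraping side can be sketched with the standard library alone. The "product" class name and markup shape below are assumptions for illustration; Linio's real pages use their own structure, and the actual app wraps this kind of extraction in a Flask route.

```python
from html.parser import HTMLParser

class ProductParser(HTMLParser):
    """Collect (link, price) pairs from anchors marked with a product class.

    The "product" class and the S/ price prefix are assumptions for this
    sketch, not Linio's actual markup.
    """
    def __init__(self):
        super().__init__()
        self.products = []
        self._href = None
        self._in_product = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "product" in attrs.get("class", ""):
            self._href = attrs.get("href")
            self._in_product = True

    def handle_data(self, data):
        # Treat the first S/-prefixed text inside a product anchor as its price.
        if self._in_product and data.strip().startswith("S/"):
            self.products.append((self._href, data.strip()))
            self._in_product = False

def first_n_products(html: str, n: int):
    """Parse a results page and return the first n (link, price) pairs."""
    parser = ProductParser()
    parser.feed(html)
    return parser.products[:n]

page = '<a class="product" href="/p/1">S/ 99.90</a><a class="product" href="/p/2">S/ 45.00</a>'
print(first_n_products(page, 1))  # [('/p/1', 'S/ 99.90')]
```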
Install dependencies and start the React app
npm i
npm run start
Compile and start the server. The Crow framework is required.
make
./server
Install Python 3 and Flask, then start the app
python3 ads_server.py
- Python Flask
- npm
- Crow dependencies (from mrozigor/crow, a fork of the original Crow)
Ubuntu
sudo apt-get install build-essential libtcmalloc-minimal4 && sudo ln -s /usr/lib/libtcmalloc_minimal.so.4 /usr/lib/libtcmalloc_minimal.so
OSX
brew install boost google-perftools