Aren't you frustrated by having a boatload of quality bookmarks but never using them, because it is faster to just fire up a browser and do a Google search instead? Yeah, me too!
You no longer need to do that. Enter the Personal Search Engine (PSE), which indexes your bookmarks and lets you search them the way you search Google.
But wait, there is more: when you issue a search query, the PSE also runs a Google search for you in the background (or another search, if you implement it :)) and displays both sets of results.
The code works but is still in the Alpha stage. When it reaches Beta, I will write an article on http://www.igrok.site about how it works. Below is a quick recipe for installing and using it.
> git clone https://github.com/vsraptor/pse.git
> cd pse
You probably have these installed already, but I list them here for completeness. In most cases you can skip this section.
Dependencies :
> apt-get install build-essential
> apt-get install python-dev python-numpy python-scipy libatlas-dev libatlas3-base
You also need scikit-learn (for TF-IDF support) and Flask (for the web app), plus a few supporting packages :
> pip install lxml
> pip install numpy
> pip install requests
> pip install stop_words
> pip install scikit-learn
> pip install flask
> pip install flask-script
> pip install flask-bootstrap
Next, either create the url.lst file manually in the data directory or generate one using bin/bm2urlst.py. url.lst is simply a list of URLs, one per line. (This repo contains one for testing, but it is better to generate your own once you have the app running. Empty lines are allowed, and you can comment out URLs with a hash so they are not included in the index.)
Now run the indexer to create the TF-IDF index matrices. It goes through the list of URLs, fetches the pages and builds the index that you will later use for searches.
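The url.lst rules described above (one URL per line, blank lines ignored, hash-commented URLs skipped) can be sketched in a few lines of Python. This is only an illustration: `read_url_list` is a hypothetical helper, not a function from this repo, and it assumes a comment is any line that starts with `#`. The real indexer may parse the file differently.

```python
def read_url_list(lines):
    """Return the URLs to index, skipping blank lines and '#' comments.

    Hypothetical helper for illustration; not part of the PSE code base.
    """
    urls = []
    for line in lines:
        line = line.strip()
        # Assumption: a commented-out URL is a line beginning with '#'.
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls


sample = """\
https://example.org/biology

# https://example.org/skipped
https://example.org/history
"""

for url in read_url_list(sample.splitlines()):
    print(url)
```

With the sample above, only the two uncommented example.org URLs are printed.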
> cd bin
> python idx.py
(4/14/19 note): I forgot to perform this step and was scratching my head over a (paraphrased) "vocabulary.csv does not exist" error.
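To give a feel for what a TF-IDF index does, here is a dependency-free sketch. It is not the code in idx.py, which relies on scikit-learn. The function names (`build_index`, `search`), the whitespace tokenizer and the smoothed IDF formula are all assumptions chosen for illustration. Each document becomes a normalized weight vector, and a query is ranked against the documents by cosine similarity.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, split on whitespace, letters only.
    return [w for w in text.lower().split() if w.isalpha()]


def to_vector(tokens, idf):
    """Turn a token list into an L2-normalized TF-IDF vector (as a dict)."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def build_index(docs):
    """Return (idf, vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return idf, [to_vector(toks, idf) for toks in tokenized]


def search(query, idf, vectors):
    """Return document indices ranked by cosine similarity, best first."""
    q = to_vector(tokenize(query), idf)
    scored = []
    for i, vec in enumerate(vectors):
        score = sum(w * vec.get(t, 0.0) for t, w in q.items())
        if score > 0:
            scored.append((score, i))
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


docs = ["history of biology", "cooking pasta recipes", "biology of cells"]
idf, vectors = build_index(docs)
print(search("history biology", idf, vectors))  # [0, 2]
```

The first document matches both query terms and ranks first. The second shares no terms with the query and is left out.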
The command-line app is mainly for testing purposes. You can run it like this (-b bookmark search, -g Google search) :
> cd bin
> python query.py -b -q 'history biology'
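For readers curious how flags like these are typically wired up, here is a minimal argparse sketch. It is an assumption about the shape of such a CLI, not the actual contents of query.py. Only the flag letters -b, -g and -q come from the command above. The `dest` names and help strings are made up.

```python
import argparse


def parse_args(argv=None):
    # Illustrative parser; query.py may define its options differently.
    parser = argparse.ArgumentParser(description="Query the bookmark index (sketch).")
    parser.add_argument("-b", dest="bookmarks", action="store_true",
                        help="search the bookmark index")
    parser.add_argument("-g", dest="google", action="store_true",
                        help="run a Google search")
    parser.add_argument("-q", dest="query", required=True,
                        help="query string")
    return parser.parse_args(argv)


args = parse_args(["-b", "-q", "history biology"])
print(args.bookmarks, args.google, args.query)  # True False history biology
```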
Or better, run the web app :
> cd site
> python manage.py runserver
Then go to the following web address :
http://localhost:5000
To generate url.lst from an exported bookmarks file, filtering out links to images :
> cd bin
> python bm2urlst.py /path/to/bookmarks.html | grep -v 'png$\|gif$\|jpg$' > ../data/url.lst
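The pipeline above extracts the links from a bookmarks export and drops any URL ending in png, gif or jpg. The same idea can be sketched in Python with only the standard library. This is not the code of bm2urlst.py: `LinkCollector` and `bookmarks_to_urls` are hypothetical names, and the sketch assumes the export stores each bookmark as an `<A HREF="...">` tag, as the common browser bookmark export format does.

```python
from html.parser import HTMLParser

# Same suffixes the grep filter above excludes (case-sensitive, like grep).
IMAGE_SUFFIXES = ("png", "gif", "jpg")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.urls.append(href)


def bookmarks_to_urls(html_text):
    """Return bookmark URLs, skipping those that end in an image suffix."""
    collector = LinkCollector()
    collector.feed(html_text)
    return [u for u in collector.urls if not u.endswith(IMAGE_SUFFIXES)]


sample = """<DL><p>
<DT><A HREF="https://example.org/article">Article</A>
<DT><A HREF="https://example.org/logo.png">Logo</A>
</DL><p>"""

print("\n".join(bookmarks_to_urls(sample)))
```

For the sample export, only https://example.org/article survives the filter.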