The MathWebSearch system (MWS) is a content-based search engine for mathematical formulae. It indexes MathML formulae, using a technique derived from automated theorem proving: Substitution Tree Indexing. The software is licensed under the GNU General Public License version 3.
Demos can be found at http://search.mathweb.org; in particular, the arXiv demo is at http://arxivsearch.mathweb.org
- analytics/: user defined analytics source files
- config/: configuration files
- data/: data used to run a MWS demo
- doc/: documentation for MWS users
- scripts/: utility scripts
- src/: source code
- test/: test source code and data
- third_party/: third party source code
- CMakeLists.txt: CMake build script
- LICENSE: copy of the license under which this software is distributed
- Makefile: build makefile
- README.md: documentation overview about the project
- TODO: project TODOs which have not materialized into tickets
Compilation of the source tree is automated via CMake. You can build the sources using the following command:
make
Binaries are built in the bin/ directory, while documentation is built in bin/docs.
To select or de-select which components to compile, use:
make config
To run the tests, use:
make test
Finally, install the binaries using:
make install
To build this software, one needs:
- g++ (with pthread) (>= 4.4)
- cmake (>= 2.6)
- make
- pkg-config
The core MathWebSearch executables require:
- libmicrohttpd (>= 0.4)
- libxml2
- libleveldb
- libsnappy
- libjson-c
- libjson0-dev
The crawler executables require:
- libhtmlcxx-dev
- libicu-dev
- libcurl4-gnutls-dev
The documentation target requires:
- doxygen
The test target requires:
- netcat
- curl
The config step requires:
- cmake-curses-gui
To install all build, runtime and test dependencies on a sufficiently new Debian / Ubuntu machine you can use:
apt-get install g++ cmake make pkg-config libmicrohttpd12 libxml2 libleveldb1v5 \
libsnappy1v5 libjson-c3 libhtmlcxx3v5 libgnutlsxx28 libicu57 libcurl3-gnutls
To install all build, runtime and test dependencies on Mac OS, you can use Homebrew:
brew install gcc make cmake pkg-config libmicrohttpd libxml2 leveldb snappy json-c \
htmlcxx icu4c gnutls netcat curl
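Before building, it can help to confirm that the build tools listed above are actually on the PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name and not part of MWS. It only checks for commands, not for libraries or version numbers.

```shell
#!/bin/sh
# Hypothetical helper (not part of MWS): print every command from the
# argument list that cannot be found on PATH, one name per line.
missing_tools() {
    for tool in "$@"; do
        # `command -v` is the POSIX way to test whether a command resolves.
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call for the build tools named in this README:
#   missing_tools g++ cmake make pkg-config
```

An empty output means every listed tool was found; any printed name still needs to be installed.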
- all: builds all the binaries of the project and tests
- clean: cleans the build
- config: brings up the cmake CLI configuration tool
- doc: generates the documentation of the project
- test: runs project tests
- help: displays the complete list of targets
- install: installs mwsd, docs2harvest and mws-config on your system
To use the crawler, start it with the website to crawl and the number of sites to crawl. Optionally, you can also specify where crawling should start and which links should be skipped.
Another way to generate harvests is via docs2harvest. This takes XHTML documents as arguments and crawls them, creating harvest files. To crawl a repository of XHTML documents, use:
find . -name '*.xhtml' | xargs -n 10 bin/docs2harvest -o /path/to/harvests
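If the repository may contain file names with spaces, a null-delimited variant of the same pipeline is safer. The sketch below assumes a `find` and `xargs` that support `-print0` and `-0` (GNU and BSD versions do; these options are not strictly POSIX). The function name `harvest_all` and the `DOCS2HARVEST` variable are illustrative, not part of MWS.

```shell
#!/bin/sh
# Path to the harvester binary; override via the environment if it lives elsewhere.
DOCS2HARVEST="${DOCS2HARVEST:-bin/docs2harvest}"

# Hypothetical wrapper: harvest_all SRC_DIR OUT_DIR
# Runs docs2harvest over every *.xhtml file below SRC_DIR.
harvest_all() {
    src_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    # -print0 / -0 keep file names containing spaces or newlines intact;
    # -n 10 passes at most ten documents to each docs2harvest run.
    find "$src_dir" -type f -name '*.xhtml' -print0 |
        xargs -0 -n 10 "$DOCS2HARVEST" -o "$out_dir"
}

# Example call:
#   harvest_all /path/to/xhtml /path/to/harvests
```

The batch size of ten mirrors the command above; a larger value means fewer process launches, at the cost of more documents per run.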
The executable mwsd starts the main MWS server. It takes as arguments a harvest include path, which is used to load document data, and a port on which the data is served via HTTP. It accepts HTTP POST requests with MWS Queries and returns MWS Answer Sets.
bin/mwsd -I <harvest include dir> -p <port number>
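Once the server is running, a query can be sent with any HTTP client. The sketch below uses curl. The query schema (the `mws:query` element, its namespaces, and the `limitmin` and `answsize` attributes) is not described in this README and is an assumption here, so check it against the documentation in doc/ before relying on it. The helper names, the default host and the default port are likewise illustrative.

```shell
#!/bin/sh
# Sketch only: the query schema below is an assumption, not taken from this README.

# Print a minimal query for the pattern "?x + ?y" (two query variables).
mws_query_xml() {
    cat <<'EOF'
<mws:query xmlns:mws="http://www.mathweb.org/mws/ns"
           xmlns:m="http://www.w3.org/1998/Math/MathML"
           limitmin="0" answsize="10">
  <mws:expr>
    <m:apply><m:plus/><mws:qvar>x</mws:qvar><mws:qvar>y</mws:qvar></m:apply>
  </mws:expr>
</mws:query>
EOF
}

# POST the query read from stdin to an mwsd instance.
# $1 = host (assumed default: localhost), $2 = port (assumed default: 9090)
mws_post() {
    # --data-binary @- sends stdin unmodified and makes curl issue a POST.
    curl -sS -H 'Content-Type: application/xml' \
         --data-binary @- "http://${1:-localhost}:${2:-9090}/"
}

# Usage (requires a running mwsd):
#   mws_query_xml | mws_post localhost 9090
```

The response body is the answer set returned by the server; its format is not covered here.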
For additional options, see:
bin/mwsd --help
bin/docs2harvest --help
bin/mws-config help
To set up or remove mwsd as a global SysV service, use (as root):
mws-config create -p 9090 -i data/zbl zbldemo
mws-config enable zbldemo
This will deploy MathWebSearch to serve the ZBL demo harvests on port 9090. To monitor, start or stop the service, use:
service mwsd_zbldemo [start|stop|status|...]
Output is logged to /var/log/mwsd_zbldemo.log. To serve different harvest paths, create your own configurations and deploy the service.
This repository contains a Dockerfile for using the MWS Daemon. It can be found as the mathwebsearch/mathwebsearch automated build on Docker Hub and used as follows:
docker run -v /path/to/harvests:/data/ -p 8080:8080 mathwebsearch/mathwebsearch
The image is configured to serve harvests from a /data/ volume on port 8080.
The software in this project (binaries and sources) is released "as is", under the GNU General Public License version 3. A copy of this license can be found in the root of this project, under the file name LICENSE.
Most of the code in the core repository was developed by Corneliu-Claudiu Prodescu, under the supervision of Prof. Michael Kohlhase. For a complete list of developers visit https://github.com/KWARC/mws/graphs/contributors
The easiest way to contact the developers is using the MathWebSearch mailing list.