The background: I have used MediaWiki as my personal wiki software for more than 10 years. The wiki contains years of diaries and various hand-written notes, totaling more than 6000 pages. I set up the wiki with long-term use in mind, so I chose SQLite as the database and kept the set of extensions minimal. The goal is to keep the wiki easy to maintain, easy to back up, and easy to upgrade to the latest MediaWiki release.
The problem: The naive built-in full-text search is not very powerful. The suggested option is to add a search extension based on Elasticsearch. However, adding such heavyweight software to my personal wiki is unacceptable.
The solution: I cobbled together this piece of search software, wiki-search, which runs independently alongside MediaWiki and indexes its pages.
Wiki-search is built upon the following wonderful libraries:
This software reads MediaWiki’s SQLite database directly. This allows it to retrieve all pages in one SQL query, taking advantage of the existing database indexes, which is significantly faster than requesting the pages through the API.
In practice, a full reindex of my personal wiki (~6400 pages) takes only around 8-15 seconds, which is acceptable for me.
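For illustration, here is a minimal sketch of the single-query approach, using Python’s built-in sqlite3 module; it is not wiki-search’s actual query. It runs against a deliberately simplified, hypothetical subset of the page, revision, and text tables, loosely modeled on older MediaWiki schemas. Recent MediaWiki versions add further indirection (the slots and content tables), so a query against a real database needs extra joins. The names SCHEMA, load_all_pages, and build_demo_db are hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema loosely modeled on MediaWiki's page/revision/text
# tables. Recent MediaWiki versions route content through slots/content as well.
SCHEMA = """
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title TEXT NOT NULL,
    page_latest INTEGER NOT NULL,
    page_touched TEXT NOT NULL
);
CREATE TABLE revision (
    rev_id INTEGER PRIMARY KEY,
    rev_page INTEGER NOT NULL,
    rev_text_id INTEGER NOT NULL
);
CREATE TABLE text (
    old_id INTEGER PRIMARY KEY,
    old_text TEXT NOT NULL
);
"""

# One query returns every page with the text of its latest revision.
# Both joins go through primary keys, so SQLite can use existing indexes.
ALL_PAGES_SQL = """
SELECT p.page_namespace, p.page_title, p.page_touched, t.old_text
FROM page AS p
JOIN revision AS r ON r.rev_id = p.page_latest
JOIN text AS t ON t.old_id = r.rev_text_id
ORDER BY p.page_id
"""


def load_all_pages(conn: sqlite3.Connection) -> list:
    """Fetch all pages in a single query instead of one API request per page."""
    rows = conn.execute(ALL_PAGES_SQL).fetchall()
    return [
        {"namespace": ns, "title": title, "touched": touched, "text": text}
        for ns, title, touched, text in rows
    ]


def build_demo_db() -> sqlite3.Connection:
    """In-memory demo database: two pages, where page 1 also has an older revision."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO text VALUES (?, ?)",
                     [(1, "old draft"), (2, "Went hiking."), (3, "Notes on Rust.")])
    conn.executemany("INSERT INTO revision VALUES (?, ?, ?)",
                     [(10, 1, 1), (11, 1, 2), (12, 2, 3)])
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?, ?)",
                     [(1, 0, "2023-01-05", 11, "20230105120000"),
                      (2, 0, "Rust_notes", 12, "20230210080000")])
    return conn


if __name__ == "__main__":
    for page in load_all_pages(build_demo_db()):
        print(page["title"], "->", page["text"])
```

Because every row comes back from one query, the cost is a single pass over the database rather than thousands of HTTP round trips through the API.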
After the server starts, an automatic reindexing thread is spawned in the background and triggers reindexing every hour. If the wiki’s database has not been updated since the last run, the reindexing is skipped.
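How wiki-search detects that nothing has changed is not described here. One plausible check, sketched below under that assumption, compares the newest page_touched timestamp in MediaWiki’s page table with the value recorded at the previous run. The names latest_touch and should_reindex are hypothetical.

```python
import sqlite3
from typing import Optional


def latest_touch(conn: sqlite3.Connection) -> Optional[str]:
    """Newest page_touched value (MediaWiki's 14-digit YYYYMMDDHHMMSS string),
    or None when the page table is empty."""
    (value,) = conn.execute("SELECT MAX(page_touched) FROM page").fetchone()
    return value


def should_reindex(conn: sqlite3.Connection, last_indexed: Optional[str]) -> bool:
    """Reindex only if some page was touched after the previous run.

    Fixed-width digit strings sort chronologically, so plain string
    comparison is enough.
    """
    newest = latest_touch(conn)
    if newest is None:
        return False  # nothing to index yet
    return last_indexed is None or newest > last_indexed
```

The check is cheap but has blind spots: page_touched is also bumped when a page’s cache is invalidated, which can cause an unnecessary run, and deleting a page does not raise the maximum, so a fuller check might also compare the page count.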
If you need more up-to-date search results, you can manually trigger reindexing by clicking the “Reindex” button on the Web UI.
If you do not need automatic reindexing, you can disable it by setting the environment variable AUTO_REINDEX to false.
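The hourly schedule and the AUTO_REINDEX switch can be sketched with a daemon thread and a threading.Event. This is a Python illustration, not the tool’s code: comparing the value case-insensitively and leaving reindexing on for any value other than "false" are assumptions, and auto_reindex_enabled and start_auto_reindex are hypothetical names.

```python
import os
import threading
from typing import Callable, Mapping, Optional

ONE_HOUR = 3600.0  # default interval in seconds


def auto_reindex_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Automatic reindexing is on unless AUTO_REINDEX is "false" (any case).

    The case-insensitive comparison is an assumption of this sketch.
    """
    return env.get("AUTO_REINDEX", "true").strip().lower() != "false"


def start_auto_reindex(
    reindex: Callable[[], None],
    stop: threading.Event,
    interval: float = ONE_HOUR,
    env: Mapping[str, str] = os.environ,
) -> Optional[threading.Thread]:
    """Start a daemon thread that calls reindex() every `interval` seconds.

    Returns None, and starts nothing, when AUTO_REINDEX disables it.
    """
    if not auto_reindex_enabled(env):
        return None

    def loop() -> None:
        # stop.wait() returns False on timeout (time to reindex) and True as
        # soon as stop is set, which ends the loop without waiting a full hour.
        while not stop.wait(interval):
            reindex()

    thread = threading.Thread(target=loop, name="auto-reindex", daemon=True)
    thread.start()
    return thread
```

Waiting on the event instead of calling time.sleep lets the server stop the loop immediately at shutdown. The reindex callback is where a “has the database changed?” check would go.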
A large portion of my wiki consists of diary entries, so wiki-search supports searching by date range.
It determines an entry’s date by recognizing dates in the page title rather than using the page’s creation or modification date. This is because my diary entries may be edited or migrated later, so those timestamps may not reflect the real date of the entry.
It supports many date formats, including partial ones that only vaguely resemble dates, e.g. “2023” and “2023-01”.
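Here is a sketch of the title-to-date mapping, assuming that a partial date stands for the whole period it names, so “2023-01” covers every day of January 2023. Only a few numeric formats (YYYY, YYYY-MM, YYYY-MM-DD, with -, / or . as separators) are handled. The tool’s actual list of formats is not reproduced, and title_date_range is a hypothetical name.

```python
import calendar
import re
from datetime import date
from typing import Optional, Tuple

# Illustrative formats only. The lookarounds stop "123456" from yielding a year.
_DATE_RE = re.compile(
    r"(?<!\d)(?P<year>\d{4})"
    r"(?:[-/.](?P<month>\d{1,2})"
    r"(?:[-/.](?P<day>\d{1,2}))?)?(?!\d)"
)


def title_date_range(title: str) -> Optional[Tuple[date, date]]:
    """Map the first valid date-like token in a title to an inclusive range.

    "2023-01-05" -> that day, "2023-01" -> the whole month, "2023" -> the year.
    """
    for m in _DATE_RE.finditer(title):
        year = int(m["year"])
        month = int(m["month"]) if m["month"] else None
        day = int(m["day"]) if m["day"] else None
        try:
            if day is not None:
                d = date(year, month, day)
                return d, d
            if month is not None:
                last_day = calendar.monthrange(year, month)[1]
                return date(year, month, 1), date(year, month, last_day)
            return date(year, 1, 1), date(year, 12, 31)
        except ValueError:
            continue  # e.g. "2023-13" is not a real date; try the next match
    return None
```

Returning a range rather than a single day lets a date-range search match a monthly or yearly page whenever its range overlaps the query.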
Wiki-search builds on the tantivy library to provide a rich search syntax.
As a result, you can query any of the following fields (FIELD:QUERY):
- title
- text
- updated
- title_date
- namespace
- category
The query syntax also supports logical combinators (AND, OR), exclusion (NOT, -), “must include” (+), and boosting (TERM^2.0).
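To make the syntax concrete, the hypothetical helpers below compose query strings from the fields and operators listed above. For example, requiring a category, boosting a title term, and excluding a text term renders as +category:Travel title:rust^2.0 -text:hiking, where Travel, rust, and hiking are placeholder terms. How several unprefixed clauses combine by default depends on the query parser’s configuration, so write AND or OR explicitly when it matters. The names clause and any_of are not part of wiki-search.

```python
from typing import Iterable, Optional


def clause(field: str, term: str, *, required: bool = False,
           excluded: bool = False, boost: Optional[float] = None) -> str:
    """Render one FIELD:QUERY clause with optional +, - and ^boost modifiers."""
    if required and excluded:
        raise ValueError("a clause cannot be both required and excluded")
    # Multi-word terms are quoted so they are parsed as a phrase.
    value = f'"{term}"' if " " in term else term
    text = f"{field}:{value}"
    if boost is not None:
        text += f"^{boost}"
    if required:
        return "+" + text
    if excluded:
        return "-" + text
    return text


def any_of(clauses: Iterable[str]) -> str:
    """Join clauses with OR inside parentheses so the group can be nested."""
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    print(" ".join([
        clause("category", "Travel", required=True),
        clause("title", "rust", boost=2.0),
        clause("text", "hiking", excluded=True),
    ]))
```

The resulting string can be typed into the search box as-is; the helpers only save remembering where each modifier goes.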
The main interface I designed for this software is a Web UI, but you can also invoke it through an API.
If you don’t want to run the software in server mode, you can also use the self-contained command line for reindexing and querying.
At least 1000 entries in my wiki are written in Chinese, and many entries also include Japanese, so wiki-search was designed to support CJK text well from the beginning.
It also applies English stemming, so you don’t need to type the exact word form to find a match.
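Which CJK tokenizer wiki-search uses is not stated here. One common dictionary-free approach, shown as a swapped-in illustration rather than the tool’s method, splits runs of Han, kana, and Hangul characters into overlapping character bigrams and lowercases everything else. A query such as 旅行 then produces the same bigram as the indexed text, so it matches without word segmentation. English stemming, which would be applied to the Latin-script tokens afterwards, is left out, and tokenize is a hypothetical name.

```python
import re
from typing import List

# Han (CJK Unified Ideographs and Extension A), Hiragana, Katakana, Hangul syllables.
_CJK = r"\u3400-\u4dbf\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff\uac00-\ud7af"
_CJK_RUN = re.compile(rf"[{_CJK}]+")
# A token is either a run of CJK characters or a run of other word characters.
_TOKEN_RE = re.compile(rf"[{_CJK}]+|[^\W{_CJK}]+")


def tokenize(text: str) -> List[str]:
    """Lowercase Latin-script words and split CJK runs into overlapping bigrams."""
    tokens: List[str] = []
    for run in _TOKEN_RE.findall(text):
        if _CJK_RUN.fullmatch(run):
            if len(run) == 1:
                tokens.append(run)  # a lone character is its own token
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run.lower())
    return tokens
```

Bigrams need no dictionary and treat mixed Chinese and Japanese text uniformly, at the cost of a larger index and occasional false matches across word boundaries; dictionary-based segmenters make the opposite trade.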
You can build the software by running:
make
The static binaries can be found in the target directory.
Usage information can be retrieved by running:
wiki-search --help
To build the docker image, you can run:
make push-docker IMAGE_NAME=docker.io/USER/wiki-search
The image will be built and pushed to the target you specified.
Here’s a portion of a sample Kubernetes deployment (the container entry):
- name: search
  image: docker.io/USER/wiki-search:latest
  ports:
    - containerPort: 404
      name: wiki-search
  env:
    - name: SQLITE_PATH
      value: /data/my_wiki.sqlite
    - name: WIKI_BASE
      value: https://YOUR_WIKI_BASE/index.php/
    - name: INDEX_DIR
      value: /index
    - name: BIND_ADDR
      value: 0.0.0.0:404
  volumeMounts:
    - name: wiki
      subPath: data
      readOnly: true
      mountPath: /data
    - name: wiki-search-index
      mountPath: /index
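The sample sets WIKI_BASE to the wiki’s /index.php/ prefix, which suggests it is used to turn search hits into links back to the wiki. Assuming so, a link can be built by appending the page title with spaces turned into underscores (MediaWiki’s URL form) and percent-encoding the rest. page_url is a hypothetical helper, not part of wiki-search.

```python
import os
from urllib.parse import quote


def page_url(title: str, wiki_base: str = "") -> str:
    """Link to a wiki page, assuming WIKI_BASE is a plain URL prefix.

    Spaces become underscores (MediaWiki's URL form for titles); other
    characters are percent-encoded as UTF-8, keeping "/" (subpages) and ":"
    (namespace prefixes) readable.
    """
    base = wiki_base or os.environ["WIKI_BASE"]
    return base + quote(title.replace(" ", "_"), safe="/:")
```

With the sample value, a hit on “Main Page” would link to https://YOUR_WIKI_BASE/index.php/Main_Page.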