The source for Nutrimatic (https://nutrimatic.org/).
To build Nutrimatic, the scripted setup is the easiest path (if it doesn't work for you, see the manual steps below):

- You'll need a working C++ build system (Debian/Ubuntu: `sudo apt install build-essential`).
- Install mise-en-place as a tool installer: `curl https://mise.run | sh` (or see its other install methods).
- Run `./dev_setup.py`, which will install various dependencies locally.
- Then run `conan build .`, which will leave binaries in `build/`.
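For reference, the whole scripted path as one shell session (assuming Debian/Ubuntu and a fresh checkout) looks roughly like this:

```
sudo apt install build-essential   # C++ toolchain
curl https://mise.run | sh         # install mise-en-place
./dev_setup.py                     # installs various dependencies locally
conan build .                      # binaries land in build/
```

(After installing mise you may need to restart your shell or follow its activation instructions before `./dev_setup.py` can pick it up.)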
(The scripted path above is easier! But maybe that's too magical, or you don't like mise...) To set things up manually instead:

- As above, you'll need C++ build tools (Debian/Ubuntu: `sudo apt install build-essential`).
- Use Python 3.10 (this avoids a wikiextractor bug exposed by a change in Python 3.11).
- You probably want to set up a Python venv.
- Install Conan, CMake, etc.: `pip install -r dev_requirements.txt`
- Configure Conan to build on your machine (if you haven't already):

  ```
  conan profile detect
  conan profile path default  # note the path this outputs
  ```

  Edit the file listed by `conan profile path default` to set `compiler.cppstd=17` (or `gnu17`).
- Install C++ dependencies: `conan install . --build=missing`
- Then run `conan build .`, which will leave binaries in `build/`.
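For orientation, here is the manual path sketched as one shell session. This is only a sketch: the `.venv` name and the `python3.10` invocation are examples, and the commands themselves are the ones listed above.

```
# Assumes Debian/Ubuntu with python3.10 available; run from the repo root.
sudo apt install build-essential
python3.10 -m venv .venv && source .venv/bin/activate  # optional virtualenv
pip install -r dev_requirements.txt                    # Conan, CMake, etc.
conan profile detect                                   # only needed once per machine
conan profile path default                             # edit this file: compiler.cppstd=17 (or gnu17)
conan install . --build=missing                        # C++ dependencies
conan build .                                          # binaries land in build/
```

The default profile that `conan profile detect` writes contains a `[settings]` block; add or adjust the `compiler.cppstd` line there. The compiler name and version below are illustrative, not what your machine will necessarily report:

```
[settings]
os=Linux
arch=x86_64
build_type=Release
compiler=gcc
compiler.version=13
compiler.libcxx=libstdc++11
# the line this build needs:
compiler.cppstd=gnu17
```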
To actually use Nutrimatic, you will need to build an index from Wikipedia.
- Download the latest Wikipedia database dump (this is a ~20GB file!):

  ```
  wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
  ```

  (You can also look for a mirror closer to you.)

- Extract the text from the articles using Wikipedia Extractor (this generates ~12GB, and takes hours!):

  ```
  pip install wikiextractor  # installs into the local virtualenv
  wikiextractor enwiki-latest-pages-articles.xml.bz2
  ```

  (There are probably better extractors these days!) This will write many files named `text/??/wiki_??`.

- Index the text (this generates ~100GB of data, and also takes hours!):

  ```
  find text -type f | xargs cat | build/make-index wikipedia
  ```

  This will write many files named `wikipedia.?????.index`. (You can break this up by running `make-index` with different chunks of input data, replacing "wikipedia" with a unique name each time; see the sketch after this list.)

- Merge the indexes; I normally do this in two stages:

  ```
  for x in 0 1 2 3 4 5 6 7 8 9; do
    build/merge-indexes 2 wikipedia.????$x.index wiki-merged.$x.index
  done
  ```

  followed by

  ```
  build/merge-indexes 5 wiki-merged.*.index wiki-merged.index
  ```

  There's nothing magical about this 10-batch approach; you can use any strategy you like. The 2 and 5 numbers are phrase frequency cutoffs (how many times a string must occur to be included).

- Enjoy your new index:

  ```
  build/find-expr wiki-merged.index '<aciimnrttu>'
  ```
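As noted in the indexing step above, you can split `make-index` over chunks of input and give each run its own name. One way to do that is to index each extracted `text/??/` subdirectory separately; the `chunk-*` naming here is just an example, and the resulting indexes are merged with `build/merge-indexes` exactly as shown above.

```
# One make-index run per extracted subdirectory; merge the results afterwards.
for dir in text/*/; do
  name=$(basename "$dir")
  cat "$dir"wiki_* | build/make-index "chunk-$name"
done
```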
If you want to run the nutrimatic.org style interface, point a web server at the `web_static/` directory, and for root requests have it launch `cgi_scripts/cgi-search.py` with `$NUTRIMATIC_FIND_EXPR` set to the `find-expr` binary and `$NUTRIMATIC_INDEX` set to the index you built.

(You might want to use `install_to_dir.py`, which will copy executables, CGI scripts, and static content to the directory of your choice.)
For example, you could adapt this nginx config:
```
location /my-nutrimatic/ {
    # Serve static files (change /home/me/nutrimatic_install to your install dir)
    alias /home/me/nutrimatic_install/web_static/;

    # For root requests, run the CGI script
    location = /my-nutrimatic/ {
        fastcgi_pass unix:/var/run/fcgiwrap.socket;
        fastcgi_buffering off;  # send results as soon as we find them
        include /etc/nginx/fastcgi_params;
        gzip off;  # gzip compression also causes buffering
        # (change /home/me/nutrimatic_install to your install dir)
        fastcgi_param SCRIPT_FILENAME /home/me/nutrimatic_install/cgi_scripts/cgi-search.py;
        fastcgi_param NUTRIMATIC_FIND_EXPR /home/me/nutrimatic_install/bin/find-expr;
        # (change to wherever you put your index file)
        fastcgi_param NUTRIMATIC_INDEX /home/me/nutrimatic_install/wiki-merged.index;
    }
}
```
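You can also smoke-test the CGI script outside any web server. This is only a sketch: it assumes `cgi-search.py` reads the standard CGI environment variables, and the query parameter name (`q` here) is an assumption; adjust the paths to wherever you installed things.

```
export NUTRIMATIC_FIND_EXPR=/home/me/nutrimatic_install/bin/find-expr
export NUTRIMATIC_INDEX=/home/me/nutrimatic_install/wiki-merged.index
# Simulate the GET request a server like fcgiwrap would pass along.
REQUEST_METHOD=GET QUERY_STRING='q=<aciimnrttu>' \
  python3 /home/me/nutrimatic_install/cgi_scripts/cgi-search.py
```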
Have fun!
If you need a version of the Nutrimatic website as it was served historically, you will need to rebuild it using the instructions above. For that you need the codebase and website from that time, as well as the Wikipedia data dump from the right month (link to all historic data dumps).

- Nutrimatic.org (Dec 2016 - Feb 2024): Codebase URL. Data dump = enwiki 1 Nov 2016 (see #14)