alexandria

Full text search engine powering Alexandria.org - the open search engine.


Alexandria.org

Documentation

  1. Index file format (.fti)
  2. Search Result Ranking
  3. API Response format
  4. Caching
  5. Installing nodes

Build with docker

  1. Build the docker image.
docker build . -t alexandria
  2. Run the container.
docker container run --name alexandria -v $PWD:/alexandria -it -d alexandria
  3. Attach to the container.
docker exec -it alexandria /bin/bash
  4. Initialize docker.
/alexandria/scripts/init-docker.sh
  5. Download and build dependencies.
/alexandria/scripts/download-deps.sh
/alexandria/scripts/build-deps.sh
  6. Configure with cmake and build tests.
mkdir /alexandria/build
cd /alexandria/build

cmake .. -DCMAKE_BUILD_TYPE=Debug
or
cmake .. -DCMAKE_BUILD_TYPE=Release

make -j4 run_tests

How to build manually

  1. Configure the system (tested on Ubuntu 20.04).
# Will alter your system and install dependencies with apt.
./scripts/install-deps.sh

# Will download and build zlib, aws-lambda-cpp and aws-sdk-cpp. Only alters the local directory.
./scripts/build-deps.sh
  2. Build with cmake.
mkdir build
cd build

cmake .. -DCMAKE_BUILD_TYPE=Debug
or
cmake .. -DCMAKE_BUILD_TYPE=Release

make -j24
  3. Download test data to a local server. To run the test suite you need to install nginx and pre-download all the data, as described in "Configure local nginx test data server".

  4. Create output directories. Note that this creates a number of directories under /mnt, so make sure you don't have anything there.

./scripts/prepare-output-dirs.sh
  5. Run the test suite.
cd build
make run_tests -j24
./run_tests

Coding rules

  1. Never put "using namespace..." in header files.
  2. Namespaces and classes written by us should be CamelCase.
  3. Everything else should be lower_case.
  4. All files within a subdirectory must declare a namespace named after the directory. For example, src/file/TsvFile.h must declare everything within the namespace File.
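
As a rough illustration of these rules, here is a minimal sketch of what a header such as src/file/TsvFile.h could look like. The class body, the split_line helper and the m_ member prefix are hypothetical and are not taken from the Alexandria source; only the namespace and class names follow the example above.

```cpp
// Hypothetical sketch of src/file/TsvFile.h; not the real Alexandria class.
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Rule 1: no "using namespace ..." here, so standard names are written as std::...
// Rule 4: the file lives in src/file/, so everything is declared in namespace File.
namespace File {

// Rule 2: classes written by us are CamelCase.
class TsvFile {
public:
    // Rule 3: functions, parameters and variables are lower_case.
    explicit TsvFile(const std::string &file_name) : m_file_name(file_name) {
        assert(!file_name.empty());
    }

    // Splits one tab-separated line into its columns (illustrative helper).
    static std::vector<std::string> split_line(const std::string &line) {
        std::vector<std::string> columns;
        std::string column;
        std::istringstream stream(line);
        while (std::getline(stream, column, '\t')) {
            columns.push_back(column);
        }
        return columns;
    }

    const std::string &file_name() const { return m_file_name; }

private:
    std::string m_file_name;
};

} // namespace File
```

Callers would then refer to the class as File::TsvFile, which keeps the directory layout and the namespace layout in step.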

Notes

On nodes with spinning disks we should turn off energy saving (Advanced Power Management):

hdparm -B 255 /dev/sda

Debugging notes

Debugging scraper with gdb:

By default, gdb stops a process whenever it receives SIGPIPE. However, some programs ignore SIGPIPE, so this default behaviour is not desired when debugging them. To keep gdb from stopping on SIGPIPE, use the following command in gdb: handle SIGPIPE nostop noprint pass
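
To make the situation concrete, the following is a minimal sketch of a program that ignores SIGPIPE, assuming a POSIX system. It is not taken from the scraper; the function name write_fails_with_epipe is made up for illustration. Run normally, the write fails with EPIPE and the program carries on; under gdb with default settings, execution would pause at the write until the handle command above is applied.

```cpp
// Hypothetical example, not part of the Alexandria scraper.
#include <cassert>
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Returns true if writing to a pipe with no reader fails with EPIPE
// instead of terminating the process.
bool write_fails_with_epipe() {
    // Ignore SIGPIPE so a broken pipe shows up as an error from write().
    std::signal(SIGPIPE, SIG_IGN);

    int fds[2];
    if (pipe(fds) != 0) {
        return false;
    }
    close(fds[0]); // Close the read end: the pipe now has no reader.

    const char byte = 'x';
    errno = 0;
    const ssize_t written = write(fds[1], &byte, 1);
    // Record the result before close() can overwrite errno.
    const bool got_epipe = (written == -1 && errno == EPIPE);

    close(fds[1]);
    return got_epipe;
}
```

Because the program already handles the error code itself, the signal carries no extra information for the debugging session, which is why silencing it in gdb is safe here.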