Writes NeXus files from experiment data streamed through Apache Kafka. Part of the ESS data streaming pipeline.
The file-writer package consists of a number of applications:
- `kafka-to-nexus` - the core application, used for writing NeXus files from information supplied via Kafka
- `template-maker` - used for generating a template NeXus file from a JSON file
- `file-maker` - allows NeXus files to be written without needing Kafka; it is mostly for debugging and testing
All applications are run from the command line. Run any application with the `--help` flag for more details about how to run it.
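For example, assuming the binaries end up in the build directory's `bin` folder (as in the build steps below), the help text can be printed like this:

```
./bin/kafka-to-nexus --help
```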
The following minimum software is required to get started:
- Linux or MacOS
- Conan < 2.0.0
- CMake >= 3.1.0
- Git
- A C++17 compatible compiler
- Doxygen (only required if you would like to generate the documentation)
Conan is a package manager, so it will install all the other required packages.
We have our own Conan repositories for some of the required packages. Follow the README here for instructions on how to include these.
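As a sketch, adding an extra Conan remote generally looks like the following; the remote name and URL are placeholders here, so use the values given in that README:

```
conan remote add <remote-name> <repository-url>
```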
Then run this:

```
# Assuming profile is named "default"
conan profile update settings.compiler.libcxx=libstdc++11 default
```
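If no profile named "default" exists yet, one can first be created using Conan's built-in detection (a standard Conan 1.x command, not specific to this project):

```
conan profile new default --detect
```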
From within the top-most directory:

```
mkdir _build
cd _build
conan install .. --build=missing
cmake ..
make
```
There are additional optional CMake flags for adjusting the build:

- `-DRUN_DOXYGEN=ON` if Doxygen documentation is required. Also requires `make docs` to be run afterwards.
- `-DHTML_COVERAGE_REPORT=ON` to generate an HTML unit test coverage report, output to `<BUILD_DIR>/coverage/index.html`.
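For example, a build with both options enabled could be configured and run like this:

```
cmake -DRUN_DOXYGEN=ON -DHTML_COVERAGE_REPORT=ON ..
make
make docs
```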
We have three levels of tests:
- unit tests: typically low-level tests
- domain tests: mid-level tests that exercise the codebase but without Kafka
- integration tests: high-level tests that confirm the basic functioning of the application, including Kafka
From the build directory:

```
make UnitTests
./bin/UnitTests
```
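If the unit tests are built with googletest - an assumption about the test framework, as it is not stated here - a subset of the tests can be selected with googletest's standard filter flag:

```
# Runs only the tests whose names match the pattern
./bin/UnitTests --gtest_filter=<pattern>
```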
Note: some of the tests may fail on macOS, but as Linux is our production system this doesn't matter. Instead, the tests can be run on the build server - if they fail there then there is definitely a problem!
These tests exercise the core functionality of the program, but without Kafka. This means the tests are more stable than the integration tests, as they do not require Docker and Kafka to be running.
The domain tests are implemented in Python but use the file-maker to generate a file.
Inside the domain tests folder there is a `requirements.txt` file; these requirements need to be installed before running the domain tests:

```
pip install -r requirements.txt
```
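Optionally, the requirements can be installed into a virtual environment to keep them isolated from the system Python (standard Python tooling, not specific to this project):

```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```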
The tests can be run like so:

```
$ cd domain-tests
$ pytest --file-maker-binary=<path to file-maker binary>
```
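For example, if the project was built in a `_build` directory as shown earlier, the binary path might look like this (adjust it to your build setup):

```
$ cd domain-tests
$ pytest --file-maker-binary=../_build/bin/file-maker
```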
The "data" for writing is specified via JSON; for examples see domain-tests/data_files
.
Correspondingly, there is a NeXus template file that defines the layout of the produced file.
See domain-tests/nexus_templates
for examples.
The tests work by first generating a NeXus file using the file-maker and then run tests against the file to check it is correct.
The integration tests consist of a series of automated tests for this repository that exercise it in ways similar to how it would be used in production.
See the Integration Tests page for more information.
The file-writer can be configured from a file via `--config-file <ini>`, which mirrors the command-line options (use the `--help` flag to see them).
There are many options but most have sensible defaults. The minimum ones that need setting are:
```
brokers=localhost:9092
command-status-topic=local_filewriter_status
job-pool-topic=local_filewriter_pool
hdf-output-prefix=output
```
- `brokers` is a comma-separated list of the Kafka broker addresses.
- `command-status-topic` is where the file-writer sends status messages when it is NOT writing.
- `job-pool-topic` is the topic on which the file-writer receives requests for writing files. This mechanism is explained below.
- `hdf-output-prefix` defines the path where the files should be written.
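Assuming the options above are saved to a file named, say, `config.ini` (the file name is arbitrary), the file-writer could then be started like this:

```
./bin/kafka-to-nexus --config-file config.ini
```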
The file-writer is designed to run in a system where multiple other file-writers are present. When a request to start writing a file is sent to the job pool, one of the file-writers will pick it up. That file-writer will switch to a new topic specified in the start message. It will then send a message to that topic indicating that it has started writing. The request to stop writing will also arrive on that topic. Once the file-writer has finished writing, it will disconnect from the topic and return to the job-pool topic.
Note: all file-writers belong to the same consumer group, so only one will consume the start message.
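For local experimentation, the topics named in the configuration above can be created with Kafka's stock command-line tooling (this is standard Kafka, not part of this project), assuming a broker at `localhost:9092`:

```
kafka-topics.sh --create --topic local_filewriter_pool --bootstrap-server localhost:9092
kafka-topics.sh --create --topic local_filewriter_status --bootstrap-server localhost:9092
```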
This flow-chart shows the job-pool mechanism pictorially.
For more information on the commands sent to and from the file-writer, see the commands documentation.
To reduce the size of the start messages sent to Kafka, it is possible to use pre-created template NeXus files. These usually contain the static (rarely changed) information for a particular instrument, e.g. the geometry of the instrument. When file-writing starts, the template file is copied and then populated with the dynamic data. Each instrument has its own unique template.
The source of truth for the static and dynamic configurations is a large JSON file. The template-maker generates the configurations from this file and then the templates are deployed for use.
If there is a change to the JSON then the templates need to be recreated and redeployed.
Use the `--help` flag to see how to use it.
Similar to the template-maker, the file-maker allows creating a NeXus file without Kafka. The data and file configuration are read from files.
Its main purpose is to allow developers to test and debug in a more deterministic way, with consistent input data and no need to set up Kafka.
The file-maker has a fixed start and stop time (10000 ms to 15000 ms Unix epoch), so the data supplied needs to take that into account: for example, a sample with a timestamp of 12000 ms falls inside the writing window, while one at 20000 ms does not. Note: times in the data file are in ms; the file-maker will convert them to the correct units automatically.
Use the `--help` flag to see how to use it. The domain-tests also use it, so they are a good way to learn more about it.
See the `documentation` directory.
Please read CONTRIBUTING.md for information on submitting code to this project.
This project is licensed under the BSD 2-Clause "Simplified" License - see the LICENSE.md file for details.