A Neural Network Trainer, used to train NNUE-style networks for akimbo.
Also used by a number of other engines.
Used exclusively to train architectures of the form `Input -> Nx2 -> Output`.
Can use either a CPU or hand-written CUDA backend.
Currently Supported Games:
- Chess
- Ataxx
Raise an issue for support of a new game.
To learn how it works, read the wiki.
The trainer uses its own binary data format for each game.
The specifications for the data formats are found in the `bulletformat` crate.

Additionally, each type implements `from_raw`, which is recommended for use if your engine is written in Rust (or you don't mind FFI).
All data types at present are 32 bytes, so you can use marlinflow-utils to shuffle and interleave files.
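Because every record is a fixed 32 bytes, a data file is just a flat array of records, and shuffling reduces to permuting 32-byte chunks. A minimal Python sketch of the idea (the helper name and in-memory buffer are illustrative, not part of any tool mentioned here):

```python
import random

RECORD_SIZE = 32  # every bulletformat data type is 32 bytes

def shuffle_records(data: bytes, seed: int = 0) -> bytes:
    """Split a raw data buffer into fixed-size records and shuffle them."""
    assert len(data) % RECORD_SIZE == 0, "buffer is not a whole number of records"
    records = [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]
    random.Random(seed).shuffle(records)
    return b"".join(records)

# four fake 32-byte records in an in-memory buffer
buf = bytes(range(128))
shuffled = shuffle_records(buf, seed=1)
```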
You can convert text format, where
- each line is of the form `<FEN> | <score> | <result>`
- `FEN` has 'x'/'r', 'o'/'b' and '-' for red, blue and gaps/blockers, respectively, in the same format as FEN for chess
- `score` is red-relative and an integer
- `result` is red-relative: `1.0` for a win, `0.5` for a draw, `0.0` for a loss

by using the command

```
cargo r -r --bin convertataxx <input file path> <output file path>
```
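Parsing one line of this text format is straightforward; a hedged Python sketch (the example position and score below are made up for illustration):

```python
def parse_line(line: str):
    """Parse a `<FEN> | <score> | <result>` line into its three fields."""
    fen, score, result = (field.strip() for field in line.split("|"))
    return fen, int(score), float(result)

# illustrative Ataxx line: red-relative integer score, result in {1.0, 0.5, 0.0}
fen, score, result = parse_line("x5o/7/7/7/7/7/o5x x 0 1 | 28 | 1.0")
```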
You can convert a Marlinformat file by running

```
cargo r -r --bin convertmf <input file path> <output file path> <threads>
```

It is up to the user to provide a valid Marlinformat file.
Additionally, you can convert legacy text format as in Marlinflow, where
- each line is of the form `<FEN> | <score> | <result>`
- `score` is white-relative and in centipawns
- `result` is white-relative: `1.0` for a win, `0.5` for a draw, `0.0` for a loss

by using the command

```
cargo r -r --bin convert <input file path> <output file path>
```
By default all trained nets are quantised with the `QA` and `QB` factors found in `common/src/lib.rs`.
However, if you have a `params.bin` file from a checkpoint folder, you can quantise it however you want with

```
cargo r -r --bin quantise <input file path> <output file path> <QA> <QB>
```

It is recommended to change `QA` to 181 or less if using SCReLU activation, as then you can utilise manual SIMD to achieve a significant speedup (181² = 32761 keeps the squared activations within `i16` range).
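Conceptually, quantisation scales the float parameters by the chosen factor and rounds to integers. The sketch below illustrates the idea only; the factor values and which factor applies to which layer are assumptions here, not the trainer's exact scheme:

```python
QA, QB = 181, 64  # illustrative factors; the real defaults live in common/src/lib.rs

def quantise(params, scale):
    """Scale float parameters and round them to the nearest integer."""
    return [round(p * scale) for p in params]

# With QA <= 181, a clipped activation in [0, 1] quantises to [0, 181],
# so its square (as used by SCReLU) still fits in an i16: 181 * 181 = 32761 <= 32767.
weights = quantise([0.25, -1.0, 0.75], QA)
```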
General architecture settings, which must be known at compile time, are found in `common/src/lib.rs`.
It is like this because of Rust's limitations when it comes to const code.
After setting those as you please, you can run the trainer using the `run.py` script, and use

```
python3 run.py --help
```

to get a full description of all options.
A sample usage is

```
python3 run.py \
    --data-path data.bin \
    --test-id net \
    --threads 6 \
    --lr 0.001 \
    --wdl 0.5 \
    --max-epochs 40 \
    --batch-size 16384 \
    --save-rate 10 \
    --lr-step 15 \
    --lr-gamma 0.1
```
Of these options, only `data-path`, `threads` and `lr-step` are not default values.

NOTE: You may need to run `cargo update` if you pull a newer version from `main`.
There are 3 separate learning rate options:
- `lr-step N` drops the learning rate every `N` epochs by a factor of `lr-gamma`
- `lr-drop N` drops the learning rate once, at `N` epochs, by a factor of `lr-gamma`
- `lr-end x` is exponential LR, starting at `lr` and ending at `x` at `max-epochs`; it is equivalent to `lr-step 1` with an appropriate `lr-gamma`
By default `lr-gamma` is set to 0.1, but no learning rate scheduler is chosen. It is highly
recommended to have at least one learning rate drop during training.
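The three schedulers can be sketched as simple functions of the epoch number. This is a rough Python model of the descriptions above; the exact epoch boundaries at which each drop is applied are a guess:

```python
def step_lr(epoch: int, lr: float, n: int, gamma: float = 0.1) -> float:
    """lr-step N: decay by gamma after every N epochs."""
    return lr * gamma ** (epoch // n)

def drop_lr(epoch: int, lr: float, n: int, gamma: float = 0.1) -> float:
    """lr-drop N: decay by gamma once, at N epochs."""
    return lr * gamma if epoch >= n else lr

def exp_lr(epoch: int, lr: float, end: float, max_epochs: int) -> float:
    """lr-end x: exponential decay from lr to end over max_epochs;
    equivalent to lr-step 1 with gamma = (end / lr) ** (1 / max_epochs)."""
    gamma = (end / lr) ** (1.0 / max_epochs)
    return lr * gamma ** epoch
```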
Add `--cuda` to use CUDA; it will fail to compile if CUDA is not available.
Note that for small net sizes (generally unbucketed & hidden layer size < 256),
CUDA may be slower than using the CPU.
Currently (at the time of writing) rustc does not emit AVX-512 instructions via autovectorisation, so if you have an AVX-512 CPU, switch to the nightly
Rust channel and add the `--simd` flag to the run command to enable the hand-written SIMD.
This comes with the caveat that the hidden layer size must be a multiple of 32.
As nightly Rust is unstable and includes experimental compiler changes, overall performance may be lower than when compiling on stable, so it is recommended to test both on your machine.
Every `save-rate` epochs and at the end of training, a quantised network is saved to `/nets`, and a checkpoint
is saved to `/checkpoints` (which contains the raw network params, if you want them). You can resume from a checkpoint by
adding `--resume checkpoints/<name of checkpoint folder>` to the run command.
This is designed so that running an identical command with `--resume` appended behaves as if you never stopped training. If you resume with a different command, be aware that training will resume from the epoch number at which the checkpoint was saved, meaning the learning rate schedule is fast-forwarded to that epoch.