Getting started -- .binpack files missing?
TonyGuil opened this issue · 2 comments
I am trying to run easy_train.py using easy_train_example.bat, but it expects three .binpack files to be available somewhere:
--training-dataset=c:/dev/nnue-pytorch/noob_master_leaf_static_d12_85M_0.binpack ^
--training-dataset=c:/dev/nnue-pytorch/d8_100000.binpack ^
--training-dataset=c:/dev/nnue-pytorch/10m_d3_2.binpack ^
Where can I get these files? And how were they generated?
For background: My ultimate aim is to implement NNUE in my Onitama program. This is perhaps overkill -- as far as I know, my program might already be the best Onitama player in the world -- but I want to find out how NNUE works.
I don't know much about NNUE, but linrock describes his training process, data collection and filtering, and links to his data in his Stockfish commits, e.g. the most recent one: official-stockfish/Stockfish@1b7dea3
For older such commits, see https://github.com/official-stockfish/Stockfish/commits/master/?author=linrock (search for "update default net" commits, main net/small net updates, etc.)
The datasets from the example are not normally used; they are just some old, small ones that used to be common. Some datasets are linked from the wiki at https://github.com/official-stockfish/nnue-pytorch/wiki/Training-datasets#good-datasets; for others you need to look at linrock's commits that introduce new networks, as the wiki is incomplete at this point.
Edit: this .sh script is very close to the first training stage for master, I think; I used it some time ago when trying to replicate it.
Note that it will not run out of the box: you need to understand every setting here and whether it needs modification for your local environment.
python3.11 easy_train.py \
--training-dataset=/data/sopel/nnue/nnue-pytorch-training/data/nodes5000pv2_UHO.binpack \
--training-dataset=/data/sopel/nnue/nnue-pytorch-training/data/dfrc_n5000.binpack \
--num-workers=8 \
--threads=2 \
--gpus="0,1" \
--runs-per-gpu=1 \
--batch-size=16384 \
--max_epoch=600 \
--do-network-training=True \
--do-network-testing=True \
--tui=True \
--network-save-period=20 \
--random-fen-skipping=3 \
--start-lambda=1.0 \
--end-lambda=1.0 \
--fail-on-experiment-exists=True \
--build-engine-arch=x86-64-bmi2 \
--build-threads=32 \
--epoch-size=100000000 \
--validation-size=1000000 \
--network-testing-threads=24 \
--network-testing-explore-factor=1.5 \
--network-testing-book="https://github.com/official-stockfish/books/blob/master/UHO_XXL_%2B0.90_%2B1.19.epd.zip" \
--network-testing-nodes-per-move=20000 \
--network-testing-hash-mb=8 \
--network-testing-games-per-round=200 \
--engine-base-branch=Sopel97/Stockfish/experiment_502 \
--engine-test-branch=Sopel97/Stockfish/experiment_502 \
--nnue-pytorch-branch=Sopel97/nnue-pytorch/experiment_502 \
--workspace-path=./easy_train_data \
--experiment-name=502_s1 \
--features="HalfKAv2_hm^"
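As a rough sanity check on what these flags imply (my own arithmetic, not something easy_train.py prints), the per-epoch and total training volume follow directly from --epoch-size, --max_epoch, and --batch-size:

```python
# Back-of-the-envelope numbers implied by the flags above.
# The variable names are mine; easy_train.py exposes no such helper.
epoch_size = 100_000_000   # --epoch-size: positions sampled per epoch
max_epoch = 600            # --max_epoch
batch_size = 16_384        # --batch-size

total_positions = epoch_size * max_epoch          # positions sampled over the whole run
batches_per_epoch = epoch_size // batch_size      # optimizer steps per epoch

print(f"total positions sampled: {total_positions:,}")  # 60,000,000,000
print(f"batches per epoch: {batches_per_epoch:,}")      # 6,103
```

Note that positions are sampled from the .binpack files with skipping (--random-fen-skipping=3 here), so the same position can be seen more than once across 60B samples; the dataset itself does not need to contain that many unique positions.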
And this is how you'd run a retraining session:
python3.11 easy_train.py \
--training-dataset=/data/sopel/nnue/nnue-pytorch-training/data/T60T70wIsRightFarseerT60T74T75T76.binpack \
--num-workers=16 \
--threads=2 \
--gpus="0,1" \
--runs-per-gpu=1 \
--start-from-experiment=502_s1 \
--batch-size=16384 \
--max_epoch=600 \
--do-network-training=True \
--do-network-testing=True \
--tui=True \
--network-save-period=20 \
--random-fen-skipping=10 \
--start-lambda=1.0 \
--end-lambda=0.75 \
--fail-on-experiment-exists=True \
--build-engine-arch=x86-64-bmi2 \
--build-threads=32 \
--epoch-size=100000000 \
--validation-size=1000000 \
--network-testing-threads=24 \
--network-testing-explore-factor=1.5 \
--network-testing-book="https://github.com/official-stockfish/books/blob/master/UHO_XXL_%2B0.90_%2B1.19.epd.zip" \
--network-testing-nodes-per-move=20000 \
--network-testing-hash-mb=8 \
--network-testing-games-per-round=200 \
--engine-base-branch=Sopel97/Stockfish/experiment_502 \
--engine-test-branch=Sopel97/Stockfish/experiment_502 \
--nnue-pytorch-branch=Sopel97/nnue-pytorch/experiment_502 \
--workspace-path=./easy_train_data \
--experiment-name=502_s2 \
--features="HalfKAv2_hm^"
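The --start-lambda/--end-lambda pair controls how the training target mixes the engine evaluation with the game outcome: lambda=1.0 trains purely toward evals, lower values blend in WDL results. In the retraining run above it goes from 1.0 down to 0.75. A minimal sketch of a linear schedule, assuming that is how the trainer interpolates (check the nnue-pytorch source for the exact schedule actually used):

```python
def lambda_at(epoch, max_epoch=600, start_lambda=1.0, end_lambda=0.75):
    """Linearly interpolate lambda between --start-lambda and --end-lambda.

    Illustrative sketch only: the real interpolation lives inside
    nnue-pytorch and may differ in detail.
    """
    t = epoch / max_epoch
    return start_lambda + (end_lambda - start_lambda) * t

# lambda = 1.0 -> target is the engine eval; lambda = 0.75 -> 25% game result.
print(lambda_at(0))    # 1.0
print(lambda_at(300))  # 0.875
print(lambda_at(600))  # 0.75
```

In the first-stage run both values are 1.0, so the schedule is flat there; the blend toward game outcomes only kicks in during retraining.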