status-im/infra-nimbus

Windows host configuration for Holesky testnet

jakubgs opened this issue · 18 comments

Since Holesky is the next long-running testnet and Prater is essentially dead, it's time to deploy Beacon Nodes on Windows.

Host was re-purposed from Prater testnet and reinstalled by Innova support:
https://client.innovahosting.net/viewticket.php?tid=047561&c=VMuZGsvh

Initially there were build issues on Windows Server 2019 Essentials:

But those were not reproducible on Windows Server 2019 Standard. Instead, a segmentation fault was identified:

But that was mitigated by downgrading GCC from 13.2.0 to 11.2.0.
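
A minimal sketch of a guard that could run before a build to catch a compiler that has drifted back to the problematic version. It relies only on the `-dumpversion` option of GCC. The function names are hypothetical, and treating every 13.x release as suspect is an assumption, since only 13.2.0 was observed to crash here.

```shell
# Print the major version reported by a compiler's -dumpversion.
# The compiler command defaults to "gcc".
gcc_major() {
    local cc=${1:-gcc} version
    version=$("$cc" -dumpversion) || return 2
    echo "${version%%.*}"
}

# Return non-zero when the compiler's major version equals a flagged one.
# The default of 13 is an assumption based on the 13.2.0 segfault.
check_gcc_major() {
    local cc=${1:-gcc} bad=${2:-13} major
    major=$(gcc_major "$cc") || return 2
    if [ "$major" -eq "$bad" ]; then
        echo "GCC $major.x is flagged as problematic on this host" >&2
        return 1
    fi
}
```

Calling `check_gcc_major || exit 1` at the top of a build script would stop early instead of producing a binary that might crash later.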

Based on recommendation from @cheatfate I tried using a ReFS partition instead of NTFS to improve performance:

Which should be available as a format option since Windows Server 2012:

image

I had to install HPE Smart Storage Administrator (HPE SSA) to manage the logical volumes and was able to use two 1.6 TB SSDs to create a RAID0 volume:

image

And indeed that new logical volume could be formatted as ReFS:

image

Used a 4 KB allocation unit size to match the default page size of SQLite:

Since the SQLite database file format was designed (in 2003) the default page size for new databases has been 1024 bytes. This was a reasonable choice in 2003. But on modern hardware, a 4096 byte page is a faster and better choice. So, beginning with SQLite version 3.12.0 (2016-03-29)) the default page size for new database files has been increased to 4096 bytes.

https://www.sqlite.org/pgszchng2016.html
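
To confirm that an existing database really uses 4096-byte pages, and so lines up with the 4 KB allocation unit, the page size can be read straight from the SQLite file header. This is a sketch assuming a POSIX `od` is available (as in MSYS2); the function name is hypothetical. The header stores the page size as a big-endian 16-bit integer at byte offset 16, with the value 1 standing for 65536.

```shell
# Print the page size of a SQLite database by reading its file header.
# Bytes 16-17 hold the page size (big-endian); the value 1 means 65536.
sqlite_page_size() {
    local size
    # od prints the two bytes as unsigned decimals, e.g. " 16   0".
    set -- $(od -An -tu1 -j16 -N2 "$1" 2>/dev/null)
    [ "$#" -eq 2 ] || return 1
    size=$(( $1 * 256 + $2 ))
    if [ "$size" -eq 1 ]; then
        size=65536
    fi
    echo "$size"
}
```

It would be called with the path to the node's database file, for example `sqlite_page_size path/to/database.sqlite3` (a placeholder path, not the real layout).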

There is definitely a performance difference, though some of it is surely due to the use of RAID0:

C:\ (800 GB LV) D:\ (3 TB RAID0 LV)
image image

Not the greatest comparison, but ReFS definitely has an impact.

Here are the changes:

Using a ReFS volume probably forced a different approach to setting permissions on the data folder, since the ACL has to be very specific or you get:

{"lvl":"FAT","ts":"2024-03-26 08:49:01.793-07:00","msg":"Data folder has insecure ACL","path":"D:\\beacon-node-holesky-stable\\data"}

Actually, the old build failures I described here:

Started appearing for the unstable node:

which gcc &>/dev/null || { echo "C compiler (gcc) not installed. Aborting."; exit 1; }
gcc -Os -Wall -W -Wstrict-prototypes -DNDEBUG -D_WIN32_WINNT=0x501 -Iinclude -I. -o wingenminiupnpcstrings.exe wingenminiupnpcstrings.c
gcc -Os -Wall -W -Wstrict-prototypes -DNDEBUG -D_WIN32_WINNT=0x501 -Iinclude -I. -DMINIUPNP_STATICLIB -c -o minixml.o src/minixml.c
gcc -Os -Wall -W -Wstrict-prototypes -DNDEBUG -D_WIN32_WINNT=0x501 -Iinclude -I. -DMINIUPNP_STATICLIB -c -o igd_desc_parse.o src/igd_desc_parse.c
gcc: fatal error: cannot execute 'cc1': CreateProcess: No such file or directory
compilation terminated.
make[1]: *** [Makefile.mingw:121: wingenminiupnpcstrings.exe] Error 1
make[1]: *** Waiting for unfinished jobs....
gcc: fatal error: cannot execute 'cc1': CreateProcess: No such file or directory
compilation terminated.
make[1]: *** [Makefile.mingw:101: minixml.o] Error 1
gcc: fatal error: cannot execute 'cc1': CreateProcess: No such file or directory
compilation terminated.
make[1]: *** [Makefile.mingw:101: igd_desc_parse.o] Error 1
make[1]: Leaving directory 'D:/beacon-node-holesky-unstable/repo/vendor/nim-nat-traversal/vendor/miniupnp/miniupnpc'
make: *** [vendor/nimbus-build-system/makefiles/targets.mk:134: libminiupnpc.a] Error 2

I managed to fix it by copying these files from the stable repo, which built fine:

nimbus@windows-01 MINGW64 .../miniupnp/miniupnpc ((97d928b...))
$ ls -l wingenminiupnpcstrings.exe
-rwxr-xr-x 1 nimbus 197121 236465 Mar 25 12:23 wingenminiupnpcstrings.exe*
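
The `cannot execute 'cc1'` error above means the `gcc` driver could not launch its internal compiler binary. A rough way to check where the driver expects that binary is the `-print-prog-name` option, which prints the resolved path, or just the bare name when nothing was found. The helper below is a hypothetical sketch, not part of the build system.

```shell
# Print the path of the cc1 binary a gcc driver would run.
# Returns 1 if the driver cannot locate it, 2 if the driver itself fails.
find_cc1() {
    local cc=${1:-gcc} path
    path=$("$cc" -print-prog-name=cc1) || return 2
    # gcc echoes the bare program name back when the lookup fails.
    if [ "$path" = "cc1" ] || [ ! -x "$path" ]; then
        echo "cc1 not found by $cc (got: $path)" >&2
        return 1
    fi
    echo "$path"
}
```

Running `find_cc1` in the same shell and directory as the failing `make` step would show whether the driver resolves `cc1` differently there than in the repo that builds fine.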

Interestingly, after I cleaned up miniupnpc:

nimbus@windows-01 MINGW64 .../miniupnp/miniupnpc ((f5d0e49...))
$ g clean -fdx
Removing .Makefile.mingw.swp
Removing addr_is_reserved.o
Removing connecthostport.o
Removing igd_desc_parse.o
Removing minisoap.o
Removing minissdpc.o
Removing miniupnpc.o
Removing miniupnpcstrings.h
Removing miniwget.o
Removing minixml.o
Removing portlistingparse.o
Removing rc_version.h
Removing receivedata.o
Removing upnpcommands.o
Removing upnpdev.o
Removing upnperrors.o
Removing upnpreplyparse.o

It started failing again, so I removed the PATH=".:${PATH}" part from makefiles/targets.mk and it worked:

nimbus@windows-01 MINGW64 /d/beacon-node-holesky-libp2p/repo (nim-libp2p-auto-bump-unstable)
$ make libminiupnpc.a --debug
Reading makefiles...
Updating makefiles....
Updating goal targets....
 File 'libminiupnpc.a' does not exist.
   File 'sanity-checks' does not exist.
  Must remake target 'sanity-checks'.
  Successfully remade target file 'sanity-checks'.
Must remake target 'libminiupnpc.a'.
Successfully remade target file 'libminiupnpc.a'.

No idea why that PATH override is necessary at all.
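
For context, here is a small sketch of what a `PATH=".:${PATH}"` prefix does in general: it makes the shell search the current directory first, so a program sitting in the working directory can be run by bare name and can shadow one installed elsewhere. Whether that is why the build system sets it, or how it interacts with the `cc1` failure, is not established here. The tool name `demo-tool` is a made-up stand-in.

```shell
# Show how a leading "." in PATH changes command lookup.
demo_dot_path() {
    local dir
    dir=$(mktemp -d) || return 1
    # A stand-in program that exists only inside $dir.
    printf '#!/bin/sh\necho "local copy"\n' > "$dir/demo-tool"
    chmod +x "$dir/demo-tool"
    (
        cd "$dir" || exit 1
        # Without "." in PATH the bare name is normally not resolved.
        command -v demo-tool >/dev/null || echo "not found without dot"
        # With "." first, the copy in the current directory is executed.
        PATH=".:${PATH}"
        demo-tool
    )
    rm -rf "$dir"
}
```

Calling `demo_dot_path` runs the stand-in from the current directory only after the prefix is applied.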

But currently libp2p build fails with:

D:\beacon-node-holesky-libp2p\repo\beacon_chain\networking\eth2_network.nim(2305, 42) template/generic instantiation of `topic` from here
D:\beacon-node-holesky-libp2p\repo\beacon_chain\networking\eth2_network.nim(2299, 11) Error: undeclared field: 'topicIds' for type messages.Message [type declared in D:\beacon-node-holesky-libp2p\repo\vendor\nim-libp2p\libp2p\protocols\pubsub\rpc\messages.nim(40, 3)]                                                            
make: *** [Makefile:448: nimbus_beacon_node] Error 1

I can reproduce this failure on Linux though, so it doesn't seem to be Windows-related.

But at least stable/testing/unstable are running:

image

Still seeing a symlinking permissions issue:

Build completed successfully: build/nimbus_beacon_node                                                                                                         
 >>> Install binaries...                                                                                                                                       
ln: failed to create symbolic link 'D:\beacon-node-holesky-unstable\bin/nimbus_beacon_node.exe': Permission denie

Fixed bin folder permissions:

All nodes are up:

image

I consider this done.

Forgot to purchase a license. This looks like the best option: Standard for 8 cores for 70 CHF:
https://thekeystore.ch/en/products/microsoft-windows-server-2019-standard-8-core

Done:

image

But when I tried to use the key I got:

image

Gonna contact their support.

Got a replacement key, we're back in business:

image