okx/exchain

Error with docker image

Opened this issue · 13 comments

Error when deploying full node with docker image

1. Describe

The following error occurs:

Connecting to raw.githubusercontent.com (185.199.111.133:443)
wget: can't open '/root/.exchaind/config/genesis.json': No such file or directory
/root/start.sh: line 11: 10 Illegal instruction (core dumped) exchaind start --chain-id exchain-66 --rest.laddr tcp://0.0.0.0:8545 --db_backend rocksdb

Docker parameters:
docker run -d --name exchain-mainnet-fullnode -v ~/.exchaind/data:/root/.exchaind/data/ -p 8545:8545 -p 26656:26656 okexchain/fullnode-mainnet:latest

Same here, I'm not able to run the Docker image.

docker run -d --name exchain-mainnet-fullnode -v ~/.exchaind:/root/.exchaind -p 8545:8545 -p 26656:26656 okexchain/fullnode-mainnet:latest

Make sure your exchaind directory (~/.exchaind) contains both the config and data directories.

I have tried initialising both directories and populating genesis.json and priv_validator_state.json, but it still crashes with Illegal instruction (core dumped).

@neoromantique

Can you redo the deployment following this thread? If the deployment fails, please tell me which step went wrong and what the specific error is.

https://forum.okt.club/d/299-how-to-start-a-mainnet-node

I cannot even execute exchaind init from within Docker. And building it for my host defeats the point of a Docker image in the first place (and I think it wouldn't help anyway).

@neoromantique
Try this:

  1. mkdir ~/okc
  2. cd ~/okc
  3. curl -O https://okg-pub-hk.oss-cn-hongkong.aliyuncs.com/cdn/oec/snapshot/mainnet-s0-20221018-14723313-rocksdb.tar.gz
  4. tar zxvf mainnet-s0-20221018-14723313-rocksdb.tar.gz
  5. docker run -d --name exchain-mainnet-fullnode -v ~/okc/data:/root/.exchaind/data/ -p 8545:8545 -p 26656:26656 okexchain/fullnode-mainnet:latest

Same exact output.

root@hostname ~/okc # ls -la
total 31175916
drwxr-xr-x 3 root root          81 Oct 19 20:20 .
drwx------ 8 root root         269 Oct 19 16:02 ..
drwx------ 7 root root         161 Oct 17 20:33 data
-rw-r--r-- 1 root root 31924135195 Oct 19 17:19 mainnet-s0-20221018-14723313-rocksdb.tar.gz
root@hostname ~/okc # docker logs --tail 100 -f 25d
/root/start.sh: line 6:     7 Illegal instruction     (core dumped) exchaind init fullnode --chain-id exchain-66
Connecting to raw.githubusercontent.com (185.199.109.133:443)
wget: can't open '/root/.exchaind/config/genesis.json': No such file or directory
/root/start.sh: line 11:    10 Illegal instruction     (core dumped) exchaind start --chain-id exchain-66 --rest.laddr tcp://0.0.0.0:8545 --db_backend rocksdb
root@hostname ~/okc # 

@neoromantique
https://stackoverflow.com/questions/54698812/illegal-instruction-core-dumped-when-trying-to-execute-elf-file

It means the compiled binary contains an instruction (possibly more than one) that isn't valid on the architecture where you're running it.

Based on this post and other related posts on Stack Overflow, I'm guessing it might be a hardware issue.

You can run your binary under gdb to find the specific instruction:
gdb ./precompiled
(gdb) run
(gdb) bt
(gdb) disassemble
When run fails, bt (backtrace) shows where it failed, and disassemble shows the specific instruction that caused the failure.
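
The same session can probably be run non-interactively. This is a rough sketch, not a tested recipe for this image: it relies on gdb's -batch and -ex options, and it swaps in x/i $pc (print the instruction at the program counter) for a full disassemble to keep the output short. The show_fault helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a binary under gdb in batch mode, then print
# the backtrace and the instruction at the faulting program counter.
show_fault() {
    # $pc is single-quoted so gdb, not the shell, expands it.
    gdb -batch -ex run -ex bt -ex 'x/i $pc' "$1" 2>&1
}

# Only invoke gdb when a binary path was passed to the script.
if [ "$#" -gt 0 ]; then
    show_fault "$1"
fi
```

Saved under an assumed name such as show_fault.sh, it would be called as sh show_fault.sh ./precompiled.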

Can you try this or try running okc on another machine?

I'm running it on an AMD Ryzen 9 5950X; it's fairly standard, modern hardware.

https://gist.github.com/neoromantique/ab52f80e31a4a4df70bd0b744f870275

@neoromantique

Program received signal SIGILL, Illegal instruction.
0x0000000001dedb56 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_Hashtable<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo> const*> (this=0x453b760 <rocksdb::(anonymous namespace)::sc_wrapper_type_info>, __f=0x7fffffffe460, __l=0x7fffffffe4f8, __bkt_count_hint=0, __h1=..., __h2=..., 
    __h=..., __eq=..., __exk=..., __a=...) at /usr/include/c++/10.3.1/bits/stl_iterator_base_funcs.h:138
   0x0000000001dedb49 <+121>:   movq   $0x0,0x10(%rdi)
   0x0000000001dedb51 <+129>:   vmovq  %rax,%xmm0
=> 0x0000000001dedb56 <+134>:   vpmaxuq %xmm1,%xmm0,%xmm0
   0x0000000001dedb5c <+140>:   vmovq  %xmm0,%rsi

We can see that the instruction causing the error is vpmaxuq.

https://www.officedaytime.com/simd512e/
https://en.wikipedia.org/wiki/AVX-512

It looks like vpmaxuq is an AVX-512 instruction, and the Ryzen 9 5950X doesn't support AVX-512.

https://www.quora.com/Does-Ryzen-support-AVX
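
One quick way to check this on a Linux host is to look at the CPU flags the kernel reports. The sketch below assumes /proc/cpuinfo is available and that the xmm form of vpmaxuq needs both the avx512f and avx512vl flags; the cpu_has_avx512 helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a cpuinfo-style file lists both
# avx512f and avx512vl. Defaults to /proc/cpuinfo (Linux only).
cpu_has_avx512() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches whole words, so avx512f does not match longer flag names.
    grep -qw avx512f "$cpuinfo" 2>/dev/null &&
        grep -qw avx512vl "$cpuinfo" 2>/dev/null
}

if cpu_has_avx512; then
    echo "AVX-512 (F+VL) reported by this CPU"
else
    echo "AVX-512 (F+VL) not reported; binaries that use it will hit SIGILL"
fi
```

Docker containers share the host CPU, so the result should be the same inside and outside the container.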

The error comes from rocksdb, so I think it's worth trying to recompile rocksdb on your machine.

  1. cd ~
  2. git clone -b v1.6.3 https://github.com/okex/exchain.git
  3. cd exchain
  4. make rocksdb
  5. make mainnet
  6. exchaind init okc-mainnet-node --chain-id exchain-66 --home ~/.exchaind

If make rocksdb fails, please build rocksdb version 6.27.3 by following the official documentation.
https://github.com/facebook/rocksdb
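
After rebuilding, a rough sanity check is to count how often the offending mnemonic still appears in the binary's disassembly. This is only a sketch: it assumes objdump (binutils) is installed, the count_mnemonic helper name is hypothetical, and a count of 0 only shows that vpmaxuq itself is absent, not that the binary is free of every AVX-512 instruction.

```shell
#!/bin/sh
# Hypothetical helper: count the disassembly lines on stdin that contain
# the given mnemonic as a whole word.
count_mnemonic() {
    # grep -c exits non-zero on a count of 0; "|| true" hides that status.
    grep -cw -- "$1" || true
}

# When a binary path is given, disassemble it and count vpmaxuq lines.
if [ "$#" -gt 0 ]; then
    objdump -d "$1" | count_mnemonic vpmaxuq
fi
```

Disassembling a large binary can take a while, so expect a delay before the count is printed.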

@neoromantique Has your problem been resolved?

Well, kinda.
I used my own Ubuntu-based Dockerfile to build rocksdb and exchain. After that it works fine, even with rocksdb.

I also had this issue.
In my case, rocksdb was linked against libstdc++, and libstdc++-dev was missing in my Docker image.

docker run --rm -ti --platform="linux/x86_64" --privileged okexchain/fullnode-mainnet sh

okexchain:/go/bin# apk add gdb
OK: 208 MiB in 110 packages

okexchain:/go/bin# mkdir -p /root/.config/gdb/

okexchain:/go/bin# echo "set auto-load safe-path /" > /root/.config/gdb/gdbinit

okexchain:/go/bin# gdb exchaincli 
GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-alpine-linux-musl".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from exchaincli...
Loading Go Runtime support.

(gdb) run
Starting program: /go/bin/exchaincli 

Program received signal SIGILL, Illegal instruction.
0x0000000001817bc6 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_Hashtable<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, rocksdb::OptionTypeInfo> const*> (this=0x38a6b40 <rocksdb::(anonymous namespace)::sc_wrapper_type_info>, __f=0x7fffffffea30, __l=0x7fffffffeac8, __bkt_count_hint=0, __h1=..., __h2=..., __h=..., __eq=..., __exk=..., __a=...) at /usr/include/c++/10.3.1/bits/hashtable.h:1058
1058	/usr/include/c++/10.3.1/bits/hashtable.h: No such file or directory.

(gdb) exit