bityoga/fabric_as_code

Docker services are automatically restarted in random periodic intervals - memory maxed out

Closed this issue · 3 comments

Docker services are automatically restarted in random periodic intervals

  • The issue was observed by both us and vialog during testing.
  • Both of us were using basic droplets from DigitalOcean: two VMs, each with 1 GB of RAM and a 25 GB SSD.
  • We had already fixed the restart-recovery issue that existed in our 1.4.3 release, so services are expected to work correctly after a restart.
  • So even when the services get restarted, they should still work as expected (satisfying the 'high availability' and 'ability to auto recover during failures' requirements of time-critical DApps).

The issue is discussed in this forum thread: https://forums.docker.com/t/docker-swarm-periodically-restarts-all-services/69790/5

Possible Reason :

As suggested in the forum thread (https://forums.docker.com/t/docker-swarm-periodically-restarts-all-services/69790/5), Docker randomly restarts services when memory or CPU is maxed out.

Test Environment

  • To verify this, we created two machines with 4 GB of RAM and an 80 GB SSD on DigitalOcean to check whether the issue persists.
  • We deployed the TIC and BANK APP onto those machines.
  • We need to evaluate the stability for about two weeks.

Bank App can be accessed here : http://157.230.222.172:3000/
a) Register a user
b) Login and check the balance and transfer functionalities.

The transactions can be seen in hyperledger explorer : http://157.230.222.172:8090/

Updated Observations

  • The docker services got restarted after about two days.
  • But the fabric services were working without any issues.
  • The network remains stable.
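One way to confirm such restarts is to look at the task history that swarm keeps for each service. The following is a minimal sketch, not the method used in this issue: the function name `count_replaced_tasks` is hypothetical, and it assumes rows in the form `task-name|current-state`, which can be produced with `docker service ps --format '{{.Name}}|{{.CurrentState}}' SERVICE`.

```shell
# Count task instances that swarm has already replaced (a rough restart count).
# Assumption: stdin holds "task-name|current-state" rows, for example from
#   docker service ps --format '{{.Name}}|{{.CurrentState}}' SERVICE
# Replaced tasks are assumed to report a state starting with "Shutdown" or "Failed".
count_replaced_tasks() {
    awk -F '|' '$2 ~ /^(Shutdown|Failed)/ { n++ } END { print n + 0 }'
}
```

A count that grows between two checks suggests the service was restarted in between.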

Debugging to find the reason for docker service restart

  • When I checked docker stats on the master machine, CPU utilization for the hyperledger_explorer_db container was nearly 190% and memory usage was around 60%, even when idle.

    Screenshot from 2020-10-30 15-16-29

  • docker stats on the worker machine, where the fabric services were running, showed normal behavior with low CPU and memory utilization.

  • The master machines in the docker swarm may have been compromised by some DDoS Trojan attacks as mentioned here: https://admin-ahead.com/forum/server-security-hardening/unix-trojan-ddos_xor-1-chinese-chicken-multiplatform-dos-botnets-trojan/.

  • I checked the processes running on the master machine (top command).

  • A process with a random name was running on the master machine and consuming most of the CPU.

    Screenshot from 2020-10-29 16-33-29

  • I am not sure whether that random process is linked with hyperledger_explorer_db container.

  • I suspect it is linked, because when I looked up the exe for that random process, it pointed to the postgres data folder.

    Screenshot from 2020-10-30 15-18-49

  • Maybe that is why the hyperledger_explorer_db container was using 150% of CPU when checked through docker stats.

  • But hyperledger_explorer_db is a docker service. It runs in a separate container. That should not affect the host droplet.
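The docker stats check above can be repeated without watching the live view. This is a minimal sketch under stated assumptions: the function name `hot_containers` and the 100% threshold are illustrative, and the input is expected as `name cpu%` rows, which `docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}'` can produce.

```shell
# Print the names of containers whose CPU usage is above a threshold (percent).
# Assumption: stdin holds "name cpu%" rows, for example from
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}'
hot_containers() {
    threshold=$1
    awk -v limit="$threshold" '{
        cpu = $2
        sub(/%$/, "", cpu)          # drop the trailing percent sign
        if (cpu + 0 > limit + 0) print $1
    }'
}
```

For example, piping the docker stats output into `hot_containers 100` lists only the containers using more than one full core.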

Further Observations :

  • When I killed that random process, CPU utilisation returned to normal and the hyperledger_explorer_db container showed 0 to 0.01% CPU utilisation in docker stats.

    Screenshot from 2020-10-30 15-20-51

  • But after a random time interval, another process with a random name started, again consuming 150% CPU and 60% memory.

  • I think the RAM being maxed out at random intervals is caused by this.

To test the behavior :

  • ssh to the master machine.
  • Run the "top" command.
  • You will see a process with a random name like "kSyPuNRo" consuming 199.3% of CPU and 60% of memory.
  • Note that process ID (example: 806854).
  • To check which exe runs that process, run:
    - ls -la /proc/806854/exe
    - you can see that it points to the postgresql data folder
  • Run: docker stats
  • You will notice that the hyperledger_explorer_db container is using 190% of CPU and 60% of memory.
  • Now kill that process by running:
    kill -9 806854
  • After some 10 to 15 seconds you will notice that CPU and memory are freed up.
  • If you check docker stats after that, the hyperledger_explorer_db container will be consuming around 0.01% of CPU.
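The manual steps above can be sketched as two small shell helpers. The function names `busiest_process` and `exe_of` are hypothetical, and the sketch assumes a Linux host with /proc and a procps-style `ps` that accepts `-eo pid=,pcpu=,pmem=,comm=`. Nothing is killed here; that step is left manual.

```shell
# Print "PID %CPU %MEM COMMAND" for the row with the highest CPU usage.
# Assumption: stdin holds "pid pcpu pmem comm" rows, for example from
#   ps -eo pid=,pcpu=,pmem=,comm=
busiest_process() {
    awk 'NR == 1 || $2 + 0 > max {
             max = $2 + 0; pid = $1; cpu = $2; mem = $3; cmd = $4
         }
         END { if (NR > 0) print pid, cpu, mem, cmd }'
}

# Resolve the executable behind a PID through /proc (Linux only).
# Equivalent in purpose to the "ls -la /proc/<pid>/exe" step.
exe_of() {
    readlink "/proc/$1/exe"
}
```

Running `ps -eo pid=,pcpu=,pmem=,comm= | busiest_process` prints the suspect process, and passing its PID to `exe_of` (as root, if the process belongs to another user) shows where its executable lives.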

Conclusion :

  • Not sure whether that DDoS trojan is generated by:

    • the hyperledger_explorer_db service
      (or)
    • an attacker who compromised the droplet.
  • However, the worker machines are not affected by it.

  • The hyperledger_explorer_db service container is always started only on the master machine, which may be why only the master machine is affected.

UPDATE

That DDoS trojan is most likely generated by:

  • the hyperledger_explorer_db service

Verification method:

If the hyperledger explorer services are removed,

  • that random process disappears
  • CPU and memory utilisation become normal

Asked a question in the Hyperledger Explorer chat forum: https://chat.hyperledger.org/channel/hyperledger-explorer/thread/Csg6M8BkBkuhDzq5L?jump=Rt6bbcLG9eo7d6t9r