Docker services are automatically restarted at random intervals - memory maxed out

Question

Closed this issue 3 years ago · 3 comments

Docker services are automatically restarted at random intervals

Both we and vialog observed this issue during testing.
Both of us were using basic DigitalOcean droplets: 2 VMs, each with 1 GB RAM and a 25 GB SSD.
We had already fixed the issue around "when services get restarted, they will work as expected", which existed in our 1.4.3 release.
So even when the services get restarted, they should still work as expected (satisfying the 'high availability' and 'ability to auto-recover from failures' requirements of time-critical DApps).

Possible Reason :

As suggested in this Docker forum thread: https://forums.docker.com/t/docker-swarm-periodically-restarts-all-services/69790/5
Docker randomly restarts services when memory or CPU is maxed out.
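
To check whether memory really is close to the limit around the time of a restart, something like the following could be run on each droplet. It is only a sketch: the function name mem_used_percent is made up for illustration, and it assumes a Linux kernel recent enough to report MemAvailable in /proc/meminfo.

```shell
#!/bin/sh
# mem_used_percent (hypothetical helper): read /proc/meminfo-style text on
# stdin and print the percentage of RAM in use, computed as
# (MemTotal - MemAvailable) / MemTotal, truncated to an integer.
mem_used_percent() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }
  '
}

# Example (on the droplet):
#   mem_used_percent < /proc/meminfo
```

Logging this value periodically (for example from cron) would show whether the restarts line up with memory exhaustion on the 1 GB machines.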

Answer 1 · 2020-10-30T14:08:01.000Z

Test Environment

To verify this, we have created 2 machines with 4 GB RAM and an 80 GB SSD each on DigitalOcean to check whether the issue persists.
We deployed the TIC and the Bank App on those machines.
We need to evaluate the stability for about 2 weeks.

The Bank App can be accessed here: http://157.230.222.172:3000/
a) Register a user.
b) Log in and check the balance and transfer functionalities.

The transactions can be seen in Hyperledger Explorer: http://157.230.222.172:8090/

Answer 2 · 2020-10-30T14:28:43.000Z

Updated Observations

Debugging to find the reason for docker service restart

When I checked docker stats on the master machine, CPU utilization for the hyperledger_explorer_db container was nearly 190% and memory usage was around 60%, even when idle.
docker stats on the worker machine, where the Fabric services were running, showed normal behavior with low CPU and memory utilization.
The master machines in the Docker swarm may have been compromised by a DDoS trojan, as described here: https://admin-ahead.com/forum/server-security-hardening/unix-trojan-ddos_xor-1-chinese-chicken-multiplatform-dos-botnets-trojan/
I checked the processes running on the master machine (top command).
A process with a random name was running on the master machine and consuming the most CPU.
I am not sure whether that process is linked to the hyperledger_explorer_db container.
I guess it may be linked, because when I looked up the exe for that process, it pointed to the postgres data folder.
Maybe that is why the hyperledger_explorer_db container was using 150% of CPU when checked through docker stats.
But hyperledger_explorer_db is a Docker service. It runs in a separate container, so it should not affect the host droplet.
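
One way to test whether the suspicious host process actually lives inside a container: processes started in a container are ordinary processes on the host kernel, so they do show up in the host's top, and their /proc/<pid>/cgroup entry normally carries the container ID. The sketch below assumes Docker's usual cgroup layout, where the cgroup path contains the full 64-character hex container ID; the function names are hypothetical.

```shell
#!/bin/sh
# container_id_from_cgroup (hypothetical helper): read the contents of a
# /proc/<pid>/cgroup file on stdin and print the first 64-character hex
# string found, which under Docker's usual layout is the container ID.
# Prints nothing if the process is not in such a cgroup.
container_id_from_cgroup() {
  grep -oE '[0-9a-f]{64}' | head -n 1
}

# container_id_of_pid (hypothetical helper): same lookup for a live host PID.
container_id_of_pid() {
  container_id_from_cgroup < "/proc/$1/cgroup"
}

# Example (on the master machine, using the PID seen in top):
#   container_id_of_pid 806854
```

If an ID is printed, it can be compared against the output of docker ps --no-trunc to see whether it matches the hyperledger_explorer_db container. An empty result would suggest the process is running directly on the host.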

Further Observations :

When I killed that process, CPU utilisation returned to normal and the hyperledger_explorer_db container showed 0 to 0.01% CPU utilisation in docker stats.
But after a random time interval, another process with a random name started again, consuming 150% CPU and 60% memory.
I think this is what causes the RAM to max out at random intervals.

To test the behavior :

ssh to the master machine.
Run the "top" command.
You will see a process with a random name like "kSyPuNRo" consuming 199.3% of CPU and 60% of memory.
Note that process ID (example: 806854).
To check the exe behind that process, run:
- ls -la /proc/806854/exe
- You can see that it points to the postgresql data folder.
Run: docker stats
You will notice that the hyperledger_explorer_db container is using 190% of CPU and 60% of memory.
Now kill that process by running:
kill -9 806854
After 10 to 15 seconds, you will notice that CPU and memory are freed up.
If you check docker stats after that, the hyperledger_explorer_db container will mostly be consuming 0.01% of CPU.
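
The lookup part of the steps above could be scripted roughly as follows. This is a sketch with hypothetical function names. Note that the pcpu column of ps is CPU time averaged over the process lifetime, so it will not match the instantaneous figure in top exactly, and the kill step is deliberately left out.

```shell
#!/bin/sh
# highest_cpu_pid (hypothetical helper): read lines of "PID %CPU COMMAND"
# on stdin and print the PID of the line with the highest %CPU value.
highest_cpu_pid() {
  awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; pid = $1 }
       END { if (pid != "") print pid }'
}

# exe_of_pid (hypothetical helper): print the executable path that
# /proc/<pid>/exe points to (the same link that ls -la displays).
exe_of_pid() {
  readlink "/proc/$1/exe"
}

# Example (on the master machine, as root):
#   pid=$(ps -e -o pid= -o pcpu= -o comm= | highest_cpu_pid)
#   exe_of_pid "$pid"
```

Running this from cron and appending the output to a log would record each new randomly named process and its exe path, without having to watch top by hand.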

Conclusion :

I am not sure whether that DDoS trojan comes from:
- the hyperledger db service
  (or)
- the droplet being attacked by some hacker.
However, the worker machines are not affected by it.
The hyperledger_explorer_db service container is always started only on the master machine, which may be why only the master machine is affected.

Answer 3 · 2020-11-02T12:02:17.000Z

UPDATE

That DDoS trojan is most likely generated by:

Verification method:

If the hyperledger explorer services are removed,