MoneroOcean/xmr-node-proxy

Disk Read/Write Blow Up and Proxy Stuck

BlankerL opened this issue · 14 comments

Hello,

I recently switched to MoneroOcean and xmr-node-proxy, and found that disk I/O spikes and all of the CPU gets used up for no apparent reason, and then the proxy stops responding to requests.

This is the only service I run on the Azure B1s server, and it works quite well with xmrig-proxy, so I wonder whether something is wrong with my settings.

Here are some log files that might help.

The entries are basically all: Miner socket error from *.*.*.*: Error: write EPIPE
Maybe these messages are being written to the log file (or somewhere else) in bulk, and that is causing the problem.

Most of these happen when one of the miners goes offline. After restarting the server, everything works fine again (while the problem is occurring, it is not even possible to connect to the server). I am not sure which protocol the communication uses, but I think the problem happens when the proxy sends responses to some miners and the data cannot get through.

Can you please share your xmr-node-proxy config.json? Please remove your XMR address and any other private info from it. Your xmrig config would also help.

Sure.

Configuration of xmr-node-proxy

{
  "pools": [
    {
      "hostname": "***",
      "port": 443,
      "ssl": true,
      "allowSelfSignedSSL": true,
      "share": 100,
      "username": "***", 
      "password": "***",
      "keepAlive": true,
      "coin": "xmr",
      "blob_type": "cryptonote",
      "default": true
    }
  ],
  "listeningPorts": [
    {
      "port": "***",
      "ssl": true,
      "diff": 50000
    },
    {
      "port": "***",
      "ssl": true,
      "diff": 10000
    }
  ],
  "bindAddress": "0.0.0.0",
  "developerShare": 0,
  "daemonAddress": "127.0.0.1:18081",
  "accessControl": {
    "enabled": false,
    "controlFile": "accessControl.json"
  },
  "httpEnable": true,
  "httpAddress": "0.0.0.0",
  "httpPort": "***",
  "httpUser": "***",
  "httpPass": "***",
  "addressWorkerID": "***",
  "minerInactivityTime": 120,
  "keepOfflineMiners": 1,
  "refreshTime": 60,
  "theme": "light",
  "coinSettings": {
    "xmr": {
      "minDiff": 1,
      "maxDiff": 10000000,
      "shareTargetTime": 30
    }
  }
}

Configuration of XMRig

{
    "autosave": true,
    "background": false,
    "colors": true,
    "randomx": {
        "init": -1,
        "mode": "auto",
        "1gb-pages": false,
        "rdmsr": true,
        "wrmsr": false,
        "numa": true
    },
    "cpu": {
        "enabled": true,
        "huge-pages": true
    },
    "opencl": {
        "enabled": false,
        "cache": true,
        "loader": null,
        "platform": "AMD"
    },
    "cuda": {
        "enabled": false,
        "loader": null
    },
    "donate-level": 1,
    "donate-over-proxy": 1,
    "log-file": null,
    "pools": [
        {
            "algo": null,
            "coin": null,
            "url": "***",
            "user": "***",
            "pass": null,
            "nicehash": false,
            "keepalive": true,
            "enabled": true,
            "tls": true,
            "tls-fingerprint": null,
            "daemon": false,
            "self-select": null
        }
    ],
    "print-time": 60,
    "retries": 5,
    "retry-pause": 5,
    "syslog": false,
    "user-agent": null,
    "verbose": 0,
    "watch": true
}

This happens periodically. It has been fine for about 20 hours now, probably because only a few machines are connected to it today.

It looks like an SSL port issue. Can you try with SSL disabled on both the proxy listening ports and in the miner config? If that helps, the issue is likely either missing XNP cert* files or an xmrig build without SSL support (for example, the MO xmrig compat builds do not have it turned on: https://github.com/MoneroOcean/xmrig/releases/download/v5.9.0-mo3/xmrig-v5.9.0-mo3-lin64-compat.tar.gz).
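If you want to check what a listening port actually speaks before and after toggling "ssl", a quick client-side probe can attempt a TLS handshake against it. This is a minimal Python sketch and is not part of XNP or xmrig. The function name speaks_tls is hypothetical. Certificate verification is deliberately switched off because the proxy's cert files are usually self-signed. The port you pass is whatever you configured in listeningPorts (redacted above).

```python
import socket
import ssl


def speaks_tls(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake with host:port completes (hypothetical helper).

    Certificate checks are disabled on purpose: this only tests whether the
    port speaks TLS, not whether its (likely self-signed) certificate is trusted.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be turned off before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket on an already-connected socket performs the handshake here
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                print(f"{host}:{port} negotiated {tls.version()}")
                return True
    except OSError as exc:  # ssl.SSLError, refused/reset connections and timeouts are all OSError
        print(f"{host}:{port} TLS handshake failed: {exc!r}")
        return False
```

For example, speaks_tls("127.0.0.1", 3333), where 3333 is only a placeholder port, would be expected to return True while that port has "ssl": true and working cert files, and False once it is switched to plain TCP. The miner's "tls" setting has to match whichever mode the port is in.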

Sure. I will wait and see whether the problem comes up again; if it does, I will disable SSL and test.

I did not add certificates manually because I assumed the ones already in the proxy directory would be used. Can I use them directly?

I compile XMRig myself on all platforms; the versions range from v5.6.2 to v5.8.2, all built with static libraries.

Hm, I had not noticed your remark that it can work fine for some time. In that case, can you please send the full XNP logs (out and error files) to support@moneroocean.stream? There is a high probability that the error log screenshot you posted does not show anything really bad.

The CPU usage by XNP comes from the fact that it verifies shares before sending them to the pool, and that can overwhelm your CPU. To solve it, please use a higher starting diff for both the pool and the mining ports in the XNP config.json (for example, if you have around 100 KH/s, do not be afraid to use a proxy diff of about one million).
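The effect of the starting diff can be estimated with simple arithmetic. This assumes the usual CryptoNote convention that a share at difficulty D takes on average D hashes. Under that assumption, a connection hashing at H H/s submits about H / D shares per second, and each of those is a share the proxy has to verify. The helper names below are hypothetical and only illustrate the arithmetic. For a listening port, H is a single miner's hashrate; for the pool side, it is the proxy's total hashrate.

```python
def shares_per_second(hashrate_hs: float, difficulty: float) -> float:
    """Expected shares per second, assuming one share per `difficulty` hashes on average."""
    return hashrate_hs / difficulty


def seconds_per_share(hashrate_hs: float, difficulty: float) -> float:
    """Expected interval between shares, in seconds."""
    return difficulty / hashrate_hs


def diff_for_target_time(hashrate_hs: float, target_seconds: float) -> float:
    """Difficulty at which shares arrive roughly every `target_seconds`."""
    return hashrate_hs * target_seconds


if __name__ == "__main__":
    hashrate = 100_000  # ~100 KH/s, the example figure from the comment above
    for diff in (10_000, 50_000, 1_000_000):  # posted port diffs vs. the suggested one million
        print(f"diff {diff:>9,}: {shares_per_second(hashrate, diff):5.2f} shares/s, "
              f"one every {seconds_per_share(hashrate, diff):5.2f} s")
    # shareTargetTime is 30 in the posted coinSettings
    print(f"diff for a 30 s target: {diff_for_target_time(hashrate, 30):,.0f}")
```

At 100 KH/s, the posted port diff of 50000 works out to about 2 shares per second for the proxy to verify. A diff of one million brings that down to one share roughly every 10 seconds. At that hashrate, the posted shareTargetTime of 30 s corresponds to a diff of about 3,000,000.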


It happened again. I will send you the log after rebooting the server; currently, I cannot even connect to it through SSH.

I am not sure what I should send by email. Is it the content of /home/nodeproxy/.pm2/logs/proxy-error.log?

There are only a few lines in it:

2020-03-13 06:29:21:406 +00:00: Miner socket error from 147.8.143.162: Error: write EPIPE
2020-03-13 06:29:21:500 +00:00: Miner socket error from 147.8.143.145: Error: write EPIPE
2020-03-13 06:29:21:547 +00:00: Miner socket error from 175.159.173.187: Error: write EPIPE
2020-03-13 07:04:42:207 +00:00: Miner socket error from 147.8.16.150: Error: write EPIPE
2020-03-13 07:05:04:816 +00:00: Miner socket error from 147.8.143.161: Error: write EPIPE
2020-03-13 07:05:30:020 +00:00: Miner socket error from 222.79.50.98: Error: write EPIPE

The machine's current time is Fri Mar 13 07:09:57 UTC 2020. This may not be enough to solve the problem; I will check the system logs and other log files and report back as soon as possible.

Update:
I think the entries above were logged because the system was not responding; they are a symptom, not the cause of the freeze.

I checked the syslog and found nothing.

/etc/cron.daily runs just after the system gets stuck, but I checked the cron.daily folder and its scripts look normal.

I will try disabling SSL and see whether the problem persists.

I suspect the problem may be because I exposed SSH port 22 to the public network; I found a lot of SSH brute-force attempts during the outage.

I have now set up the firewall on Azure so that ports 22 and 8081 cannot be accessed directly from the public network. Let's see whether this problem persists. :-)

I am mining to multiple pools at the same time, and I am currently not sure whether the problem happens because the Hong Kong route to supportXMR is unstable.

As you can see in the proxy-error.log file, the errors started at 06:20, and in the picture the hashrate began to drop at the same time. Meanwhile, the Azure monitor shows abnormal CPU usage and disk reads.

proxy-error.log

supportXMR hashrate picture

Azure
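A "write EPIPE" error generally means the proxy tried to write to a socket whose other end had already closed the connection. That fits the reading that miners dropped off, rather than pointing to a root cause. To line the entries up with the hashrate and Azure graphs, they can be bucketed by minute and by source IP. This Python sketch assumes the exact line format shown in the proxy-error.log excerpt above. The script name epipe_summary.py and its functions are hypothetical and not part of XNP.

```python
import re
import sys
from collections import Counter

# Matches the proxy-error.log format shown in this thread, e.g.
# "2020-03-13 06:29:21:406 +00:00: Miner socket error from 147.8.143.162: Error: write EPIPE"
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}:\d{3} [+-]\d{2}:\d{2}: "
    r"Miner socket error from (?P<ip>\S+): Error: (?P<error>.+)$"
)


def summarize(lines, error="write EPIPE"):
    """Count matching miner socket errors per minute and per source IP."""
    per_minute = Counter()
    per_ip = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("error") == error:
            per_minute[m.group("minute")] += 1
            per_ip[m.group("ip")] += 1
    return per_minute, per_ip


def main(argv):
    """Print per-minute error counts and the noisiest source IPs for one log file."""
    if len(argv) < 2:
        print("usage: python epipe_summary.py /home/nodeproxy/.pm2/logs/proxy-error.log")
        return
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        per_minute, per_ip = summarize(f)
    for minute, count in sorted(per_minute.items()):
        print(f"{minute}  {count}")
    print("top sources:", per_ip.most_common(5))


if __name__ == "__main__":
    main(sys.argv)
```

A burst of errors in the same minute from many different IPs, lining up with the start of the CPU/disk spike, would point at the server stalling rather than at individual miners misbehaving.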

I found that if I simply leave it alone, it comes back to life several hours later. I will try mining to other pools and see whether it is still abnormal.

None of the solutions worked. I have now deployed another proxy on AWS to see whether the problem persists.

The problem persists on AWS. The weirdest thing is that CPU usage and disk reads blow up at the same time on both the AWS and Azure servers. I suspect some cron job is causing this problem.

The problem persists on AWS as well. I have switched back to xmrig-proxy... Thank you for your effort. There must be something wrong with my configuration...