storj-archived/core

Open kfs leveldbs never closed

littleskunk opened this issue · 21 comments

Package Versions

root@storj:~# npm list -g kfs
/usr/local/lib
└─┬ storjshare-daemon@2.4.4
  └─┬ storj-lib@6.2.1
    └── kfs@3.1.2
v6.10.0

Expected Behavior

I have 2 unused download requests and one completed download. The unused downloads are expired (30-minute TOKEN_EXPIRE). 1 minute later (SBUCKET_IDLE) these leveldbs should be closed. I would expect only 4 open files (contract db).
Note: This is my worst-case expectation and I could live with it. Also possible is a 1-minute timeout after the data channel is authorized: close the leveldb and open it again as soon as the download is used.

root@storj:~# grep 'download\|upload\|contract offer' .storjshare/storjshare/logs/188071ba7cfd974a9e47b59e24b0737ebf845db3.log
{"level":"info","message":"authorizing download data channel for 144d1265baf0908fe6c6bd272c701aac7811e3e4","timestamp":"2017-02-28T15:19:12.366Z"}
{"level":"info","message":"authorizing download data channel for 144d1265baf0908fe6c6bd272c701aac7811e3e4","timestamp":"2017-02-28T15:29:39.830Z"}
{"level":"info","message":"authorizing download data channel for 144d1265baf0908fe6c6bd272c701aac7811e3e4","timestamp":"2017-02-28T15:57:29.300Z"}
{"level":"debug","message":"Shard download completed","timestamp":"2017-02-28T15:57:35.104Z"}

Actual Behavior

Kfs leveldbs are never closed.

root@storj:~# ls -l /proc/418/fd | grep 'storjshare'
l-wx------ 1 root root 64 Feb 28 16:59 111 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/165.s/LOG
lrwx------ 1 root root 64 Feb 28 16:59 112 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/165.s/LOCK
l-wx------ 1 root root 64 Feb 28 16:59 113 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/165.s/000450.log
l-wx------ 1 root root 64 Feb 28 16:59 114 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/165.s/MANIFEST-000449
l-wx------ 1 root root 64 Feb 28 16:10 12 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/contracts.db/LOG
lrwx------ 1 root root 64 Feb 28 16:10 13 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/contracts.db/LOCK
l-wx------ 1 root root 64 Feb 28 16:10 14 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/contracts.db/004236.log
l-wx------ 1 root root 64 Feb 28 16:10 15 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/contracts.db/MANIFEST-004235
l-wx------ 1 root root 64 Feb 28 16:12 23 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/210.s/LOG
lrwx------ 1 root root 64 Feb 28 16:12 25 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/210.s/LOCK
l-wx------ 1 root root 64 Feb 28 16:12 45 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/210.s/066990.log
l-wx------ 1 root root 64 Feb 28 16:12 46 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/210.s/MANIFEST-066989
l-wx------ 1 root root 64 Feb 28 16:13 82 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/121.s/LOG
lrwx------ 1 root root 64 Feb 28 16:13 83 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/121.s/LOCK
l-wx------ 1 root root 64 Feb 28 16:13 84 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/121.s/000448.log
l-wx------ 1 root root 64 Feb 28 16:13 85 -> /root/.storjshare/storjshare/shares/188071ba7cfd974a9e47b59e24b0737ebf845db3/sharddata.kfs/121.s/MANIFEST-000447

Steps to Reproduce


  1. Start farming
  2. ls -l /proc/418/fd | grep 'storjshare'
  3. grep 'download\|upload\|contract offer' .storjshare/storjshare/logs/188071ba7cfd974a9e47b59e24b0737ebf845db3.log

I can also confirm that KFS does emit the idle event and does close the bucket afterwards. However, because of the way Storj keeps track of available capacity, it measures the entire store after every contract save/update/etc. This causes all the buckets to stay open: every time a PUBLISH message is received, that measurement is triggered, so the pending operations count for each bucket is > 0 when the idle state is checked, leaving the bucket open.
storj-archived/kfs#48 (comment)
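
A simplified illustration of that interaction (measureUsedSpace, approximateSize and the pendingOperations counter are placeholders for what storj-lib/kfs actually do, not the real code):

// Simplified illustration only -- not the real storj-lib code.
// Every PUBLISH triggers a size measurement of the whole store, which touches
// every S-bucket and bumps its pending-operations counter, so the idle check
// sketched above rarely sees pendingOperations === 0 and the buckets stay open.
function onPublishMessage(store) {
  measureUsedSpace(store, (err, total) => { /* capacity tracking */ });
}

function measureUsedSpace(store, callback) {
  let total = 0;
  let remaining = store.buckets.length;

  store.buckets.forEach((bucket) => {
    bucket.pendingOperations++;            // bucket is now "busy"
    bucket.approximateSize((err, size) => {
      bucket.pendingOperations--;          // ...but another PUBLISH usually arrives
      total += size || 0;                  // before the next idle check, so the
      if (--remaining === 0) {             // counter rarely stays at zero long
        callback(null, total);             // enough for the bucket to be closed
      }
    });
  });
}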

Reopening: after 30 minutes the open file count is still increasing, never decreasing.

root@storj:~# npm list -g kfs
/usr/local/lib
└─┬ storjshare-daemon@2.5.0
  └─┬ storj-lib@6.3.0
    └── kfs@3.1.4

root@storj:~# ls -l /proc/416/fd | grep 'storjshare' | wc -l
44
root@storj:~# grep 'download\|upload\|contract offer' .storjshare/storjshare/logs/188071ba7cfd974a9e47b59e24b0737ebf845db3.log
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:03:12.087Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:03:47.298Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:05:40.132Z"}
{"level":"info","message":"authorizing download data channel for 144d1265baf0908fe6c6bd272c701aac7811e3e4","timestamp":"2017-03-01T23:09:55.690Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:15:24.471Z"}
{"level":"info","message":"authorizing download data channel for 144d1265baf0908fe6c6bd272c701aac7811e3e4","timestamp":"2017-03-01T23:16:17.373Z"}
{"level":"info","message":"authorizing download data channel for 144d1265baf0908fe6c6bd272c701aac7811e3e4","timestamp":"2017-03-01T23:18:08.934Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:22:11.936Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:23:08.716Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:27:22.781Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:27:24.385Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:29:12.511Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:29:25.335Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:30:32.175Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:30:53.657Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:31:22.901Z"}
{"level":"info","message":"authorizing download data channel for 144d1265baf0908fe6c6bd272c701aac7811e3e4","timestamp":"2017-03-01T23:33:37.958Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:34:42.812Z"}
{"level":"debug","message":"received contract offer...","timestamp":"2017-03-01T23:35:10.074Z"}

Based on the last activity I would expect something between 4 and 12 open files.

{"level":"info","message":"authorizing download data channel for 144d1265baf0908fe6c6bd272c701aac7811e3e4","timestamp":"2017-03-02T06:42:46.877Z"}
{"level":"debug","message":"Shard download completed","timestamp":"2017-03-02T06:42:53.039Z"}
{"level":"info","message":"authorizing download data channel for 144d1265baf0908fe6c6bd272c701aac7811e3e4","timestamp":"2017-03-02T07:09:22.129Z"}
{"level":"debug","message":"Shard download completed","timestamp":"2017-03-02T07:09:33.271Z"}
{"level":"info","message":"authorizing download data channel for 144d1265baf0908fe6c6bd272c701aac7811e3e4","timestamp":"2017-03-02T07:12:53.168Z"}

root@storj:~# ls -l /proc/416/fd | grep 'storjshare' | wc -l
184

One of our community members (@andyjc) has this issue:

{info} [Mon Mar 20 2017 06:19:06 GMT+0900 (JST)] received valid message from {"userAgent":"6.3.2","protocol":"1.1.0","address":"91.92.111.163","port":11634,"nodeID":"090d3ff382ad9be05a94914fd00d7b3c4a23a546","lastSeen":1489957848478}
{info} [Mon Mar 20 2017 06:19:06 GMT+0900 (JST)] sending PUBLISH message to {"userAgent":"6.3.2","protocol":"1.1.0","address":"209.93.13.215","port":4107,"nodeID":"095bf027715564b6989cf8fa60fc74dd33404aea","lastSeen":1489958333754}
{info} [Mon Mar 20 2017 06:19:06 GMT+0900 (JST)] sending PUBLISH message to {"userAgent":"6.3.0","protocol":"1.1.0","address":"203.97.196.252","port":60547,"nodeID":"08a37eacb12d9eb248059b6be2535a06c7791a9a","lastSeen":1489958315389}
{info} [Mon Mar 20 2017 06:19:06 GMT+0900 (JST)] sending PUBLISH message to {"userAgent":"6.3.2","protocol":"1.1.0","address":"client022.storj.dk","port":15023,"nodeID":"0918fd0b1ac23a6e23ba29ae5d2becc3d1d9e1d8","lastSeen":1489958183119}
{error} [Mon Mar 20 2017 06:19:06 GMT+0900 (JST)] Could not get usedSpace: IO error: /media/andrew/b7d12396-25e3-4878-a1bf-f135fcfecf43/Storj1/storjshare-d5a7b9/sharddata.kfs/090.s: Too many open files
{info} [Mon Mar 20 2017 06:19:06 GMT+0900 (JST)] received valid message from {"userAgent":"6.3.2","protocol":"1.1.0","address":"209.93.13.215","port":4107,"nodeID":"095bf027715564b6989cf8fa60fc74dd33404aea","lastSeen":1489702078035}
{info} [Mon Mar 20 2017 06:19:07 GMT+0900 (JST)] received valid message from {"userAgent":"6.3.2","protocol":"1.1.0","address":"client022.storj.dk","port":15023,"nodeID":"0918fd0b1ac23a6e23ba29ae5d2becc3d1d9e1d8","lastSeen":1489706641140}
{info} [Mon Mar 20 2017 06:19:07 GMT+0900 (JST)] received valid message from {"userAgent":"6.3.0","protocol":"1.1.0","address":"203.97.196.252","port":60547,"nodeID":"08a37eacb12d9eb248059b6be2535a06c7791a9a","lastSeen":1489683361530}
{info} [Mon Mar 20 2017 06:19:07 GMT+0900 (JST)] replying to message to c50df964efaaddea8eb590a41f0e1d5a6865c40d
{warn} [Mon Mar 20 2017 06:19:08 GMT+0900 (JST)] rpc call d58ad5a6a21bbae10dc13c6627b3f1e8d67100c1 timed out
{info} [Mon Mar 20 2017 06:19:08 GMT+0900 (JST)] replying to message to 8bb59c92429fc8c0728c5ab417738edf8bc29d75
{warn} [Mon Mar 20 2017 06:19:08 GMT+0900 (JST)] rpc call 58df090420249a28ee01e60a458689f7b9099c3f timed out

I am having this issue on the latest version of OS X and the GUI; it's worse than the previous release from last week. Basically my count of peers drops to zero, and when I click on the GUI it says too many files are open. It recovers better than before: I can just restart the app, but it typically dies out after a few hours. On Windows this is fine, but on OS X it's a constant issue.

events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: IO error: /home/user/StorjL/STORJ03/sharddata.kfs/216.s: Too many open files
    at Error (native)
ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128125
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 16384
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 128125
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

storjshare version

  * daemon: 2.5.1, core: 6.3.2, protocol: 1.1.0

I'm having the same issue, which is causing my nodes to use more memory than they should and eventually crash. The majority of the open files are LOG and LOCK files; I am storing my drives over the network.

Same here:

Error: IO error: /root/.storjshare/sharddata.kfs/250.s/000532.ldb: Too many open files
    at Error (native)
{"level":"error","message":"failed to read from mirror node: connect ETIMEDOUT 196.54.41.47:56223","timestamp":"2017-05-08T05:07:39.159Z"}
{"level":"error","message":"failed to read from mirror node: connect ETIMEDOUT 73.109.143.94:9735","timestamp":"2017-05-08T09:57:31.395Z"}
events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: IO error: /root/.storjshare/sharddata.kfs/046.s/000604.log: Too many open files
    at Error (native)
root@JGI02:/# ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3917
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3917
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
root@JGI02:/# npm list -g kfs
/root/.nvm/versions/node/v6.9.5/lib
├─┬ storj-lib@6.1.2
│ └── kfs@3.1.1
└─┬ storjshare-daemon@2.5.3
  └─┬ storj-lib@6.4.2
    └── kfs@3.1.5

Same thing here:

  • daemon: 2.5.4, core: 6.4.2, protocol: 1.1.0

Error: IO error: /opt/storj/sharddata.kfs/049.s/001893.ldb: Too many open files
at Error (native)
events.js:160
throw er; // Unhandled 'error' event
^

Confirmed on linux:

{"level":"error","message":"failed to read from mirror node: connect ETIMEDOUT 158.69.248.73:5015","timestamp":"2017-05-13T21:33:05.997Z"}
events.js:163
throw er; // Unhandled 'error' event
^

Error: IO error: /drivepath/sharddata.kfs/241.s/000588.ldb: Too many open files

Additional info:
daemon: 2.5.3, core: 6.4.2, protocol: 1.1.0
Linux 4.10; the max number of files the kernel permits to be open is the default (should be 1024).

I got this error too:

{"level":"info","message":"sending FIND_NODE message to {"userAgent":"6.4.2","protocol":"1.1.0","address":"68.184.86.253","port":17514,"nodeID":"9cf0a5ae16721eb47e078c392dd998a80ec8fc1a","lastSeen":1494769057511}","timestamp":"2017-05-14T13:37:41.177Z"}
events.js:163
throw er; // Unhandled 'error' event
^

Error: connect EMFILE 68.184.86.253:17514 - Local (undefined:undefined)
at Object.exports._errnoException (util.js:1050:11)
at exports._exceptionWithHostPort (util.js:1073:20)
at internalConnect (net.js:889:16)
at lookupAndConnect (net.js:977:5)
at Socket.realConnect (net.js:945:5)
at Agent.connect [as createConnection] (net.js:77:22)
at Agent.createSocket (_http_agent.js:195:26)
at Agent.addRequest (_http_agent.js:157:10)
at new ClientRequest (http_client.js:212:16)
at Object.request (http.js:26:10)
at rawRequest (/usr/lib/node_modules/storjshare-daemon/node_modules/restify/lib/clients/http_client.js:155:17)
at FunctionCall.doCall (/usr/lib/node_modules/storjshare-daemon/node_modules/backoff/lib/function_call.js:156:20)
at FunctionCall.start (/usr/lib/node_modules/storjshare-daemon/node_modules/backoff/lib/function_call.js:145:10)
at JsonClient.request (/usr/lib/node_modules/storjshare-daemon/node_modules/restify/lib/clients/http_client.js:585:10)
at _write (/usr/lib/node_modules/storjshare-daemon/node_modules/restify/lib/clients/string_client.js:101:14)
at JsonClient.write (/usr/lib/node_modules/storjshare-daemon/node_modules/restify/lib/clients/string_client.js:133:13)

Fix it ffs.. Not that you've paid me any SJCX in 2 months anyway.

/usr/lib
└─┬ storjshare-daemon@3.4.2
  └─┬ storj-lib@6.6.0
    └── kfs@3.1.5

Error: IO error: /mount/ext-hdd-1/storj3/sharddata.kfs/184.s/000139.ldb: Too many open files
    at Error (native)
events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: IO error: /mount/ext-hdd-1/storj3/sharddata.kfs/163.s: Too many open files
    at Error (native)
{"level":"error","message":"Lookup operation failed to return results","timestamp":"2017-07-02T12:59:56.102Z"}
events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: IO error: /mount/ext-hdd-1/storj3/sharddata.kfs/058.s/000333.log: Too many open files
    at Error (native)
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:05.315Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:05.526Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:06.438Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:07.907Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:07.907Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:08.867Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:08.867Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:09.601Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:09.725Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:12.090Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:12.090Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:12.839Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:14.072Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:14.377Z"}
{"level":"error","message":"IO error: /mount/ext-hdd-1/storj3/contracts.db/002994.ldb: Too many open files","timestamp":"2017-07-02T17:37:14.566Z"}

This "Too many open files" error is really annoying for a lot of users, I can't run any of my nodes because of this storj-gui crashes at startup.

This error seems to be common https://hexo.io/docs/troubleshooting.html#EMFILE-Error
computmaxer/karma-jspm#97 (comment)
webpack-contrib/copy-webpack-plugin#59 (comment)
http://blog.jongallant.com/2017/01/emfile-too-many-open-files-windows/

I'm no expert on the matter, but it seems that using graceful-fs solved the issue for many: https://github.com/isaacs/node-graceful-fs#improvements-over-fs-module (typical usage sketched below).
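
For reference, typical graceful-fs usage (as shown in its README) looks like the sketch below; note that it only patches Node's fs module, so it may not help with handles opened by LevelDB's native bindings:

// Patch the global fs module so EMFILE errors cause operations to be queued
// and retried instead of thrown.
const fs = require('fs');
const gracefulFs = require('graceful-fs');

gracefulFs.gracefulify(fs);

// From here on, fs.readFile/fs.open/etc. back off and retry when the process
// runs out of file descriptors; files opened natively by leveldown bypass this.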

daemon: 3.5.5, core: 6.8.0, protocol: 1.1.0

24 hours after starting 4 farmers, the shard reaper starts, and after a while I see hundreds of thousands of open files that continue to creep up and never close.

ls -l /proc/6393/fd | grep '/storj/' | wc -l
106909


kroko commented

Did I get this right: this is a bug found nearly a year ago, and it is attached to a milestone that has no due date?

...
{"level":"info","message":"replying to message to 47813df2ef9dc0b9effcc02b7525de7f11bbc042","timestamp":"2018-02-01T01:13:58.635Z"}
{"level":"warn","message":"error returned from remote host: connect ECONNREFUSED 192.99.7.107:4172","timestamp":"2018-02-01T01:14:03.647Z"}
{"level":"warn","message":"missing or empty reply from contact","timestamp":"2018-02-01T01:14:03.647Z"}
events.js:183
      throw er; // Unhandled 'error' event
      ^

Error: IO error: /mnt/storjmach00mount/node01/sharddata.kfs/171.s/001573.ldb: Too many open files
{"level":"warn","message":"your address is public and traversal strategies are disabled","timestamp":"2018-02-01T01:14:07.274Z"}
...

The crash happened after an uptime of a bit more than 5 days.


Machine info

Storj is running on server grade hardware (WD reds, RAID-Z2, ECC etc)

$ uname -a
FreeBSD storjjail 11.1-STABLE FreeBSD 11.1-STABLE #0 r321665+4bd3ee42941(freenas/11.1-stable): Thu Jan 18 15:45:01 UTC 2018
$ sysctl kern.maxfiles kern.maxfilesperproc kern.openfiles
kern.maxfiles: 1038717
kern.maxfilesperproc: 934839
kern.openfiles: 4838
$ ulimit -a
number of pseudoterminals            (-P) unlimited
socket buffer size       (bytes, -b) unlimited
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) 33554432
file size               (blocks, -f) unlimited
max kqueues                     (-k) unlimited
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 934839
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 524288
cpu time               (seconds, -t) unlimited
max user processes              (-u) 34059
virtual memory          (kbytes, -v) unlimited
swap size               (kbytes, -w) unlimited

Storjshare info

$ storjshare --version
daemon: 5.3.0, core: 8.5.0, protocol: 1.2.0
$ npm list -g storjshare-daemon
/usr/home/storjroot/.nvm/versions/node/v8.9.4/lib
`-- storjshare-daemon@5.3.0
$ node --version
v8.9.4

It's probably your mount not being able to handle it.

kroko commented

@ne0ark, sure, that may also be a reason.

Do you have any ideas on how to test which part (if any) is the bottleneck?

Meanwhile, what are the requirements to run storjshare? I'm confused. I thought the premise was that Storj Share should be able to run on a decent desktop-grade machine with spare gigs on the HDD. At least https://storj.io/share.html gives that impression; the marketing emphasis is on the desktop GUI version of Storj Share.
Ignoring for now the possibility that I have misconfigured the machine: can it really be the case that a machine with 64GB ECC RAM, fast RAID HDDs, stable NICs, and an enterprise router/firewall in front cannot handle it? If so, then the Storj marketing is misleading.

Btw, I know for a fact that at the time of the crash there was no other load on the machine (raidz scrubs, replication), as I know by heart when those are scheduled.

And here are graphical load logs. Knowing the loads when I actually do stuff with this machine, the load at 1:14 AM when the crash happened is dead idle (what you see after 1:14 is the machine state when everybody is sleeping and the only considerable activity is the just-restarted Storj Share; the only other activity I can think of in that time span is Nextcloud cronjobs).

[screenshot: load graphs, 2018-02-01 07:05:05]

[screenshot: load graphs, 2018-02-01 07:09:17]

Edit: I missed the 6th disk in the screenshot, but you get the picture 😄

[screenshot: load graphs, 2018-02-01 07:06:55]

Isn't the issue that Storj needs more than 1024 open files? Isn't that huge? I can see that for every shard there are 4 open files:

/sharddata.kfs/252.s/000034.log
/sharddata.kfs/252.s/LOCK
/sharddata.kfs/252.s/LOG
/sharddata.kfs/252.s/MANIFEST-000033

Perhaps we can close some? Or open these only when needed?

The workaround for now is to raise the soft limit (something like https://serverfault.com/a/610135/114520), but I don't really like that either...
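
For anyone raising the limit anyway, it usually looks something like this (the user name and values are illustrative; for a systemd-managed service, LimitNOFILE= in the unit file is the equivalent knob). It only postpones the crash if the descriptors are genuinely leaking:

# /etc/security/limits.conf -- raise the per-user open-file limits (illustrative)
storjuser  soft  nofile  65536
storjuser  hard  nofile  65536

# or, for the shell that launches the daemon:
ulimit -n 65536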

👋 Hey! Thanks for this contribution. Apologies for the delay in responding!

We've decided to rearchitect Storj, so that we can scale better. You can read more about this decision here. This means that we are entirely focused on v3 at the moment, in the storj/storj repository. Our white paper for v3 is coming very, very soon - follow along on the blog and in our Rocketchat.

As this repository is part of the v2 network, we're no longer maintaining this repository. I am going to close this for now. If you have any questions, I encourage you to jump on Rocketchat and ask them there. Thanks!