storj-archived/core

Overextend allocated space


Package Versions

Output from npm list storj:

└─┬ storjshare-daemon@2.5.1
  └── storj-lib@6.3.2

Output from node --version:

v6.10.0

Expected Behavior


Both storjshare-daemon and storjshare-gui should stop accepting new shards once the allocated size is reached.

Actual Behavior


The bridge queues the farmer's OFFERs. The farmer stops sending OFFERs once full, but even days or weeks later the farmer can still receive a shard through a queued OFFER.

@GERGX We found the root issue. Because February was a short month, you fell within the grace period. We are going to reduce the grace period from 7 days to 3 days.
Shawn on Rocket Chat

This will create an additional mirror for all associated shards of a contact when the contact has been unreachable for 1 hour within 24 hours.
storj-archived/bridge#374

The more rules we add on the bridge, the more shards the farmer will receive later. He has no way to predict their number or size because it depends on several variables. Worst case: a farmer needs 90 days to hit his limit and all of his shards get renewed for an additional 90 days. The bridge will still have 90 days' worth of queued OFFERs and will keep sending him bonus shards until the farmer hits the point of no return. If his drive is 100% full, even the shard reaper will be unable to clean old shards, because the shard reaper needs some free space to run.
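
A rough back-of-the-envelope illustration of that worst case (every figure below is an assumption for illustration, not a measurement):

```js
// Hypothetical worst case: the farmer hits his 4 TB limit on day 90,
// every stored contract renews for another 90 days, and the bridge
// still holds a 90-day backlog of queued OFFERs it can turn into mirrors.
const allocatedBytes = 4e12;   // 4 TB allocation, already 100% used
const queuedOffers = 500;      // assumed OFFER backlog on the bridge
const avgShardBytes = 250e6;   // assumed ~250 MB average shard

const bonusBytes = queuedOffers * avgShardBytes;
const overshootPct = (bonusBytes / allocatedBytes) * 100;
console.log(`queued mirrors could add ${(bonusBytes / 1e9).toFixed(0)} GB, ` +
  `~${overshootPct.toFixed(1)}% past the allocation`);
```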

One example: storj-archived/storjshare-gui#503

Steps to Reproduce


  1. Start farming with a low size limit
  2. Wait until it is full, then keep it running for a few more hours

I see a second problem: as a farmer I send an OFFER to the bridge, and the bridge can accept or deny it. That would be fair for both sides and would allow both to calculate for a 90-day period.

What is implemented at the moment gives the bridge a big advantage. The bridge can simply queue all incoming OFFERs. If the SJCX price goes up, the bridge can fall back on the cheap queued price. Mixed costing over 90 days is not possible for the farmer. That is not a fair market.

I will not open a new issue for this as long as we have a fixed price for all farmers.
Possible solution for both problems: the bridge can still queue OFFERs, but each OFFER should have an expiry date. If the bridge wants to create a new mirror after that, it should request a fresh OFFER. That gives the farmer the opportunity to adjust his price and allocated space.
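
A minimal sketch of that proposal, assuming hypothetical field and function names rather than the actual storj-lib OFFER format:

```js
// Hypothetical OFFER envelope with an expiry date. After expiresAt the
// bridge must not consume the queued OFFER for a mirror; it has to
// request a fresh one, letting the farmer re-quote price and space.
const DAY_MS = 24 * 60 * 60 * 1000;

function makeOffer(contract, price) {
  return {
    contract,                           // the proposed storage contract
    price,                              // farmer-chosen price
    expiresAt: Date.now() + 7 * DAY_MS  // farmer-chosen validity window
  };
}

// Bridge side: only still-valid OFFERs may be turned into mirrors.
function isOfferUsable(offer) {
  return Date.now() < offer.expiresAt;
}
```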

Today a farmer reported 0 free space on his device and had to delete shards. Let's try one or more of these workarounds:
1.) Delete all farmer logfiles.
2.) Run the shard reaper (see the sketch after this list): https://gist.github.com/littleskunk/9f526cdbfa99a5891d097558af77e510
3.) Move the complete storjshare drive to a different location with more available space.
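
For context, the idea behind the shard reaper is roughly the following; the store interface (listContracts, deleteShard) is hypothetical and stands in for the farmer's contract/shard storage, not the actual gist code:

```js
// Sketch of a shard reaper: walk the stored contracts and delete the
// shards whose contracts have already ended. It needs some free space
// to run, which is why a 100% full drive is the worst case.
async function reapExpiredShards(store) {
  const now = Date.now();
  let reclaimedBytes = 0;
  for (const contract of await store.listContracts()) { // hypothetical API
    if (contract.store_end < now) {                     // contract expired
      reclaimedBytes += await store.deleteShard(contract.data_hash);
    }
  }
  return reclaimedBytes;
}
```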

@braydonf, this issue needs attention as fast as possible. We are seeing more and more nodes (an average of 4-10 cases a day) with terabytes of stored data that can't start anymore (or just crash) because there is no free space left, since Storj does not respect the allocated space parameter. This issue started surfacing due to the stress test and is getting worse by the day.

@Luca666 on Rocket Chat:

[screenshot: clipboard - August 12 2017, 8:44 AM]


I ran into this issue after having my node offline for 12 hours.
(logs.zip attached)


After letting it run for some more time, the percentage still seems to be going up.

and another one:

@petr.kopriva


I have an idea: the room for shards is calculated from the allocated size at the moment the node is created and starts receiving data.
If we reduce the allocated size later, the calculated room is not reduced.
So when the node is full, it still reports that it has room for shards.
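
A minimal sketch of that suspected behavior; the class and method names are hypothetical, not the actual daemon code:

```js
// The room for shards is snapshotted from the allocation at creation
// time; lowering the allocation later never shrinks the snapshot, so
// a full node can still report free room.
class FarmerState {
  constructor(allocatedBytes) {
    this.allocatedBytes = allocatedBytes;
    this.usedBytes = 0;
    this.room = allocatedBytes; // buggy: snapshot taken once
  }

  setAllocation(bytes) {
    this.allocatedBytes = bytes; // this.room is stale from here on
  }

  // Fix: always derive the room from the current values instead.
  currentRoom() {
    return Math.max(0, this.allocatedBytes - this.usedBytes);
  }
}
```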

The crashing issue has been fixed in https://github.com/Storj/core/releases/tag/v8.0.0

There is still the possibility of some over-sharing, because the LevelDB approximateSize calculation sometimes does not account for recently written data. Though this should not go over by too much.

Reopen.

The LevelDB approximateSize is not accurate when it comes to the used space on the file system, but the difference is not significant; in my case, a 2 GB difference on 1 TB of used space.
Anyway, that doesn't matter. We are using approximateSize both for sending OFFERs and for the status output. Since we use the same inaccurate function in both places, we should still target 100% used space and not more.
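
A sketch of what such a check looks like on top of LevelDB, assuming the leveldown approximateSize(start, end, callback) API; the path and the figures are illustrative:

```js
// approximateSize may undercount recently written data, so this check
// can pass even when the store is already at the limit. As long as the
// same function backs both the OFFER decision and the status output,
// the target should still be 100% of the allocation, not more.
const leveldown = require('leveldown');

const db = leveldown('/path/to/sharddata.kfs'); // illustrative path

function hasRoom(allocatedBytes, shardBytes, callback) {
  db.approximateSize('\x00', '\xff', (err, usedBytes) => {
    if (err) return callback(err);
    callback(null, usedBytes + shardBytes <= allocatedBytes);
  });
}

db.open((err) => {
  if (err) throw err;
  hasRoom(1e12, 8e6, (err, ok) => console.log('send OFFER:', ok));
});
```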

The problem was, and still is, the late mirror creation. My farmer has 2 MB of free space and will send an OFFER for any shard that fits. The bridge will queue the OFFER in order to create mirrors. In the end my farmer gets lucky and receives one shard upload: 100% used space, and time to stop sending OFFERs. Minutes later the bridge will send me additional mirrors, and I will end up with more than 100% used space.

Reopen

Right, but the crashing issue is fixed. There may be some over-allocation, but it should not lead to critical failure. Let's track that as another issue.

Do you mean that an OFFER would be sent when there was space available, but 15 days later, when there isn't space available anymore, the shard would be uploaded for a mirror, putting the farmer over his disk space?

That code block would be for shard uploads. A 15-day delay is possible from the farmer's side: once the OFFER is sent and the contract is stored, it can be executed at any time; even 89 days later would be possible. I don't think the bridge will allow this, though. If the renter wants to upload a shard, the bridge will call routeConsignment immediately.

More likely it is this code block: https://github.com/Storj/core/blob/master/lib/network/protocol.js#L324

And yes, it will still lead to critical failures. We removed the allocated size validation to avoid out-of-memory exceptions when calculating the used space, so 8 TB allocated on a 4 TB drive is now valid. Only the free space check will stop the OFFERs at some point, but the mirror creation will still kill the farmer.
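
A minimal sketch of the mitigation this implies, with hypothetical names rather than the real protocol.js handler: re-check the remaining space when the shard actually arrives, not only when the OFFER was sent:

```js
// Late mirrors are only safe if the consignment path re-validates the
// allocation. A queued OFFER from weeks ago must not be allowed to
// push the farmer past 100% of the allocated space.
function handleShardUpload(shard, state, callback) {
  const remaining = state.allocatedBytes - state.usedBytes;
  if (shard.length > remaining) {
    // reject the late mirror instead of writing past the limit
    return callback(new Error('Allocated space exhausted'));
  }
  state.usedBytes += shard.length;
  callback(null, 'shard accepted');
}
```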

The crashing issue is not fixed; we still have farmers reporting their node(s) crashing due to an out-of-space error.

Sorry for opening a duplicate issue, I hadn't seen this one.
This is a serious issue; it used up the whole disk of my computer.

Pull request merged. With the next release it should be fixed.

Tagged and included in v8.2.0