filecoin-project/boost

Pass Accepted / Transferred / Published checkpoint counters along to the deal filter

ze42 opened this issue · 12 comments

ze42 commented

Checklist

  • This is not a new feature or an enhancement to the Filecoin protocol. If it is, please open an FIP issue.
  • This is not brainstorming ideas. If you have an idea you'd like to discuss, please open a new discussion on the Boost forum and select the category as Ideas.
  • I have a specific, actionable, and well motivated feature request to propose.

Boost component

  • boost daemon - storage providers
  • boost client
  • boost UI
  • boost data-transfer
  • boost index-provider
  • Other

What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.

Provide to storage providers the ability to accept deals based on the current concurrent downloads through the deal filter.

The current info allows to filter based on the sealing pipeline states (AP / PC1 / PC2/ ... ) but when receiving alot of online deals, many concurrent downloads can be started before reaching the miner. The idea is to give miners the ability to take the boost staged deals that will soon reach the sealing pipeline into account.

State of the different deal currently being processed should be taken into account when deciding if we want to accept new deals

Describe the solution you'd like

Boost add these existing info to the dealfilter.

Proposed format :

  "StagingState": {
    "Accepted": {
      "graphsync": {
        "Deals": 0,
        "CumulativeSize": 0
      },
      "http": {
        "Deals": 1,
        "CumulativeSize": 2048
      },
      "libp2p": {
        "Deals": 0,
        "CumulativeSize": 0
      }
    },
    "Transferred": {
      "Deals": 1,
      "CumulativeSize": 2048
    },
    "Published": {
      "Deals": 0,
      "CumulativeSize": 0
    }

Describe alternatives you've considered

None

Additional context

We made that development and can propose a PR.

We are already suffering a slow down with the current data sent and time it takes to run the deal filters. The issue is captured at #1733
I would suggest an overhaul of how external filters are triggered and what data should be sent for each deal.

Proposal of the new format:

  1. External binary is replaced by a RPC call or a POST request (Boost)
  2. The data sent with this call with only be the deal proposal (Boost)
  3. The external filter should run a process which can respond RPC requests (CIDGravity or other filters)
  4. The external deal filter can periodically query the "GraphQL" endpoint to collect the state of Boost and Miner (CIDGravity or other filters)
ze42 commented

Before going to a complete change of workflow, and the way dealfilters interface with boost, we should have a look at what exactly takes time.

As @cryptowhizzard mentionned, removing the parameter generation seems to be enough. My guess would be that some parameters collected takes too much resources when applied to big storage.

Having only values that would not be dependant on machine sizes, but mostly linear should probably be enough to fix the issue.

From some quick search on our end, we suspect the api.SectorsSummary(ctx) to be taking a long of time, and being the main reason it takes way too long.

This api call in Lotus takes along time to proceed. Calling lotus /debug/metric returns the data (along alot of other stuffs) much faster. Maybe getting the data from there (or ask Lotus to get a faster API getting the computed data, rather than computing it each time) would be enough to drastictly reduce the time, and still have the desired data.

Example, calling SectorsSummary (almost one second on that miner, could take over 30s on big miners like for cryptowhizzard)

time curl -s -X PST -H "Content-Type: application/json" --data '{ "jsonrpc": "2.0", "method": "Filecoin.'SectorsSummary'", "params": [''], "id": 3 }' 'http://miner:2345/rpc/v0'%7C jq .
{
  "jsonrpc": "2.0",
  "result": {
    "Packing": 7,
    "PreCommit1": 16,
    "PreCommit2": 2,
    "Proving": 6328,
    "Removed": 6852,
    "WaitSeed": 1
  },
  "id": 3
}
time: 926ms

And very fast metrics API, returning the same data (cut to only the matching ones here)

curl -s http://miner:2345/debug/metrics
[...]
lotus_sealing_states{sector_state="Packing"} 7
lotus_sealing_states{sector_state="PreCommit1"} 16
lotus_sealing_states{sector_state="PreCommit2"} 2
[...]
lotus_sealing_states{sector_state="Proving"} 6328
lotus_sealing_states{sector_state="Removed"} 6851
[...]
lotus_sealing_states{sector_state="WaitSeed"} 1
[...]
time: 79ms

Even with the most leaniest dealfilter ( I am only sending the deal parameters , no sector status etc. etc. ) it seems that boost still cannot handle incoming deal requests properly when it is stressed / loaded.

Fyi to give context to the thing below:

When the duration is above 30 seconds the client ( sender ) of the deal is gone. The client get's "stream reset" errors.

2023-10-19T10:08:38.133+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "9a55c269-b8e4-4542-bfdd-214d02943716", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "14d75316-4737-40bc-9ce2-b858e60c55a4", "duration": "1m45.666006766s"}
2023-10-19T10:09:02.341+0200 INFO boost-prop lp2pimpl/net.go:262 send deal proposal response {"id": "00996245-b22f-466d-99c6-a7f8d4cada1e", "accepted": true, "msg": "", "peer id": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "client address": "f1ps34emfyq2tbioxwzr7tm2nqwprkiehsccvn7gy", "provider address": "f01771403", "piece cid": "baga6ea4seaqn436o6hcwhxrywdjbbfvi76wnmjxaifgoshxloxqtrvc33qzpsea", "piece size": 34359738368, "verified": true, "label": "bafyreieqf7k6qejlub6iwfefrxosnwzqvu6ab26cw2woll3g5ihrwjey7u", "start epoch": "3333374", "end epoch": "4802174", "price per epoch": "0", "duration": "1m39.871729141s"}
2023-10-19T10:09:06.923+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "d45bf699-c13f-4073-bf5a-b88fee384157", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "00996245-b22f-466d-99c6-a7f8d4cada1e", "duration": "1m44.454020025s"}
2023-10-19T10:09:30.772+0200 INFO boost-prop lp2pimpl/net.go:262 send deal proposal response {"id": "dc84f8ff-192f-4d9a-a3d7-89af600afd2d", "accepted": true, "msg": "", "peer id": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "client address": "f1ps34emfyq2tbioxwzr7tm2nqwprkiehsccvn7gy", "provider address": "f01771403", "piece cid": "baga6ea4seaqmxhxnoyi5jjqyo5slskj43ljzukapx4uhhekdpups76sk4upbkjy", "piece size": 34359738368, "verified": true, "label": "bafyreieqf7k6qejlub6iwfefrxosnwzqvu6ab26cw2woll3g5ihrwjey7u", "start epoch": "3333375", "end epoch": "4802175", "price per epoch": "0", "duration": "1m38.33452143s"}
2023-10-19T10:09:32.408+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "6e069c6b-0401-4c9b-86d4-ee1e53f544e2", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "dc84f8ff-192f-4d9a-a3d7-89af600afd2d", "duration": "1m39.970162675s"}
2023-10-19T10:09:44.597+0200 INFO boost-prop lp2pimpl/net.go:262 send deal proposal response {"id": "b9cd1bef-66ed-4c81-b7a3-c82d75ef3c44", "accepted": true, "msg": "", "peer id": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "client address": "f1ps34emfyq2tbioxwzr7tm2nqwprkiehsccvn7gy", "provider address": "f01771403", "piece cid": "baga6ea4seaqev445lvuzgx3sdsosa5uqiq2ylwqkmuutfmptsshdjyayqyrfoei", "piece size": 34359738368, "verified": true, "label": "bafyreieqf7k6qejlub6iwfefrxosnwzqvu6ab26cw2woll3g5ihrwjey7u", "start epoch": "3333376", "end epoch": "4802176", "price per epoch": "0", "duration": "1m22.132652245s"}
2023-10-19T10:09:44.608+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "86829218-0e35-4bc5-8383-69f3ac9f3def", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "b9cd1bef-66ed-4c81-b7a3-c82d75ef3c44", "duration": "1m22.144552813s"}
2023-10-19T10:09:49.990+0200 INFO boost-prop lp2pimpl/net.go:262 send deal proposal response {"id": "e12e13d7-a9dd-4816-8020-cadd69f863a5", "accepted": true, "msg": "", "peer id": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "client address": "f1ps34emfyq2tbioxwzr7tm2nqwprkiehsccvn7gy", "provider address": "f01771403", "piece cid": "baga6ea4seaqcp424ahi24dmvuvotk2jldvya44vuvksa2un3cujxplvdtwmdebi", "piece size": 34359738368, "verified": true, "label": "bafyreieqf7k6qejlub6iwfefrxosnwzqvu6ab26cw2woll3g5ihrwjey7u", "start epoch": "3333378", "end epoch": "4802178", "price per epoch": "0", "duration": "27.566287954s"}
2023-10-19T10:09:49.991+0200 INFO boost-prop lp2pimpl/net.go:262 send deal proposal response {"id": "f7a790e0-0a74-47d5-bae3-c7c1ba4b3c6f", "accepted": true, "msg": "", "peer id": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "client address": "f1ps34emfyq2tbioxwzr7tm2nqwprkiehsccvn7gy", "provider address": "f01771403", "piece cid": "baga6ea4seaqoqumfeolmerbapmgnwhy7lefngm4jqqu5tqauv6eev6infn5iubq", "piece size": 34359738368, "verified": true, "label": "bafyreieqf7k6qejlub6iwfefrxosnwzqvu6ab26cw2woll3g5ihrwjey7u", "start epoch": "3333377", "end epoch": "4802177", "price per epoch": "0", "duration": "57.551637991s"}
2023-10-19T10:09:58.445+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "6efd9218-5785-47c7-a636-e596b93e63fc", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "f7a790e0-0a74-47d5-bae3-c7c1ba4b3c6f", "duration": "1m6.006104394s"}
2023-10-19T10:09:58.447+0200 DEBUG boost-net lp2pimpl/net.go:233 handled deal proposal request {"reqlog-uuid": "b41d58d9-55d2-4a37-850d-4383684e512c", "client-peer": "12D3KooWKnDj8qencX3iXAtbChf8UNzsL8GASbR8xgymg4QawEt3", "id": "e12e13d7-a9dd-4816-8020-cadd69f863a5", "duration": "36.02376174s"}

ze42 commented

@cryptowhizzard Could you try with the additional commit from https://github.com/CIDgravity/boost/tree/dealfilter-timing which should add some timing logs for deal-filter, having getDealFilterParams on one side, and exec on the other side?

Hi
can you move on this one ?

I've figured out the slowness on my machine;

its 3 serialized calls to a seemingly really slow database, providing funds, sealing and space information;

https://github.com/filecoin-project/boost/blob/main/storagemarket/provider_dealfilter.go#L42-L70

These calls can take 3 to 8 seconds, but sometimes (over multiple deals in 1 hour) they are 30 to 70 seconds (which raise another bug which I'll post here too)

I've removed them now and will post new times that I'm seeing

This also took 3 seconds, as did the step inserted deal into deals DB

Here are the DB sizes:

-rw-r--r--  1 user user 329M Dec 20 20:01 boost.retrieval.db
-rw-r--r--  1 user user  53M Dec 20 20:01 boost.logs.db
-rw-r--r--  1 user user 680M Dec 20 19:57 boost.db

see #1852 for the missing index issues
see #1851 for a bunch of useless data

@RobQuistNL

Please note that with removing the things from provider dealfilter.go you will be incompatible with CID gravity.

It’s on my list to reover some parts build from external sources like Farcaster, after 7 jan 2024. It should not be in Boost anyway in my personal opinion. Boost should make deals and faaaaasssstttttt :)