Community Diligence Review of ByteBase (Destore) Allocator
Review of Allocations from @Destore2023
Allocator Application: filecoin-project/notary-governance#1039
First example:
DataCap was given to:
Destore2023/MetaPathways-Bookkeeping#8
Public Open Dataset - key compliance requirement: Retrievability
1st point)
New GitHub ID. Client asked to fill form - need gov team to investigate details
2nd point) Allocation Tranche Schedule to clients:
• First: 25%
• Second: 25%
• Third: 25%
• Fourth: 25%
• Max per client overall: 20 PiB
Client asked for 10 PiB as a first-time GitHub user. Client was given 1 PiB as a first allocation.
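To make the gap between the schedule and the grant concrete, here is a minimal sketch of the arithmetic. It assumes that each tranche percentage applies to the client's total requested amount and that 1 PiB = 1024 TiB; both are assumptions about how the schedule is meant to be read, and the helper names are hypothetical.

```python
from fractions import Fraction

TIB_PER_PIB = 1024  # assumption: binary units, 1 PiB = 1024 TiB


def expected_tranche_tib(requested_pib, share=Fraction(1, 4)):
    """Size of one tranche in TiB if it is `share` of the total request."""
    return requested_pib * TIB_PER_PIB * share


def granted_share(granted_tib, requested_pib):
    """Fraction of the total request that a single grant represents."""
    return Fraction(granted_tib) / (requested_pib * TIB_PER_PIB)


# Figures from this example: 10 PiB requested, 1 PiB granted first.
expected = expected_tranche_tib(10)
share = granted_share(1 * TIB_PER_PIB, 10)

print(f"expected first tranche: {float(expected):.0f} TiB")
print(f"granted share of request: {float(share):.0%}")
```

Under these assumptions a 25% first tranche of a 10 PiB request would be 2.5 PiB, while the 1 PiB actually granted is 10% of the request.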
3rd point)
Client said these were the SPs
f01841131 HONGKONG
f02244750 LONDON
f01807908 MADRID
f01879772 HONGKONG
f02363999 SINGAPORE
Actual data storage report:
https://check.allocator.tech/report/Destore2023/MetaPathways-Bookkeeping/issues/8/1718366438449.md
Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals | Mean Spark Retrieval Success Rate 7d
f03089385 | New York City, New York, US (SingleHop LLC) | 581.13 TiB | 45.47% | 581.13 TiB | 0.00% | -
f03090976 | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 300.00 TiB | 23.47% | 300.00 TiB | 0.00% | -
f03089826 | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 227.97 TiB | 17.84% | 227.97 TiB | 0.00% | -
f03106356 | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 169.06 TiB | 13.23% | 169.06 TiB | 0.00% | -
No SPs from the original list match. No diligence on any SPs. No known entities - are they all the same?
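The mismatch can be checked mechanically with a set comparison. This is only an illustrative sketch: the provider IDs are copied from the client's declared list and from the storage report above, and the variable names are hypothetical.

```python
# SP IDs the client declared in the application.
declared = {"f01841131", "f02244750", "f01807908", "f01879772", "f02363999"}

# SP IDs that appear in the storage report.
actual = {"f03089385", "f03090976", "f03089826", "f03106356"}

matched = declared & actual      # declared SPs that actually received deals
undeclared = actual - declared   # SPs with deals that were never declared

print(f"matched: {sorted(matched)}")
print(f"undeclared: {sorted(undeclared)}")
```

An empty `matched` set together with every reported SP landing in `undeclared` is what "no SPs match" means here.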
4th point) 0% retrievability on all SPs
Second example:
DataCap was given to:
Destore2023/MetaPathways-Bookkeeping#6
Public Open Dataset - key compliance requirement: Retrievability
1st point)
New GitHub ID. No sign of KYC - need gov team to investigate details
2nd point) Allocation Tranche Schedule to clients:
• First: 25%
• Second: 25%
• Third: 25%
• Fourth: 25%
• Max per client overall: 20 PiB
Client asked for 5 PiB. Client was given 500 TiB as a first allocation.
3rd point)
Client said these were the SPs
- f02247136, Beijing
- f01344987, HongKong
- f02128256, Toronto
- f01843994, Dulles
- f01844118, NewYork
Actual data storage report:
https://check.allocator.tech/report/Destore2023/MetaPathways-Bookkeeping/issues/6/1718366458092.md
Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals | Mean Spark Retrieval Success Rate 7d
f03089385 | New York City, New York, US (SingleHop LLC) | 1.29 PiB | 36.90% | 1.29 PiB | 0.00% | -
f03089826 (new) | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 881.72 TiB | 24.61% | 881.72 TiB | 0.00% | -
f03090976 (new) | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 635.94 TiB | 17.75% | 635.94 TiB | 0.00% | -
f03106356 (new) | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 452.97 TiB | 12.64% | 452.97 TiB | 0.00% | -
f03087482 | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 290.50 TiB | 8.11% | 290.50 TiB | 0.00%
No SPs match from original list.
4th point) 0% retrievability on all SPs
The SPs for both applications are the same and appear to be in the same locations. This needs more investigation from the Gov Team.
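A short sketch of the overlap check behind this observation, using only the provider IDs and locations listed in the two storage reports above. The dictionary layout and names are hypothetical.

```python
from collections import Counter

# Provider -> reported location, copied from the two storage reports.
issue_8 = {
    "f03089385": "New York City",
    "f03090976": "Dulles Town Center",
    "f03089826": "Dulles Town Center",
    "f03106356": "Dulles Town Center",
}
issue_6 = {
    "f03089385": "New York City",
    "f03089826": "Dulles Town Center",
    "f03090976": "Dulles Town Center",
    "f03106356": "Dulles Town Center",
    "f03087482": "Dulles Town Center",
}

shared = set(issue_8) & set(issue_6)           # SPs used by both clients
only_in_one = set(issue_8) ^ set(issue_6)      # SPs used by a single client
locations = Counter({**issue_8, **issue_6}.values())

print(f"shared SPs: {len(shared)} of {len(set(issue_8) | set(issue_6))}")
print(f"SPs per location: {dict(locations)}")
```

Four of the five distinct providers serve both clients, and four of the five report the same location, which is the concentration flagged for follow-up.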
Dear @filecoin-watchdog
Our understanding was that the community-agreed retrieval methods were GraphSync, Bitswap, and HTTP, so we used the results of these three methods as our marker of successful retrieval. Only when we learned at the notary meeting that the latest retrieval standard is determined by the results on https://spacemeridian.grafana.net/public-dashboards/32c03ae0d89748e3b08e0f08121caa14?orgId=1 did we start to communicate with the client about the change.
We asked the clients to have their SPs study how Spark retrieval works and adjust their configuration to meet Spark's requirements. The clients' feedback was that the SPs found Spark's code differs from the code of the retrieval methods used in the past: the SPs use #go-file market to send offline deals for data sealing, which is why Spark did not pick up any CID information for this data.
Here is the code that the SPs found does not match Spark.
The SPs need time to update the code and hope to succeed within the next week.
Thanks
Retrieval update:
The SPs have set up the retrieval data manually and launched the Lassie daemon. The data can now be retrieved successfully.
However, their nodes still do not appear in the list at https://api.filspark.com/rounds/current.
@bajtos Can you help with it?
Hello, and thanks for reaching out! 👋🏻
We don't have any retrievable deals in Spark's database for the miner f03106356.
When I look at StateMarketDeals, I see that this miner's deals have DealProposal.Label field set to values starting with mAXCg5A, like mAXCg5AIglotYbPk/lmvKP4whG+vron6K/Dr4zVCWLD1Dm+dAGrg. That does not look like a valid CID and therefore Spark ignores those deals.
You can learn more about this in my recent blog post: https://blog.filstation.app/posts/how-spark-discovers-content-stored-in-fil-deals
@Destore2023, do you know by chance what tooling is used by the clients of f03106356 to store the data and/or why the Label values start with the mAXCg5A prefix? What kind of data is that?
Could you please paste the CID from your screenshots in plaintext so I can run some tests on my side, too?
What I found in my research:
- It looks like the Label value is a base64-encoded CID (m is the multibase code for base64 encoding).
- I can convert the value mAXCg5AIglotYbPk/lmvKP4whG+vron6K/Dr4zVCWLD1Dm+dAGrg to CIDv1 bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq.
- That CID has the same prefix as the CID in your screenshot (bafykbzac). That makes me think I am on the right track here.
- However, this CID is not found in the IPNI index.
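The conversion described in the list above can be reproduced with the Python standard library alone. This is a minimal sketch, not the tooling actually used in the thread: it assumes the Label uses multibase prefix `m` (standard base64 alphabet, no padding) and re-encodes the same bytes with multibase prefix `b` (lowercase base32, no padding). The function name is hypothetical.

```python
import base64


def multibase_b64_to_b32(label: str) -> str:
    """Re-encode a multibase base64 ('m') string as multibase base32 ('b')."""
    if not label.startswith("m"):
        raise ValueError("expected multibase prefix 'm' (base64, no padding)")
    body = label[1:]
    # Multibase 'm' omits padding; restore it so b64decode accepts the input.
    raw = base64.b64decode(body + "=" * (-len(body) % 4))
    # Multibase 'b' is RFC 4648 base32, lowercase, without padding.
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


label = "mAXCg5AIglotYbPk/lmvKP4whG+vron6K/Dr4zVCWLD1Dm+dAGrg"
print(multibase_b64_to_b32(label))
```

Note that this only changes the text encoding of the bytes. It does not check that the bytes form a well-formed CID or that any index knows about it.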
I tested Labels from all f03106356 deals found in my snapshot from July 2nd, converted them from base64 multibase to CIDv1 format. None of them were found in the cid.contact database.
When I paste the CID to the CID Inspector website, I see they are using the blake2b-256 hash, while typically IPFS CIDs use the sha2-256 hash. I am not sure if that's a problem or not.
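For readers without access to the CID Inspector, the hash function can also be read directly from the CID bytes. The sketch below assumes the standard CIDv1 binary layout (version, content codec, multihash code, digest length, digest, each header field a varint) and uses a small hand-written lookup of multicodec codes (0x12 for sha2-256, 0xb220 for blake2b-256, 0x70 for dag-pb, 0x55 for raw). The helper names are hypothetical.

```python
import base64

HASH_NAMES = {0x12: "sha2-256", 0xB220: "blake2b-256"}
CODEC_NAMES = {0x55: "raw", 0x70: "dag-pb"}


def read_varint(data: bytes, offset: int):
    """Decode an unsigned LEB128 varint; return (value, next_offset)."""
    value = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, offset
        shift += 7


def inspect_cid(cid: str) -> dict:
    """Split a base32 ('b') CIDv1 string into its header fields."""
    if not cid.startswith("b"):
        raise ValueError("expected multibase prefix 'b' (base32)")
    body = cid[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    return {
        "version": version,
        "codec": CODEC_NAMES.get(codec, hex(codec)),
        "hash": HASH_NAMES.get(hash_code, hex(hash_code)),
        "digest_len": digest_len,
        "digest_bytes_present": len(raw) - pos,
    }


info = inspect_cid("bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq")
print(info)
```

The same function applied to a typical IPFS CID would report sha2-256 in the `hash` field, which is the difference noted above.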
When I use Lassie to retrieve the CID bafykbz...hinlq from '/ip4/103.163.186.80/tcp/16999/p2p/12D3KooWS72kz7YBp6T7MyCYtuKmvLCzKaWrqG5Sby4c4mUN5en4' shown in your screenshot, it fails with the error "no candidates".
❯ lassie fetch -o /dev/null -vv --dag-scope block --protocols graphsync --providers '/ip4/103.163.186.80/tcp/16999/p2p/12D3KooWS72kz7YBp6T7MyCYtuKmvLCzKaWrqG5Sby4c4mUN5en4' bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq
Fetching bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq from specified provider(s)
2024-07-04T14:53:48.782+0200 DEBUG lassie/retriever retriever/retriever.go:291 retrieval-event {"code": "started-fetch", "rootCid": "bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq", "storageProviderId": ""}
2024-07-04T14:53:48.784+0200 DEBUG lassie/retriever retriever/retriever.go:291 retrieval-event {"code": "started-finding-candidates", "rootCid": "bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq", "storageProviderId": ""}
2024-07-04T14:53:49.525+0200 DEBUG lassie/retriever retriever/directcandidatesource.go:133 retrieving metadata from libp2p protocol list {"peer": "12D3KooWS72kz7YBp6T7MyCYtuKmvLCzKaWrqG5Sby4c4mUN5en4"}
2024-07-04T14:53:50.226+0200 DEBUG lassie/retriever retriever/retriever.go:291 retrieval-event {"code": "candidates-found", "rootCid": "bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq", "storageProviderId": "", "candidates": "12D3KooWS72kz7YBp6T7MyCYtuKmvLCzKaWrqG5Sby4c4mUN5en4"}
2024-07-04T14:53:50.226+0200 DEBUG lassie/retriever retriever/retriever.go:291 retrieval-event {"code": "failed", "rootCid": "bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq", "storageProviderId": "", "errorMessage": "no candidates"}
2024-07-04T14:53:50.226+0200 DEBUG lassie/retriever retriever/retriever.go:291 retrieval-event {"code": "finished", "rootCid": "bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq", "storageProviderId": ""}
no candidates
@bajtos Hi Miroslav, thanks for your reply. I think the SP has already fixed the problem based on your reply. It works! Thanks for your help.
Hi @filecoin-watchdog @galen-mcandrew, it looks like the retrieval data is on the way.
Based on a further diligence review, this allocator pathway is partially in compliance with its application.
Specifically:
- Mixed evidence of diligence with clients (minimal verification of client claims, brand new client accounts)
- Subsequent allocations given despite noncompliant client deal-making, with minimal allocator intervention through comments
- Clients requesting very large initial amounts, and allocator is overriding, but initial & subsequent allocations do not match tranche schedule
- SPs are identical across clients
- Mixed retrievability for datasets, despite claims of public open data by both allocator and client (not showing distributed network data storage utility); there is evidence above of allocator working with Spark to improve retrieval testing
Given this mixed review, we are requesting that the allocator verify that they will uphold all aspects & requirements of their initial application. If so, we will request an additional 5PiB of DataCap from RKH, to allow this allocator to show increased diligence and alignment.
@Destore2023 can you verify that you will enforce program and allocator requirements?
(for example: public diligence, tranche schedules, and public scale retrievability like Spark).
Please reply here with acknowledgement and any additional details for our review.
Dear @galen-mcandrew
Yes, we will continue to require clients to maintain a high retrieval success rate on Spark, building on our previous work with the Spark team. For client diligence, we will continue to use questionnaires and will increase our monitoring of comments on GitHub. We will show increased diligence and alignment.