filecoin-project/Allocator-Governance

Community Diligence Review of ByteBase (Destore) Allocator

Closed this issue · 9 comments

Review of Allocations from @Destore2023
Allocator Application: filecoin-project/notary-governance#1039

First example:
DataCap was given to:
Destore2023/MetaPathways-Bookkeeping#8

Public Open Dataset - key compliance requirement: Retrievability

1st point)
New GitHub ID. The client was asked to fill out a form; the governance team needs to investigate the details.

2nd point) Allocation Tranche Schedule to clients:
• First: 25%
• Second: 25%
• Third: 25%
• Fourth: 25%
• Max per client overall: 20 PiB

The client asked for 10 PiB as a first-time GitHub user, and was given a 1 PiB first allocation.
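For reference, a quick sanity check of what the stated tranche schedule would imply. This is only a sketch; it assumes each tranche is a percentage of the client's requested total, which may differ from the allocator's intent:

```python
# Hypothetical sanity check of the tranche schedule above.
requested_pib = 10.0                     # client's request
tranches = [0.25, 0.25, 0.25, 0.25]      # schedule from the application
expected = [requested_pib * t for t in tranches]
print(expected)                          # [2.5, 2.5, 2.5, 2.5] PiB per tranche

first_actual_pib = 1.0                   # what was actually granted
print(first_actual_pib == expected[0])   # False: 1 PiB vs the scheduled 2.5 PiB
```

Under that reading, neither the first nor the subsequent allocations line up with the published schedule.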

3rd point)
The client said these were the SPs:
f01841131 HONGKONG
f02244750 LONDON
f01807908 MADRID
f01879772 HONGKONG
f02363999 SINGAPORE

Actual data storage report:
https://check.allocator.tech/report/Destore2023/MetaPathways-Bookkeeping/issues/8/1718366438449.md

Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals | Mean Spark Retrieval Success Rate 7d
f03089385 | New York City, New York, US (SingleHop LLC) | 581.13 TiB | 45.47% | 581.13 TiB | 0.00% | -
f03090976 | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 300.00 TiB | 23.47% | 300.00 TiB | 0.00% | -
f03089826 | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 227.97 TiB | 17.84% | 227.97 TiB | 0.00% | -
f03106356 | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 169.06 TiB | 13.23% | 169.06 TiB | 0.00% | -

No SPs from the original list match. No diligence on any SPs. No known entities; are they all the same?

4th point) 0% retrievability on all SPs

Second example:
DataCap was given to:
Destore2023/MetaPathways-Bookkeeping#6

Public Open Dataset - key compliance requirement: Retrievability

1st point)
New GitHub ID. No sign of KYC; the governance team needs to investigate the details.

2nd point) Allocation Tranche Schedule to clients:
• First: 25%
• Second: 25%
• Third: 25%
• Fourth: 25%
• Max per client overall: 20 PiB

The client asked for 5 PiB and was given a 500 TiB first allocation.

3rd point)
The client said these were the SPs:

  1. f02247136, Beijing
  2. f01344987, HongKong
  3. f02128256, Toronto
  4. f01843994, Dulles
  5. f01844118, NewYork

Actual data storage report:
https://check.allocator.tech/report/Destore2023/MetaPathways-Bookkeeping/issues/6/1718366458092.md

Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals | Mean Spark Retrieval Success Rate 7d
f03089385 | New York City, New York, US (SingleHop LLC) | 1.29 PiB | 36.90% | 1.29 PiB | 0.00% | -
f03089826 | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 881.72 TiB | 24.61% | 881.72 TiB | 0.00% | -
f03090976 | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 635.94 TiB | 17.75% | 635.94 TiB | 0.00% | -
f03106356 | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 452.97 TiB | 12.64% | 452.97 TiB | 0.00% | -
f03087482 | Dulles Town Center, Virginia, US (SpeedyPage Ltd) | 290.50 TiB | 8.11% | 290.50 TiB | 0.00% | -

No SPs match from original list.

4th point) 0% retrievability on all SPs

The SPs for both applications are identical and appear to be located in the same place; more investigation from the Gov Team is needed.

Dear @filecoin-watchdog

We had always understood that the retrieval methods the community agreed on were GraphSync, Bitswap, and HTTP, so we used the results of those three retrieval methods as the marker of successful retrieval. It was only when we learned at the notary meeting that the latest retrieval standard is determined by the results on https://spacemeridian.grafana.net/public-dashboards/32c03ae0d89748e3b08e0f08121caa14?orgId=1 that we started communicating the changes to the client.

We asked the client to contact their SPs, study the Spark retrieval principles, and adjust their configuration to meet Spark's retrieval requirements. The client's feedback was that the SP found the code path for Spark different from the retrieval methods used in the past: the SP uses go-fil-markets to send offline deals for data sealing, which is why Spark didn't pick up any CID information for this data.

Here is the code which the SP found doesn't match with Spark.
001

002

The SP needs time to update the code and will hopefully succeed within the next week.

Thanks

Retrieval update:
001
The SPs have set up the retrieval data manually and launched the Lassie daemon. The data can now be retrieved successfully.
002
But it seems that their nodes are still not in the list on https://api.filspark.com/rounds/current.

@bajtos Can you help with it?

Hello, and thanks for reaching out! 👋🏻

We don't have any retrievable deals in Spark's database for the miner f03106356.

When I look at StateMarketDeals, I see that this miner's deals have DealProposal.Label field set to values starting with mAXCg5A, like mAXCg5AIglotYbPk/lmvKP4whG+vron6K/Dr4zVCWLD1Dm+dAGrg. That does not look like a valid CID and therefore Spark ignores those deals.

You can learn more about this in my recent blog post: https://blog.filstation.app/posts/how-spark-discovers-content-stored-in-fil-deals

@Destore2023, do you know by chance what tooling is used by the clients of f03106356 to store the data and/or why the Label values start with the mAXCg5A prefix? What kind of data is that?

Could you please paste the CID from your screenshots in plaintext so I can run some tests on my side, too?

What I found in my research:

  • It looks like the Label value is a base64-encoded CID (m is the multibase code for base64 encoding.)
  • I can convert the value mAXCg5AIglotYbPk/lmvKP4whG+vron6K/Dr4zVCWLD1Dm+dAGrg to CIDv1 bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq.
  • That CID has the same prefix as the CID in your screenshot (bafykbzac). That makes me think I am on the right track here.
  • However, this CID is not found in the IPNI index.

I tested Labels from all f03106356 deals found in my snapshot from July 2nd, converted them from base64 multibase to CIDv1 format. None of them were found in the cid.contact database.
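For anyone who wants to reproduce this conversion, here is a minimal Python sketch using only the standard library. It simply re-encodes the same CID bytes from base64 multibase ("m" prefix) to base32 multibase ("b" prefix):

```python
import base64

def label_to_cidv1(label: str) -> str:
    """Convert a multibase base64 deal Label ("m" prefix) into a
    base32 CIDv1 string ("b" prefix). The underlying bytes are the
    same CID; only the multibase encoding changes."""
    assert label.startswith("m"), "expected multibase base64 prefix 'm'"
    b64 = label[1:]
    b64 += "=" * (-len(b64) % 4)      # multibase strips padding; restore it
    raw = base64.b64decode(b64)       # raw CIDv1 bytes: version, codec, multihash
    b32 = base64.b32encode(raw).decode().lower().rstrip("=")
    return "b" + b32                  # "b" is the multibase prefix for base32

label = "mAXCg5AIglotYbPk/lmvKP4whG+vron6K/Dr4zVCWLD1Dm+dAGrg"
print(label_to_cidv1(label))  # yields a CID starting with "bafykbzac", as above
```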

When I paste the CID to the CID Inspector website, I see they are using the blake2b-256 hash, while typically IPFS CIDs use the sha2-256 hash. I am not sure if that's a problem or not.
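The hash function can also be read straight from the CID's prefix bytes, since the version, codec, and multihash code are unsigned varints. A small stdlib-only sketch, with the varint decoding written out by hand:

```python
import base64

def cid_prefix(cid: str):
    """Return (version, codec, multihash code) from a base32 CIDv1 string."""
    assert cid.startswith("b"), "expected multibase base32 prefix 'b'"
    b32 = cid[1:].upper()
    b32 += "=" * (-len(b32) % 8)      # restore base32 padding
    raw = base64.b32decode(b32)

    def varint(i):
        # unsigned LEB128, as used by CIDs and multihash
        val, shift = 0, 0
        while True:
            b = raw[i]
            i += 1
            val |= (b & 0x7F) << shift
            if not b & 0x80:
                return val, i
            shift += 7

    version, i = varint(0)
    codec, i = varint(i)
    mh_code, _ = varint(i)
    return version, codec, mh_code

cid = "bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq"
print(cid_prefix(cid))  # (1, 112, 45600): CIDv1, dag-pb (0x70), blake2b-256 (0xb220)
```

The multihash code 0xb220 confirms blake2b-256 rather than the sha2-256 (0x12) typically seen in IPFS CIDs.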

When I use Lassie to retrieve the CID bafykbz...hinlq from '/ip4/103.163.186.80/tcp/16999/p2p/12D3KooWS72kz7YBp6T7MyCYtuKmvLCzKaWrqG5Sby4c4mUN5en4' shown in your screenshot, it fails with the error "no candidates".

❯ lassie fetch -o /dev/null -vv --dag-scope block --protocols graphsync --providers '/ip4/103.163.186.80/tcp/16999/p2p/12D3KooWS72kz7YBp6T7MyCYtuKmvLCzKaWrqG5Sby4c4mUN5en4' bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq
Fetching bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq from specified provider(s)
2024-07-04T14:53:48.782+0200	DEBUG	lassie/retriever	retriever/retriever.go:291	retrieval-event	{"code": "started-fetch", "rootCid": "bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq", "storageProviderId": ""}
2024-07-04T14:53:48.784+0200	DEBUG	lassie/retriever	retriever/retriever.go:291	retrieval-event	{"code": "started-finding-candidates", "rootCid": "bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq", "storageProviderId": ""}
2024-07-04T14:53:49.525+0200	DEBUG	lassie/retriever	retriever/directcandidatesource.go:133	retrieving metadata from libp2p protocol list	{"peer": "12D3KooWS72kz7YBp6T7MyCYtuKmvLCzKaWrqG5Sby4c4mUN5en4"}
2024-07-04T14:53:50.226+0200	DEBUG	lassie/retriever	retriever/retriever.go:291	retrieval-event	{"code": "candidates-found", "rootCid": "bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq", "storageProviderId": "", "candidates": "12D3KooWS72kz7YBp6T7MyCYtuKmvLCzKaWrqG5Sby4c4mUN5en4"}
2024-07-04T14:53:50.226+0200	DEBUG	lassie/retriever	retriever/retriever.go:291	retrieval-event	{"code": "failed", "rootCid": "bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq", "storageProviderId": "", "errorMessage": "no candidates"}
2024-07-04T14:53:50.226+0200	DEBUG	lassie/retriever	retriever/retriever.go:291	retrieval-event	{"code": "finished", "rootCid": "bafykbzacecliwwdm7e7zm26kh6gccg7l5orh5cx4hl4m2uewfq6uhg7hianlq", "storageProviderId": ""}

no candidates


@bajtos Hi Miroslav, thanks for your reply. The SP has already fixed the problem based on your guidance. It works! Thanks for your help.
001

https://spacemeridian.grafana.net/public-dashboards/32c03ae0d89748e3b08e0f08121caa14?orgId=1&from=now-5m&to=now

Hi @filecoin-watchdog @galen-mcandrew, it looks like the retrieval data is on the way.

Update on Spark retrieval.
001

Based on a further diligence review, this allocator pathway is partially in compliance with their application.

Specifically:

  • Mixed evidence of diligence with clients (minimal verification of client claims, brand new client accounts)
  • Subsequent allocations given despite noncompliant client deal-making, with minimal allocator intervention through comments
  • Clients requested very large initial amounts; the allocator overrode those requests, but the initial and subsequent allocations still do not match the tranche schedule
  • SPs are identical across clients
  • Mixed retrievability for datasets, despite claims of public open data by both allocator and client (not showing distributed network data storage utility); there is evidence above of allocator working with Spark to improve retrieval testing

Given this mixed review, we are requesting that the allocator verify that they will uphold all aspects and requirements of their initial application. If so, we will request an additional 5 PiB of DataCap from RKH, to allow this allocator to show increased diligence and alignment.

@Destore2023 can you verify that you will enforce program and allocator requirements?
(for example: public diligence, tranche schedules, and public scale retrievability like Spark).

Please reply here with acknowledgement and any additional details for our review.

Dear @galen-mcandrew
Yes, we will continue to require the client to maintain a high retrieval success rate on Spark, continuing our previous work with the Spark team. For diligence with clients, we will keep using questionnaires and increase our monitoring of comments on GitHub. We will show increased diligence and alignment.