filecoin-project/curio

Curio logs show sealing disk errors

stuberman opened this issue · 2 comments

Sealing disk is NVMe using EXT4, OS is Ubuntu server 22.04.4 LTS (GNU/Linux 5.15.0-105-generic x86_64)

2024-05-28T02:00:54.037Z ERROR cu/ffi ffi/sdr_funcs.go:237 reflink treed -> sealed failed, falling back to slow copy, use single scratch btrfs or xfs filesystem {"error": "reflink is not supported on this OS or file", "sector": {"ID":{"Miner":116748,"Number":1},"ProofType":8}, "cache": "/seal/cache/s-t0116748-1", "sealed": "/seal/sealed/s-t0116748-1"}

2024-05-28T01:34:19.376Z DEBUG stores paths/local.go:106 accounting existing files {"id": {"Miner":116748,"Number":1}, "fileType": "cache", "path": "/seal/cache/s-t0116748-1", "used": 309237682176, "overhead": 484472310988}
2024-05-28T01:34:29.452Z DEBUG stores paths/local.go:106 accounting existing files {"id": {"Miner":116748,"Number":1}, "fileType": "cache", "path": "/seal/cache/s-t0116748-1", "used": 309237682176, "overhead": 484472310988}
2024-05-28T01:34:39.456Z DEBUG stores paths/local.go:106 accounting existing files {"id": {"Miner":116748,"Number":1}, "fileType": "cache", "path": "/seal/cache/s-t0116748-1", "used": 309237682176, "overhead": 484472310988}

PC1 and PC2 succeeding:
ls -Alh /seal/cache/s-t0116748-1

total 421G
-rw-rw-r-- 1 stuart stuart 32G May 28 01:40 sc-02-data-layer-10.dat
-rw-rw-r-- 1 stuart stuart 32G May 28 01:59 sc-02-data-layer-11.dat
-rw-rw-r-- 1 stuart stuart 32G May 27 22:51 sc-02-data-layer-1.dat
-rw-rw-r-- 1 stuart stuart 32G May 27 23:09 sc-02-data-layer-2.dat
-rw-rw-r-- 1 stuart stuart 32G May 27 23:28 sc-02-data-layer-3.dat
-rw-rw-r-- 1 stuart stuart 32G May 27 23:47 sc-02-data-layer-4.dat
-rw-rw-r-- 1 stuart stuart 32G May 28 00:06 sc-02-data-layer-5.dat
-rw-rw-r-- 1 stuart stuart 32G May 28 00:24 sc-02-data-layer-6.dat
-rw-rw-r-- 1 stuart stuart 32G May 28 00:43 sc-02-data-layer-7.dat
-rw-rw-r-- 1 stuart stuart 32G May 28 01:02 sc-02-data-layer-8.dat
-rw-rw-r-- 1 stuart stuart 32G May 28 01:21 sc-02-data-layer-9.dat
-rw-rw-r-- 1 stuart stuart 4.6G May 28 02:07 sc-02-data-tree-c-0.dat
-rw-rw-r-- 1 stuart stuart 64G May 28 02:00 sc-02-data-tree-d.dat

2024-05-28T01:34:19.376Z DEBUG stores paths/local.go:106 accounting existing files {"id": {"Miner":116748,"Number":1}, "fileType": "cache", "path": "/seal/cache/s-t0116748-1", "used": 309237682176, "overhead": 484472310988}

Those are just debug logs and are fine to ignore. Do you have GOLOG_LOG_LEVEL=debug set in your environment?

2024-05-28T02:00:54.037Z ERROR cu/ffi ffi/sdr_funcs.go:237 reflink treed -> sealed failed, falling back to slow copy, use single scratch btrfs or xfs filesystem {"error": "reflink is not supported on this OS or file", "sector": {"ID":{"Miner":116748,"Number":1},"ProofType":8}, "cache": "/seal/cache/s-t0116748-1", "sealed": "/seal/sealed/s-t0116748-1"}

This log is correct. Everything will still work, but the TreeR computation will be slower.

In Lotus the PC1/PC2 path wasn't really optimized; in Curio we made it much smarter.

Lotus:

  • Created "Unsealed" file with raw deal data
  • PC1 then
    • Copied the "Unsealed" data to "Sealed" sector file (the sealed file was exactly the same as unsealed!)
    • Computed TreeD, stored that in cache
    • SDR created the 11 layer files in cache. Note that this doesn't require unsealed data, just CommD
  • PC2 then transforms the "Sealed" file (at this point still containing the unsealed data) into the real replica file

Note that there are multiple sub-optimal things here:

  • PC1 doesn't need to store unsealed data on disk (32G of storage wasted on pc1 nodes)
  • PC1 doesn't need to compute TreeD (64G of storage wasted on pc1 nodes)
  • PC1<->PC2 shouldn't need to transfer unsealed data twice + TreeD (128G of bandwidth from PC1 nodes wasted, 96G on PC2 nodes)
  • PC1 should be able to start even before data is fetched from the client.
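
The sizes quoted above can be checked against the `ls` listing earlier in the issue. The following is only a back-of-the-envelope sketch: the variable names are made up for illustration, the figures are the rounded sizes from the listing, and the 128G figure is read as "unsealed data twice plus TreeD".

```python
# Rounded sizes (in G) taken from the `ls -Alh` listing in the issue.
SECTOR_G = 32    # one SDR layer file, same size as the unsealed sector
TREE_D_G = 64    # sc-02-data-tree-d.dat
TREE_C_G = 4.6   # sc-02-data-tree-c-0.dat at the time of the listing
LAYERS = 11      # sc-02-data-layer-1 .. sc-02-data-layer-11

# Cache directory footprint: 11 layers + TreeD + the partial TreeC file.
cache_total_g = LAYERS * SECTOR_G + TREE_D_G + TREE_C_G

# "Unsealed data twice + TreeD" leaving a PC1 node in the Lotus flow.
pc1_transfer_g = 2 * SECTOR_G + TREE_D_G

print(f"cache dir: ~{round(cache_total_g)}G, PC1 transfer: {pc1_transfer_g}G")
```

The cache total lines up with the `total 421G` reported by `ls`, and the transfer sum with the 128G quoted in the list.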

All of that is fixed in Curio:

  • Curio can start SDR as soon as it has CommP
  • Curio can fetch pieces directly from https endpoints, they technically don't need to go through the market node
    • Boost can't take advantage of this, at least not yet. Custom / Curio-native market implementations can.
  • After SDR is done, Curio will schedule TreeD + layer fetch on a GPU node. The TreeD file (sc-02-data-tree-d.dat) contains exactly the same data in its first 32G as the "unsealed" sector file
  • Now after TreeD, Curio will schedule TreeRC compute on the same node which just did TreeD. The reason that TreeD+fetch is split from TreeRC is that the first step doesn't use a GPU, so this way we can get to almost full GPU utilization while keeping relatively simple scheduling logic.
    • The reason you see the reflink ERROR log is that the TreeR call we make to rust-fil-proofs expects the 'Sealed' file to already exist on disk and to contain the unsealed data, which it then transforms into a valid replica. This is not optimal, but for now the best thing we can do is use reflink to "copy" the first 32G of the TreeD file into a virtual "unsealed" file. The main limitation of reflink is that it only works on BTRFS or XFS
      • If you use a filesystem without reflink, Curio will fall back to a simple file copy, which is what lotus-miner did, but that is unnecessarily slow.
      • The short version is that on TreeRC nodes you should use XFS as the filesystem for sealing space
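
To make the reflink-then-fallback behaviour concrete, here is a minimal Python sketch of the idea. It is not Curio's code (Curio is written in Go), and `clone_prefix` is a hypothetical helper name. It assumes Linux, where the `FICLONE` ioctl asks the filesystem to share extents between two files. On a filesystem without reflink support, such as EXT4, the ioctl fails and the sketch falls back to a plain byte copy.

```python
import fcntl
import os
import tempfile

FICLONE = 0x40049409  # Linux ioctl number for a whole-file reflink (linux/fs.h)

def clone_prefix(src_path, dst_path, length):
    """Make dst_path hold the first `length` bytes of src_path.

    Tries a reflink first and falls back to a byte copy when the
    filesystem refuses it. Returns "reflink" or "copy".
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # Share the source extents with dst: no data is actually copied.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            method = "reflink"
        except OSError:
            # Slow path: stream the prefix in 1 MiB chunks.
            remaining = length
            while remaining > 0:
                chunk = src.read(min(remaining, 1 << 20))
                if not chunk:
                    break
                dst.write(chunk)
                remaining -= len(chunk)
            method = "copy"
        # Keep only the requested prefix (flushes buffered writes first).
        dst.truncate(length)
    return method

# Small demo with stand-in files; real sector files are tens of gigabytes.
workdir = tempfile.mkdtemp()
tree_d = os.path.join(workdir, "tree-d.dat")
sealed = os.path.join(workdir, "sealed")
payload = bytes(range(256)) * 64
with open(tree_d, "wb") as f:
    f.write(payload)

used = clone_prefix(tree_d, sealed, 4096)
print("method used:", used)
```

Either way the destination ends up with the same bytes. The difference is only whether 32G of data has to be physically copied first.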

(Curio also constructs the finalized "Unsealed" sector in the finalize step by truncating the TreeD file)
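
Why truncation is enough can be illustrated with a toy Merkle tree stored leaves first. This is only a sketch: the real TreeD hash function and on-disk layout are defined by rust-fil-proofs, while the code below uses plain SHA-256 and a hypothetical `build_tree_file` helper. It shows how a tree file can start with the raw data and be roughly twice its size.

```python
import hashlib

NODE = 32  # bytes per tree node

def build_tree_file(data: bytes) -> bytes:
    """Serialize a binary Merkle tree: leaves first, then each parent level."""
    assert data and len(data) % NODE == 0
    level = [data[i:i + NODE] for i in range(0, len(data), NODE)]
    n = len(level)
    assert (n & (n - 1)) == 0, "leaf count must be a power of two"
    out = bytearray(data)  # the leaves are the raw data itself
    while len(level) > 1:
        # Hash adjacent pairs to form the next level up.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        for node in level:
            out += node
    return bytes(out)

data = bytes(range(256))            # 8 leaves of 32 bytes each
tree = build_tree_file(data)
unsealed = tree[:len(data)]         # "truncate" the tree file to the data size

print(len(data), len(tree), unsealed == data)
```

With n leaves the file holds 2n - 1 nodes, so a 32G sector gives a tree of just under 64G, consistent with the sc-02-data-tree-d.dat size in the listing.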

Some action items here:

  • Warn in storage attach when a --seal filesystem isn't XFS (or BTRFS, tho that one is likely not great for scratch space)
  • Could also raise an alert in the alert manager (tho this won't always be actionable, so maybe not the place to do that)
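
A rough sketch of what the first check could look like, written in Python for illustration only. `fs_type_for` and `seal_path_warning` are hypothetical names, not Curio functions. The sketch assumes the Linux `/proc/mounts` format (device, mount point, filesystem type, ...) and treats the filesystem type as a heuristic for reflink support.

```python
REFLINK_FS = {"xfs", "btrfs"}

def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point that contains path."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Compare with trailing slashes so "/seal" does not match "/sealed".
        if (path.rstrip("/") + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type

def seal_path_warning(path, mounts_text):
    """Return a warning string for non-reflink filesystems, else None."""
    fs = fs_type_for(path, mounts_text)
    if fs in REFLINK_FS:
        return None
    return f"{path} is on {fs}: reflink unavailable, sealing will fall back to slow copy"

# Made-up sample in /proc/mounts format; on a real host, read /proc/mounts.
SAMPLE_MOUNTS = """\
/dev/sda2 / ext4 rw,relatime 0 0
/dev/nvme0n1p1 /seal ext4 rw,relatime 0 0
/dev/nvme1n1p1 /scratch xfs rw,relatime 0 0
"""

print(seal_path_warning("/seal/cache", SAMPLE_MOUNTS))
print(seal_path_warning("/scratch/cache", SAMPLE_MOUNTS))
```

On a real host the second argument would be the contents of `/proc/mounts`.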

Closing, as this is intended behaviour.