jacobsa/fuse

1.8x slower compared to go-fuse

navytux opened this issue · 12 comments

Hello up there.

I'm currently using go-fuse library for my filesystem, but from time to time I'm also rechecking on how it goes with jacobsa/fuse. In 2018 jacobsa/fuse was ~1.6x slower (2) compared to hanwen/go-fuse, and today (2020, Mar.11) I've found that hanwen/go-fuse improved its throughput, while jacobsa/fuse stays at approximately the same performance level(*). Today, the ratio in speed is ~3x (5.7 GB/s vs ~ 1.9 GB/s). edit: today the ratio is ~1.8x - see #78 (comment).

Is there anything I'm missing?

I've looked only briefly and could indeed miss something obvious which could be related to performance or generally. In such a case I appologize for creating the noise.

Thanks beforehand for feedback,
Kirill

/cc @jacobsa, @stapelberg

(*) both at current master and even if I try to patch it to handle requests directly instead of in a goroutine (1, 2).

Check out distr1/distri#59 for a list of known performance improvements.

In particular, I think torvalds/linux@5da784c will help with throughput, but I haven’t had the time/motivation to implement it yet.

Is throughput important for your use-case? Would you be interested in contributing it?

@stapelberg, thanks for feedback. Both throughput and latency are important for my case.

The max-pages commit (torvalds/linux@5da784c) is indeed very related, but for now I was benchmarking both go-fuse and jacobsa/fuse without it - on Debian10 4.19 linux kernel where it is not yet present and not yet backported. In other words with max-pages both go-fuse and jacobsa will benefit from raising read buffer from 128K to e.g. 1M, likely preserving, at least qualitatively, performance comparision to each other.

I looked through the list of other planned performance improvements, and none of the seem to be related to my simple benchmark. In other words, what I'm seeing, is that somehow jacobsa/fuse implements basic FUSE requests receive/reply and similar core FUSE protocol operations slower than go-fuse(*). Is that indeed the case?

I, of course, am ready to contribute improvements, like I already did with go-fuse and the kernel where it was needed for my filesystem. However before contributing I need to better understand which library to build on. jacobsa/fuse API is somewhat cleaner, but (imho) the major speed difference neglects this argument.

Thanks again for feedback,
Kirill

P.S. good idea with disti. qemu-runlinux might help you debug why bdi misestimates write throughput you are seeing in Writing to FUSE via mmap extremely slow (sometimes) on some machines? thread. At least it was very handy for me to be able to run test workloads while working on fuse kernel changes in the matter of minutes at each iteration.


(*) I comment inline about every planned improvement and show that it was an equal comparision in between go-fuse and jacobsa/fuse:

  • enable parallel dirops (FUSE_PARALLEL_DIROPS)

Yes, go-fuse has it enabled (2), but the test benchmark is very simple program that reads from only single file. In other words FUSE_PARALLEL_DIROPS should be unrelated to this benchmark.

Judging from distr1/distri@22697ba2dab1 this seems to be more related to caching/invlidation directory entries. However, once again, the benchmark is only about one file and mainly issues many READ requests to that file.

Once again, unrelated to the benchmark.

Unrelated to this benchmark.

Yes, by default, whenever file on FUSE filesystem is opened, the kernel completely drops pagecache of it. It is possible to tell the kernel not to do so via returning FOPEN_KEEP_CACHE flag in response to OPEN request. Even with that, for filesystems that do not need to maintain any state on opened file handles, it is possible to save open round-trips by telling the kernel not to issue OPEN requests via no_open flag. However for this particular benchmark we are interested that the kernel drops data cache on every new open on every iteration to actually benchmark fuse requests processing instead of access to pagecache.

  • return nil with child inode == 0 instead of ENOENT to cache ENOENT:

Should be unrelated to the benchmark, where only one file is used.

This should likely be a good thing to do, but neither go-fuse, nor jacobsa/fuse currently do it.

Thanks for the thorough comparison.

In other words, what I'm seeing, is that somehow jacobsa/fuse implements basic FUSE requests receive/reply and similar core FUSE protocol operations slower than go-fuse(*). Is that indeed the case?

I don’t know for sure, but one thing I noticed in passing is that there was a performance win in not spawning a goroutine per request in

if _, ok := op.(*fuseops.ForgetInodeOp); ok {
// Special case: call in this goroutine for
// forget inode ops, which may come in a
// flurry from the kernel and are generally
// cheap for the file system to handle
s.handleOp(c, ctx, op)
} else {
go s.handleOp(c, ctx, op)
}

P.S. good idea with disti. qemu-runlinux might help you debug why bdi misestimates write throughput you are seeing in Writing to FUSE via mmap extremely slow (sometimes) on some machines? thread. At least it was very handy for me to be able to run test workloads while working on fuse kernel changes in the matter of minutes at each iteration.

Thanks for the pointer! I already have a convenient way to start a distri image with a custom-built kernel and optionally kgdb, but that thread is about a separate, unrelated problem at work which I can’t reproduce in distri :)

@stapelberg, thanks for feedback.

I don’t know for sure, but one thing I noticed in passing is that there was a performance win in not spawning a goroutine per request in

if _, ok := op.(*fuseops.ForgetInodeOp); ok {
// Special case: call in this goroutine for
// forget inode ops, which may come in a
// flurry from the kernel and are generally
// cheap for the file system to handle
s.handleOp(c, ctx, op)
} else {
go s.handleOp(c, ctx, op)
}

Yes, that might be very related in my experience. Go-fuse processes its requests this way, and as I already said this is exactly why I also tried to patch jacobsa/fuse to do it similarly

https://lab.nexedi.com/kirr/jacobsa-fuse/commit/59a13571
https://lab.nexedi.com/kirr/jacobsa-fuse/commit/e891cd16

but unfortunately without any change in observed performance.

Thanks for the pointer! ...

You are welcome.

Ok, I looked into FUSE protocol exchange that is happenning in between kernel and hanwen/jacobsa. First about

hanwen/go-fuse improved its throughput

this turned out to be false due to in-memory filesystem now reporting FOPEN_KEEP_CACHE to kernel https://github.com/hanwen/go-fuse/blob/fe141f38f2b7/fs/mem.go#L32, and so the benchmark was essentially benchmarking access to pagecache (but not only - there were still many GETATTR read requests, see below). I corrected the code to not return FOPEN_KEEP_CACHE so that both hanwen and jacobsa based servers receive fuse read requests on every iteration(*) and the comparision is fair. This way performance of hanwen/go-fuse v2 version became ~ equal to hanwen/go-fuse v1 version which looks reasonable.

Now let us see what is going on with test servers:

jacobsa/fuse benchamarks at ~ 1.7 GB/s:

.../benchmark-read-throughput$ go run readbench.go -bs 128 -limit 30000 ~/mnt/tmp/hello
block size 128 kb: 30006.7 MB in 17.426231462s: 1721.92 MBs/s
jacobsa/fuse fuse trace log
fuse: Op 0x00000001        connection.go:408] <- init
fuse: Op 0x00000001        connection.go:491] -> OK ()
fuse: Op 0x00000002        connection.go:408] <- GetInodeAttributes (inode 1)
fuse: Op 0x00000002        connection.go:491] -> OK ()
fuse: Op 0x00000003        connection.go:408] <- LookUpInode (parent 1, name "hello")
fuse: Op 0x00000003        connection.go:491] -> OK (inode 2)
fuse: Op 0x00000004        connection.go:408] <- GetInodeAttributes (inode 2)
fuse: Op 0x00000004        connection.go:491] -> OK ()
fuse: Op 0x00000005        connection.go:408] <- OpenFile (inode 2)
fuse: Op 0x00000005        connection.go:491] -> OK ()
fuse: Op 0x00000006        connection.go:408] <- unknown (inode 2, opcode 40)
fuse: Op 0x00000006        connection.go:493] -> Error: "function not implemented"
fuse: Op 0x00000007        connection.go:408] <- ReadFile (inode 2, handle 0, offset 0, 131072 bytes)
fuse: Op 0x00000007        connection.go:491] -> OK ()
fuse: Op 0x00000008        connection.go:408] <- ReadFile (inode 2, handle 0, offset 131072, 131072 bytes)
fuse: Op 0x00000008        connection.go:491] -> OK ()
fuse: Op 0x00000009        connection.go:408] <- ReadFile (inode 2, handle 0, offset 262144, 131072 bytes)
fuse: Op 0x00000009        connection.go:491] -> OK ()
fuse: Op 0x0000000a        connection.go:408] <- ReadFile (inode 2, handle 0, offset 393216, 131072 bytes)
fuse: Op 0x0000000a        connection.go:491] -> OK ()
fuse: Op 0x0000000b        connection.go:408] <- ReadFile (inode 2, handle 0, offset 524288, 131072 bytes)
fuse: Op 0x0000000b        connection.go:491] -> OK ()
fuse: Op 0x0000000c        connection.go:408] <- ReadFile (inode 2, handle 0, offset 655360, 131072 bytes)
fuse: Op 0x0000000c        connection.go:491] -> OK ()
fuse: Op 0x0000000d        connection.go:408] <- ReadFile (inode 2, handle 0, offset 786432, 131072 bytes)
fuse: Op 0x0000000d        connection.go:491] -> OK ()
fuse: Op 0x0000000e        connection.go:408] <- ReadFile (inode 2, handle 0, offset 917504, 131072 bytes)
fuse: Op 0x0000000e        connection.go:491] -> OK ()
fuse: Op 0x0000000f        connection.go:408] <- ReadFile (inode 2, handle 0, offset 1048576, 131072 bytes)
fuse: Op 0x0000000f        connection.go:491] -> OK ()
fuse: Op 0x00000010        connection.go:408] <- ReadFile (inode 2, handle 0, offset 1179648, 131072 bytes)
fuse: Op 0x00000010        connection.go:491] -> OK ()
fuse: Op 0x00000011        connection.go:408] <- ReadFile (inode 2, handle 0, offset 1310720, 131072 bytes)
fuse: Op 0x00000011        connection.go:491] -> OK ()
fuse: Op 0x00000012        connection.go:408] <- ReadFile (inode 2, handle 0, offset 1441792, 131072 bytes)
fuse: Op 0x00000012        connection.go:491] -> OK ()
fuse: Op 0x00000013        connection.go:408] <- ReadFile (inode 2, handle 0, offset 1572864, 131072 bytes)
fuse: Op 0x00000013        connection.go:491] -> OK ()
fuse: Op 0x00000014        connection.go:408] <- ReadFile (inode 2, handle 0, offset 1703936, 131072 bytes)
fuse: Op 0x00000014        connection.go:491] -> OK ()
fuse: Op 0x00000015        connection.go:408] <- ReadFile (inode 2, handle 0, offset 1835008, 131072 bytes)
fuse: Op 0x00000015        connection.go:491] -> OK ()
fuse: Op 0x00000016        connection.go:408] <- ReadFile (inode 2, handle 0, offset 1966080, 131072 bytes)
...
fuse: Op 0x00000069        connection.go:408] <- ReadFile (inode 2, handle 0, offset 12845056, 131072 bytes)
fuse: Op 0x00000069        connection.go:491] -> OK ()
fuse: Op 0x0000006a        connection.go:408] <- ReadFile (inode 2, handle 0, offset 12976128, 131072 bytes)
fuse: Op 0x0000006a        connection.go:491] -> OK ()
fuse: Op 0x0000006b        connection.go:408] <- GetInodeAttributes (inode 2)
fuse: Op 0x0000006b        connection.go:491] -> OK ()
fuse: Op 0x0000006c        connection.go:408] <- FlushFile (inode 2)
fuse: Op 0x0000006c        connection.go:493] -> Error: "function not implemented"
fuse: Op 0x0000006d        connection.go:408] <- ReleaseFileHandle
fuse: Op 0x0000006d        connection.go:493] -> Error: "function not implemented"
fuse: Op 0x0000006e        connection.go:408] <- GetInodeAttributes (inode 1)
fuse: Op 0x0000006e        connection.go:491] -> OK ()
fuse: Op 0x0000006f        connection.go:408] <- LookUpInode (parent 1, name "hello")
fuse: Op 0x0000006f        connection.go:491] -> OK (inode 2)
fuse: Op 0x00000070        connection.go:408] <- GetInodeAttributes (inode 2)
fuse: Op 0x00000070        connection.go:491] -> OK ()
fuse: Op 0x00000071        connection.go:408] <- OpenFile (inode 2)
fuse: Op 0x00000071        connection.go:491] -> OK ()
fuse: Op 0x00000072        connection.go:408] <- ReadFile (inode 2, handle 0, offset 0, 131072 bytes)
fuse: Op 0x00000072        connection.go:491] -> OK ()
fuse: Op 0x00000073        connection.go:408] <- ReadFile (inode 2, handle 0, offset 131072, 131072 bytes)
fuse: Op 0x00000073        connection.go:491] -> OK ()
fuse: Op 0x00000074        connection.go:408] <- ReadFile (inode 2, handle 0, offset 262144, 131072 bytes)
fuse: Op 0x00000074        connection.go:491] -> OK ()
fuse: Op 0x00000075        connection.go:408] <- ReadFile (inode 2, handle 0, offset 393216, 131072 bytes)
fuse: Op 0x00000075        connection.go:491] -> OK ()
fuse: Op 0x00000076        connection.go:408] <- ReadFile (inode 2, handle 0, offset 524288, 131072 bytes)
fuse: Op 0x00000076        connection.go:491] -> OK ()
fuse: Op 0x00000077        connection.go:408] <- ReadFile (inode 2, handle 0, offset 655360, 131072 bytes)
fuse: Op 0x00000077        connection.go:491] -> OK ()
fuse: Op 0x00000078        connection.go:408] <- ReadFile (inode 2, handle 0, offset 786432, 131072 bytes)
fuse: Op 0x00000078        connection.go:491] -> OK ()
...

The kernel issues 128K read requests to the file and open request once in a while after file was completely read. No unexpected requests here.

hanwen/go-fuse benchmarks at 2.7 GB/s:

.../benchmark-read-throughput$ go run readbench.go -bs 128 -limit 30000 ~/mnt/tmp/file.txt
block size 128 kb: 30006.7 MB in 11.034084738s: 2719.45 MBs/s
hanwen/go-fuse trace log
2020/03/12 16:31:21 rx 1: INIT i0 {7.27 Ra 0x20000 ATOMIC_O_TRUNC,SPLICE_READ,IOCTL_DIR,WRITEBACK_CACHE,POSIX_ACL,ABORT_ERROR,DONT_MASK,SPLICE_WRITE,SPLICE_MOVE,FLOCK_LOCKS,READDIRPLUS,READDIRPLUS_AUTO,ASYNC_DIO,PARALLEL_DIROPS,ASYNC_READ,POSIX_LOCKS,EXPORT_SUPPORT,AUTO_INVAL_DATA,HANDLE_KILLPRIV,BIG_WRITES,NO_OPEN_SUPPORT}
2020/03/12 16:31:21 tx 1:     OK, {7.27 Ra 0x20000 PARALLEL_DIROPS,ASYNC_READ,AUTO_INVAL_DATA,READDIRPLUS,BIG_WRITES,NO_OPEN_SUPPORT 0/0 Wr 0x10000 Tg 0x0}
2020/03/12 16:31:21 rx 2: LOOKUP i1 [".go-fuse-epoll-hack"] 20b
2020/03/12 16:31:21 tx 2:     2=no such file or directory, {i0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:21 rx 3: CREATE i1 {0100100 [WRONLY,CREAT,TRUNC,0x8000] (022)} [".go-fuse-epoll-hack"] 20b
2020/03/12 16:31:21 tx 3:     OK, {i18446744073709551615 g0 {M0100644 SZ=0 L=1 0:0 B0*0 i0:18446744073709551615 A 0.000000 M 0.000000 C 0.000000} &{18446744073709551615 0 0}}
2020/03/12 16:31:21 rx 4: POLL i18446744073709551615
2020/03/12 16:31:21 tx 4:     38=function not implemented
2020/03/12 16:31:21 rx 5: FLUSH i18446744073709551615 {Fh 18446744073709551615}
2020/03/12 16:31:21 tx 5:     34=numerical result out of range
2020/03/12 16:31:21 rx 6: RELEASE i18446744073709551615 {Fh 18446744073709551615 WRONLY,0x8000  L0}
2020/03/12 16:31:21 tx 6:     34=numerical result out of range
2020/03/12 16:31:28 rx 7: LOOKUP i1 ["file.txt"] 9b
2020/03/12 16:31:28 tx 7:     OK, {i2 g0 tE=0s tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 8: OPEN i2 {O_RDONLY,0x8000}
2020/03/12 16:31:28 tx 8:     OK, {Fh 0 }
2020/03/12 16:31:28 rx 9: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 9:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 10: READ i2 {Fh 0 [0 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 10:     OK,  131072b data "\x1f\x8b\b\x00\x81\x95C^"...
2020/03/12 16:31:28 rx 11: READ i2 {Fh 0 [131072 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 11:     OK,  131072b data "\x8ex\xe5\xed\x9d\xed\xc1o"...
2020/03/12 16:31:28 rx 12: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 12:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 13: READ i2 {Fh 0 [262144 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 13:     OK,  131072b data "0/\xe6\xc0\x8e\xe6Uy"...
2020/03/12 16:31:28 rx 14: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 14:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 15: READ i2 {Fh 0 [393216 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 15:     OK,  131072b data "\xf5\xe3\xc2Έ\xcfx\x8d"...
2020/03/12 16:31:28 rx 16: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 16:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 17: READ i2 {Fh 0 [524288 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 17:     OK,  131072b data "\"\x0fUQ\xe5D\xf7F"...
2020/03/12 16:31:28 rx 18: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 18:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 19: READ i2 {Fh 0 [655360 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 19:     OK,  131072b data "-\b\x04E#\x8az\xc4"...
2020/03/12 16:31:28 rx 20: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 20:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 21: READ i2 {Fh 0 [786432 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 21:     OK,  131072b data "\xff\xa3b\x12\xa2\xa4i3"...
2020/03/12 16:31:28 rx 22: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 22:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 23: READ i2 {Fh 0 [917504 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 23:     OK,  131072b data "\x03_\x8f\xf4\xdcM\xed\xd9"...
...
2020/03/12 16:31:28 rx 204: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 204:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 205: READ i2 {Fh 0 [12845056 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 205:     OK,  131072b data "\xd3\x05\xbd2\xf7\x19\xf0("...
2020/03/12 16:31:28 rx 206: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 206:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 207: READ i2 {Fh 0 [12976128 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 207:     OK,  128523b data "\x96H!\x84\xa4\xb7\xae\xbb"...
2020/03/12 16:31:28 rx 208: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 208:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 209: FLUSH i2 {Fh 0}
2020/03/12 16:31:28 tx 209:     OK
2020/03/12 16:31:28 rx 210: RELEASE i2 {Fh 0 NONBLOCK,0x8000  L0}
2020/03/12 16:31:28 tx 210:     OK
2020/03/12 16:31:28 rx 211: LOOKUP i1 ["file.txt"] 9b
2020/03/12 16:31:28 tx 211:     OK, {i2 g0 tE=0s tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 212: OPEN i2 {O_RDONLY,0x8000}
2020/03/12 16:31:28 tx 212:     OK, {Fh 0 }
2020/03/12 16:31:28 rx 213: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 213:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 214: READ i2 {Fh 0 [0 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 214:     OK,  131072b data "\x1f\x8b\b\x00\x81\x95C^"...
2020/03/12 16:31:28 rx 215: READ i2 {Fh 0 [131072 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 215:     OK,  131072b data "\x8ex\xe5\xed\x9d\xed\xc1o"...
2020/03/12 16:31:28 rx 216: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 216:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 217: READ i2 {Fh 0 [262144 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 217:     OK,  131072b data "0/\xe6\xc0\x8e\xe6Uy"...
2020/03/12 16:31:28 rx 218: GETATTR i2 {Fh 0}
2020/03/12 16:31:28 tx 218:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:31:28 rx 219: READ i2 {Fh 0 [393216 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:31:28 tx 219:     OK,  131072b data "\xf5\xe3\xc2Έ\xcfx\x8d"...
...

Here, like with jacobsa/fuse, the kernel issues 128K read requests to the file and open request once in a while after file was completely read. However unlike jacobsa/fuse there are many extra GETATTR requests beause the kernel rechecks whether file size was changed due to AUTO_INVAL_DATA passed to it in init.

If I tell the kernel to trust the program to invalidate its cache explicitly(+), flood of GETATTR requests is gone and the server starts to benchmark at 3.2 GB/s:

.../benchmark-read-throughput$ go run readbench.go -bs 128 -limit 30000 ~/mnt/tmp/file.txt
block size 128 kb: 30006.7 MB in 9.465798267s: 3170.01 MBs/s
hanwen/go-fuse trace log with AUTO_INVAL_DATA deactivated
2020/03/12 16:10:02 rx 1: INIT i0 {7.27 Ra 0x20000 ATOMIC_O_TRUNC,READDIRPLUS_AUTO,ASYNC_DIO,PARALLEL_DIROPS,ABORT_ERROR,BIG_WRITES,DONT_MASK,SPLICE_WRITE,SPLICE_READ,POSIX_ACL,FLOCK_LOCKS,READDIRPLUS,WRITEBACK_CACHE,NO_OPEN_SUPPORT,HANDLE_KILLPRIV,IOCTL_DIR,AUTO_INVAL_DATA,ASYNC_READ,POSIX_LOCKS,EXPORT_SUPPORT,SPLICE_MOVE}
2020/03/12 16:10:02 tx 1:     OK, {7.27 Ra 0x20000 ASYNC_READ,PARALLEL_DIROPS,BIG_WRITES,NO_OPEN_SUPPORT,READDIRPLUS 0/0 Wr 0x10000 Tg 0x0}
2020/03/12 16:10:02 rx 2: LOOKUP i1 [".go-fuse-epoll-hack"] 20b
2020/03/12 16:10:02 tx 2:     2=no such file or directory, {i0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:10:02 rx 3: CREATE i1 {0100100 [TRUNC,WRONLY,CREAT,0x8000] (022)} [".go-fuse-epoll-hack"] 20b
2020/03/12 16:10:02 tx 3:     OK, {i18446744073709551615 g0 {M0100644 SZ=0 L=1 0:0 B0*0 i0:18446744073709551615 A 0.000000 M 0.000000 C 0.000000} &{18446744073709551615 0 0}}
2020/03/12 16:10:02 rx 4: POLL i18446744073709551615
2020/03/12 16:10:02 tx 4:     38=function not implemented
2020/03/12 16:10:02 rx 5: FLUSH i18446744073709551615 {Fh 18446744073709551615}
2020/03/12 16:10:02 tx 5:     34=numerical result out of range
2020/03/12 16:10:02 rx 6: RELEASE i18446744073709551615 {Fh 18446744073709551615 WRONLY,0x8000  L0}
2020/03/12 16:10:02 tx 6:     34=numerical result out of range
2020/03/12 16:10:05 rx 7: LOOKUP i1 ["file.txt"] 9b
2020/03/12 16:10:05 tx 7:     OK, {i2 g0 tE=0s tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:10:05 rx 8: OPEN i2 {O_RDONLY,0x8000}
2020/03/12 16:10:05 tx 8:     OK, {Fh 0 }
2020/03/12 16:10:05 rx 9: READ i2 {Fh 0 [0 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 9:     OK,  131072b data "\x1f\x8b\b\x00\x81\x95C^"...
2020/03/12 16:10:05 rx 10: READ i2 {Fh 0 [131072 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 10:     OK,  131072b data "\x8ex\xe5\xed\x9d\xed\xc1o"...
2020/03/12 16:10:05 rx 11: READ i2 {Fh 0 [262144 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 11:     OK,  131072b data "0/\xe6\xc0\x8e\xe6Uy"...
2020/03/12 16:10:05 rx 12: READ i2 {Fh 0 [393216 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 12:     OK,  131072b data "\xf5\xe3\xc2Έ\xcfx\x8d"...
2020/03/12 16:10:05 rx 13: READ i2 {Fh 0 [524288 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 13:     OK,  131072b data "\"\x0fUQ\xe5D\xf7F"...
2020/03/12 16:10:05 rx 14: READ i2 {Fh 0 [655360 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 14:     OK,  131072b data "-\b\x04E#\x8az\xc4"...
2020/03/12 16:10:05 rx 15: READ i2 {Fh 0 [786432 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 15:     OK,  131072b data "\xff\xa3b\x12\xa2\xa4i3"...
2020/03/12 16:10:05 rx 16: READ i2 {Fh 0 [917504 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 16:     OK,  131072b data "\x03_\x8f\xf4\xdcM\xed\xd9"...
2020/03/12 16:10:05 rx 17: READ i2 {Fh 0 [1048576 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 17:     OK,  131072b data "\n\xb6\n\xbbjn\x98\xf1"...
2020/03/12 16:10:05 rx 18: READ i2 {Fh 0 [1179648 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 18:     OK,  131072b data "W%\xa5'G\xe5\x8f\xd3"...
2020/03/12 16:10:05 rx 19: READ i2 {Fh 0 [1310720 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 19:     OK,  131072b data "\xcfnƲp\xacz\xe4"...
2020/03/12 16:10:05 rx 20: READ i2 {Fh 0 [1441792 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 20:     OK,  131072b data "x\x06\xb7\x03|\"\xb7\x03"...
2020/03/12 16:10:05 rx 21: READ i2 {Fh 0 [1572864 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 21:     OK,  131072b data "\r\xf1Ib]\a\xe2\x85"...
2020/03/12 16:10:05 rx 22: READ i2 {Fh 0 [1703936 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 22:     OK,  131072b data "\xe1\xbcZ\xa8\xbd\xfb\xe8z"...
2020/03/12 16:10:05 rx 23: READ i2 {Fh 0 [1835008 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 23:     OK,  131072b data "\xb2\b/\a>\x8a\xf0T"...
2020/03/12 16:10:05 rx 24: READ i2 {Fh 0 [1966080 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 24:     OK,  131072b data "|\b/\x94\x1f\u008b\xe4"...
2020/03/12 16:10:05 rx 25: READ i2 {Fh 0 [2097152 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 25:     OK,  131072b data "\x15\xdaL\x1c\x1fd\\\x98"...
2020/03/12 16:10:05 rx 26: READ i2 {Fh 0 [2228224 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 26:     OK,  131072b data "s\x9f\xf2P\xb7\x13\xecd"...
2020/03/12 16:10:05 rx 27: READ i2 {Fh 0 [2359296 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 27:     OK,  131072b data "\x9b\xca\rս^9|"...
2020/03/12 16:10:05 rx 28: READ i2 {Fh 0 [2490368 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 28:     OK,  131072b data "s\x9c7sFU\xa4i"...
2020/03/12 16:10:05 rx 29: READ i2 {Fh 0 [2621440 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 29:     OK,  131072b data "\xed\x12\"rV\v\xf9\u007f"...
2020/03/12 16:10:05 rx 30: READ i2 {Fh 0 [2752512 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 30:     OK,  131072b data "\xb4PZ^i\x8b\x89\xff"...
2020/03/12 16:10:05 rx 31: READ i2 {Fh 0 [2883584 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 31:     OK,  131072b data ">%\xbf\a\xcf\xca\xef\xc7"...
2020/03/12 16:10:05 rx 32: READ i2 {Fh 0 [3014656 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 32:     OK,  131072b data "\xb6\xaf\xb97i\xf56\xd6"...
2020/03/12 16:10:05 rx 33: READ i2 {Fh 0 [3145728 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 33:     OK,  131072b data "\xf8\x8b\x11\xff0\xf5\x17\xe2"...
2020/03/12 16:10:05 rx 34: READ i2 {Fh 0 [3276800 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 34:     OK,  131072b data "\b\xde\x166iU^\u007f"...
2020/03/12 16:10:05 rx 35: READ i2 {Fh 0 [3407872 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 35:     OK,  131072b data "O8\xe3\xcfg\xfee\xf8"...
...
2020/03/12 16:10:05 rx 104: READ i2 {Fh 0 [12451840 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 104:     OK,  131072b data "\x8aߤ\xb1-\xe0\u007f\x1f"...
2020/03/12 16:10:05 rx 105: READ i2 {Fh 0 [12582912 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 105:     OK,  131072b data "\xa2\xeb \xa2s[\xe3\x88"...
2020/03/12 16:10:05 rx 106: READ i2 {Fh 0 [12713984 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 106:     OK,  131072b data "b\xa9\xe5\xef\xd3\xf2\xaf8"...
2020/03/12 16:10:05 rx 107: READ i2 {Fh 0 [12845056 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 107:     OK,  131072b data "\xd3\x05\xbd2\xf7\x19\xf0("...
2020/03/12 16:10:05 rx 108: READ i2 {Fh 0 [12976128 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 108:     OK,  128523b data "\x96H!\x84\xa4\xb7\xae\xbb"...
2020/03/12 16:10:05 rx 109: GETATTR i2 {Fh 0}
2020/03/12 16:10:05 tx 109:     OK, {tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:10:05 rx 110: FLUSH i2 {Fh 0}
2020/03/12 16:10:05 tx 110:     OK
2020/03/12 16:10:05 rx 111: RELEASE i2 {Fh 0 NONBLOCK,0x8000  L0}
2020/03/12 16:10:05 tx 111:     OK
2020/03/12 16:10:05 rx 112: LOOKUP i1 ["file.txt"] 9b
2020/03/12 16:10:05 tx 112:     OK, {i2 g0 tE=0s tA=0s {M0100644 SZ=13104651 L=0 0:0 B25600*4096 i0:2 A 0.000000 M 0.000000 C 0.000000}}
2020/03/12 16:10:05 rx 113: OPEN i2 {O_RDONLY,0x8000}
2020/03/12 16:10:05 tx 113:     OK, {Fh 0 }
2020/03/12 16:10:05 rx 114: READ i2 {Fh 0 [0 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 114:     OK,  131072b data "\x1f\x8b\b\x00\x81\x95C^"...
2020/03/12 16:10:05 rx 115: READ i2 {Fh 0 [131072 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 115:     OK,  131072b data "\x8ex\xe5\xed\x9d\xed\xc1o"...
2020/03/12 16:10:05 rx 116: READ i2 {Fh 0 [262144 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 116:     OK,  131072b data "0/\xe6\xc0\x8e\xe6Uy"...
2020/03/12 16:10:05 rx 117: READ i2 {Fh 0 [393216 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 117:     OK,  131072b data "\xf5\xe3\xc2Έ\xcfx\x8d"...
2020/03/12 16:10:05 rx 118: READ i2 {Fh 0 [524288 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 118:     OK,  131072b data "\"\x0fUQ\xe5D\xf7F"...
2020/03/12 16:10:05 rx 119: READ i2 {Fh 0 [655360 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 119:     OK,  131072b data "-\b\x04E#\x8az\xc4"...
2020/03/12 16:10:05 rx 120: READ i2 {Fh 0 [786432 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 120:     OK,  131072b data "\xff\xa3b\x12\xa2\xa4i3"...
2020/03/12 16:10:05 rx 121: READ i2 {Fh 0 [917504 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 121:     OK,  131072b data "\x03_\x8f\xf4\xdcM\xed\xd9"...
2020/03/12 16:10:05 rx 122: READ i2 {Fh 0 [1048576 +131072)  L 0 NONBLOCK,0x8000}
2020/03/12 16:10:05 tx 122:     OK,  131072b data "\n\xb6\n\xbbjn\x98\xf1"...
...

In other words what we see is that under more or less equal client conditions jacobsa/fuse performs ~1.8x (1.7GB/s vs 3.2GB/s) slower compared to hanwen/go-fuse.

Am I missing something?

Kirill


(*)

--- a/fs/mem.go
+++ b/fs/mem.go
@@ -29,7 +29,8 @@ var _ = (NodeSetattrer)((*MemRegularFile)(nil))
 var _ = (NodeFlusher)((*MemRegularFile)(nil))
 
 func (f *MemRegularFile) Open(ctx context.Context, flags uint32) (fh FileHandle, fuseFlags uint32, errno syscall.Errno) {
-       return nil, fuse.FOPEN_KEEP_CACHE, OK
+       //return nil, fuse.FOPEN_KEEP_CACHE, OK
+       return nil, 0, OK
 }

(+)

--- a/example/hello/main.go
+++ b/example/hello/main.go
@@ -46,6 +58,7 @@ func main() {
                log.Fatal("Usage:\n  hello MOUNTPOINT")
        }
        opts := &fs.Options{}
+       opts.ExplicitDataCacheControl = true
        opts.Debug = *debug
        server, err := fs.Mount(flag.Arg(0), &HelloRoot{}, opts)
        if err != nil {

My next guess would be that jacobsa/fuse’s lack of splicing and writev might make the difference.

Can you try disabling them in hanwen/go-fuse and see if the speeds match?

Yes, sure, I could try. Splicing should be unrelated to this benchmark because it only splices data from another file descriptor, not memory buffer we are serving. As expected, the following patch does not make any difference:

--- a/fuse/splice_linux.go
+++ b/fuse/splice_linux.go
@@ -12,7 +12,8 @@ import (
 )
 
 func (s *Server) setSplice() {
-       s.canSplice = splice.Resizable()
+       s.canSplice = false;
+       //s.canSplice = splice.Resizable()
 }
 
 // trySplice:  Zero-copy read from fdData.Fd into /dev/fuse
@@ -29,6 +30,7 @@ func (s *Server) setSplice() {
 // This dance is neccessary because header and payload cannot be split across
 // two splices and we cannot seek in a pipe buffer.
 func (ms *Server) trySplice(header []byte, req *request, fdData *readResultFd) error {
+       panic("should not be called")
        var err error
 
        // Get a pair of connected pipes

Lack of writev support does add some overhead, but go-fuse is still far faster than jacobsa/fuse:

--- a/fuse/server_linux.go
+++ b/fuse/server_linux.go
@@ -9,6 +9,8 @@ import (
        "syscall"
 )
 
+var mem bufferPool
+
 func (ms *Server) systemWrite(req *request, header []byte) Status {
        if req.flatDataSize() == 0 {
                err := handleEINTR(func() error {
@@ -34,7 +36,14 @@ func (ms *Server) systemWrite(req *request, header []byte) Status {
                header = req.serializeHeader(len(req.flatData))
        }
 
-       _, err := writev(ms.mountFd, [][]byte{header, req.flatData})
+       //_, err := writev(ms.mountFd, [][]byte{header, req.flatData})
+
+       msg := mem.AllocBuffer(uint32(len(header) + len(req.flatData)))
+       defer mem.FreeBuffer(msg)
+       copy(msg, header)
+       copy(msg[len(header):], req.flatData)
+       _, err := syscall.Write(ms.mountFd, msg)
+
        if req.readResult != nil {
                req.readResult.Done()
        }
.../benchmark-read-throughput$ md5sum ~/mnt/tmp/file.txt 
a0dbdd21ee5c5d0cf9c08c8a56e2d042  /home/kirr/mnt/tmp/file.txt    # same as it was before

.../benchmark-read-throughput$ go run readbench.go -bs 128 -limit 30000 ~/mnt/tmp/file.txt
block size 128 kb: 30006.7 MB in 10.305543101s: 2911.70 MBs/s

Whatever it is, I could indeed continue to dig and find the root cause(es) of the slowness myself.
However I'm also interested if there is feedback from jacobsa/fuse maintainers to backup the project.

Kirill

However I'm also interested if there is feedback from jacobsa/fuse maintainers to backup the project.

I think Aaron has moved on, but I’m keeping this package maintained as best as time permits. I’m using it in distri.

If you could indeed identify the slowness and contribute a fix, that’d be great!

@stapelberg, thanks for feedback; I see. I think I could probably indeed have a deeper look at jacobsa/fuse, when updating base library of my filesystem, but unfortunately not now.

Cool. I’m not in a hurry, so feel free to come back with more questions/improvements when the time is right.

@navytux :

@stapelberg, thanks for feedback; I see. I think I could probably indeed have a deeper look at jacobsa/fuse, when updating base library of my filesystem, but unfortunately not now.

Is looking into this still a possiblity?

@paulwratt, thanks for asking. My application(*) currently uses go-fuse, and there is no neither short-, nor middle-term plans to rework it.

(*) https://lab.nexedi.com/nexedi/wendelin.core/blob/61dc1ff2a1ad/wcfs/wcfs.go