kahing/catfs

Invalid 8 Byte Files, Frequent Hangs of Applications, ERROR - "path/file.something" is not a valid cache file, deleting

GeorgiaM-honestly opened this issue · 1 comment

Hello,

I love catfs. I'm using it to provide caching for a goofys s3 "mount". It makes a huge difference!

I do, however, have some nagging issues that happen on a regular basis. I hope I have provided enough information, and I would like to help resolve this if it isn't specific to my setup.

System:

  • Gentoo linux
  • uname -a: Linux deleted 5.12.13-gentoo-x86_64 #1 SMP Thu Jun 24 12:24:13 DELETED 2021 x86_64 Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz GenuineIntel GNU/Linux
  • 16 cores
  • 32GB RAM
  • Filesystem which the cache is using: 100GB, EXT4-fs (sde1): mounted filesystem with ordered data mode. Opts: errors=remount-ro,user_xattr. Quota mode: disabled.

catfs issues:

  • When viewing a directory full of .tif files, the first file requested is always 8 bytes, which appears to be the bare minimum needed for 'file' to report it as a TIFF image.

  • Indeed, the first 8 bytes of the real source file are the same; the source file itself, however, is far larger than 8 bytes.

  • When viewing a directory full of .tif files, whether it is the first request or not, catfs prints a screenful of errors. Examples:

2021-06-30 16:53:22 ERROR - "2005_Hurricane_Katrina/aug31JpegTiles_GCS_NAD83/aug31C0902830w295700n.tif" is not a valid cache file, deleting
2021-06-30 16:53:23 ERROR - "2005_Hurricane_Katrina/aug31JpegTiles_GCS_NAD83/aug31C0902830w295830n.tif" is not a valid cache file, deleting
2021-06-30 16:53:23 ERROR - "2005_Hurricane_Katrina/aug31JpegTiles_GCS_NAD83/aug31C0902830w300000n.tif" is not a valid cache file, deleting
2021-06-30 16:53:24 ERROR - "2005_Hurricane_Katrina/aug31JpegTiles_GCS_NAD83/aug31C0902830w300130n.tif" is not a valid cache file, deleting
2021-06-30 16:53:24 ERROR - "2005_Hurricane_Katrina/aug31JpegTiles_GCS_NAD83/aug31C0903000w295530n.tif" is not a valid cache file, deleting
2021-06-30 16:53:25 ERROR - "2005_Hurricane_Katrina/aug31JpegTiles_GCS_NAD83/aug31C0903000w295700n.tif" is not a valid cache file, deleting
2021-06-30 16:53:25 ERROR - "2005_Hurricane_Katrina/aug31JpegTiles_GCS_NAD83/aug31C0903000w295830n.tif" is not a valid cache file, deleting
2021-06-30 16:53:26 ERROR - "2005_Hurricane_Katrina/aug31JpegTiles_GCS_NAD83/aug31C0903000w300000n.tif" is not a valid cache file, deleting

  • During this time, which can be several minutes, the application trying to use the data hangs and usually needs to be force-killed.

  • While catfs is "deleting" files, the total size of the cache directory root grows.

  • After these few minutes, everything works except for that first invalid 8-byte file.

  • Often, but not always, after this delay, files that had already been fetched are downloaded again (increased inbound network traffic is observed).

  • Gwenview STDERR on the 8 byte file:
    gwenview.libtiff: Error JPEGLib "Not a JPEG file: starts with 0xd5 0x7e"
    org.kde.kdegraphics.gwenview.lib: Could not generate thumbnail for file "file:///media/s3-noaa-eri-pds-LOCAL-catfs/2005_Hurricane_Katrina/sep02JpegTiles_GCS_NAD83/sep02C0890130w290900n.tif"
    gwenview.libtiff: Error JPEGLib "Not a JPEG file: starts with 0xd5 0x7e"
    org.kde.kdegraphics.gwenview.lib: Could not generate thumbnail for file "file:///media/s3-noaa-eri-pds-LOCAL-catfs/2005_Hurricane_Katrina/sep02JpegTiles_GCS_NAD83/sep02C0890130w290900n.tif"

  • Sometimes, after all the other images have been downloaded and thumbnails created, I can double-click on that first file that had been 8 bytes, and it has now been fully downloaded and displays without error. Other times, it stays at 8 bytes.
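The truncation can be confirmed by comparing a cached file against its source counterpart. The sketch below is an assumption about how to verify it, not part of the original report; it relies on GNU `stat -c` and `cmp -n`, and the example paths come from the mounts used in this report:

```shell
# Compare a cached file against its source counterpart to confirm
# truncation. Pass the source-mount path and the cache-dir path of
# the same file.
check_truncated() {
  src="$1"; cache="$2"
  src_size=$(stat -c %s "$src")
  cache_size=$(stat -c %s "$cache")
  echo "source: $src_size bytes, cache: $cache_size bytes"
  # The first 8 bytes match even when the cached copy is truncated:
  cmp -s -n 8 "$src" "$cache" && echo "first 8 bytes identical"
}

# Example invocation (paths from the goofys/catfs mounts in this report):
# check_truncated \
#   /media/s3-noaa-eri-pds-LOCAL/2005_Hurricane_Katrina/aug31JpegTiles_GCS_NAD83/aug31C0902830w295700n.tif \
#   /home/myusername/.goofys-cache/2005_Hurricane_Katrina/aug31JpegTiles_GCS_NAD83/aug31C0902830w295700n.tif
```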

Reproduction:

  • Mount an aws s3 public bucket with goofys: goofys -o allow_other -o ro noaa-eri-pds /media/s3-noaa-eri-pds-LOCAL

  • Start catfs appropriately: catfs -o ro --free 1G /media/s3-noaa-eri-pds-LOCAL /home/myusername/.goofys-cache /media/s3-noaa-eri-pds-LOCAL-catfs

  • Explore the images with a tool such as gwenview that will show thumbnails

  • Optional: Start gwenview on the command line, outputting STDOUT and STDERR to their own files and then watch these files: gwenview 1> /tmp/gwenview-STDOUT.txt 2> /tmp/gwenview-STDERR.txt & tail -f /tmp/gwenview-STD*
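Once the mounts are up, truncated cache entries can also be located without a GUI. This is a sketch, assuming the cache directory from the catfs command above and that the bad entries are exactly 8 bytes:

```shell
# Look for 8-byte .tif entries in the catfs cache directory
# (path from the catfs command above; adjust for your setup).
CACHE=/home/myusername/.goofys-cache
if [ -d "$CACHE" ]; then
  find "$CACHE" -type f -name '*.tif' -size 8c -print
fi
```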

I noticed catfs was complaining about "too many open files". The default limit was 1024 for the user, so I increased it to 10240 and rebooted.
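For reference, this is roughly how the limit can be checked and raised persistently. The limits.conf entries are an assumption about one common method, not necessarily the one used here:

```shell
# Check the current per-process open-file limit for this shell:
ulimit -n
# One common way to raise it persistently is via /etc/security/limits.conf,
# followed by a re-login or reboot:
#   myusername  soft  nofile  10240
#   myusername  hard  nofile  10240
```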

I'm no longer getting the "too many open files" errors; however, the other issues continue. I checked how many files were open per process, and here are the results. I'm not entirely sure this is the right way to count "open files", but I think these stats are interesting. Is this normal?

As root: lsof -n | grep myusername | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -n 4

133961 catfs
78938 chrome
49162 thunderbi
16426 gwenview