dlbeer/ufat

Ignoring read errors in alloc_cluster() can cause an "almost infinite" loop with broken SD cards

Opened this issue · 0 comments

In our application we are using your ufat library (via my RTOS). In one of the devices the SD card got damaged - it seems it can still correctly read (maybe also write) the first ~600-700 blocks, but reads of any further blocks (possibly also writes) cause the SD card to fail with a timeout. This is the behaviour observed on the MCU; on a PC the card simply fails to mount, and attempting to dump it yields only the first ~700 blocks (about 350 kB of data), after which every block read throws an I/O error.

The problem seems to be that if the FAT file system mounts properly (and it does, because the initial part of the card is fine), then trying to perform an operation - in our case creating a folder - results in a practically infinite loop due to ignored read errors in alloc_cluster(), exactly here:

ufat/ufat.c

Line 652 in e2a7466

if (!ufat_read_fat(uf, idx, &c) && c == UFAT_CLUSTER_FREE) {

This loop will iterate over all existing clusters (for our 16 GB card - about 2 million...) to find a free one, ignoring read errors. Given that the SD card's failure mode is a timeout, each read attempt takes about 100 ms, so scanning all the clusters would take on the order of 2 million × 100 ms ≈ 55 hours - effectively "forever".

Is ignoring read errors here intentional (e.g. to skip over single broken clusters without failing the whole top-level operation) or just an omission? I'm not sure how to interpret that part of the code, so I'm starting with a question before potentially providing a fix.