eclipse-threadx/filex

Rebooting with open files can cause "frozen files", returning FX_FILE_CORRUPT everywhere

Closed this issue · 6 comments

Describe the bug
I have one file that one way or another is corrupt and there's no way I can get rid of it. I'm not sure how I got into this situation but I believe it relates to rebooting the system with open files. I use exFAT on an SD-card with fault-tolerance enabled.

When I try to open the file, I get FX_FILE_CORRUPT; if I try to delete it, I get the same error. If I attempt to use fx_media_check to repair the file system, it also returns the very same error. This is where it returns that value:
https://github.com/azure-rtos/filex/blob/9e3ad51f2b64ed05df50a211909eef536c0d06ec/common/src/fx_media_check.c#L462

When I try to list all the files in the directory with fx_directory_next_entry_find() I get the files in the folder up until the corrupt file is found, upon which it returns FX_FILE_CORRUPT.
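For reference, a minimal sketch of the listing loop described above, assuming the usual fx_directory_first_entry_find() / fx_directory_next_entry_find() pairing. The helper name list_until_error() is hypothetical, and the declarations in the `#ifndef HAVE_FILEX` branch are host-only stand-ins (not FileX code) that exist only so the sketch compiles without the library.

```c
#include <stdio.h>
#include <string.h>

#ifdef __has_include
#  if __has_include("fx_api.h")
#    include "fx_api.h"          /* real FileX API when building for the target */
#    define HAVE_FILEX 1
#  endif
#endif

#ifndef HAVE_FILEX
/* Host-only stand-ins (hypothetical, not FileX code) so the sketch compiles. */
typedef unsigned int UINT;
typedef char         CHAR;
#define FX_SUCCESS           0x00
#define FX_FILE_CORRUPT      0x08
#define FX_NO_MORE_ENTRIES   0x0F
#define FX_MAX_LONG_NAME_LEN 256
typedef struct { const char **names; UINT count, corrupt_at, next; } FX_MEDIA;

static UINT fx_directory_next_entry_find(FX_MEDIA *m, CHAR *name)
{
    if (m->next >= m->count)      return FX_NO_MORE_ENTRIES;
    if (m->next == m->corrupt_at) return FX_FILE_CORRUPT;   /* simulated bad entry */
    strcpy(name, m->names[m->next++]);
    return FX_SUCCESS;
}

static UINT fx_directory_first_entry_find(FX_MEDIA *m, CHAR *name)
{
    m->next = 0;
    return fx_directory_next_entry_find(m, name);
}
#endif

/* Prints entries of the current default directory until the first
 * non-success status and returns that status. On a healthy directory
 * this is FX_NO_MORE_ENTRIES; in the situation reported here it is
 * FX_FILE_CORRUPT, and entries after the bad one are never reached. */
UINT list_until_error(FX_MEDIA *media, UINT *listed)
{
    CHAR name[FX_MAX_LONG_NAME_LEN];
    UINT status = fx_directory_first_entry_find(media, name);

    *listed = 0;
    while (status == FX_SUCCESS)
    {
        printf("%s\n", name);
        (*listed)++;
        status = fx_directory_next_entry_find(media, name);
    }
    return status;
}
```

Returning the final status rather than swallowing it makes it easy to tell a normal end of directory apart from the corrupt entry.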

The following is the code to format and mount the file system:

    if ((mp->flags & FS_MOUNT_FLAG_NO_FORMAT) == 0)
    {
        LOG_INF("formatting %s, block_size=%u, block_count=%u (%llu MB)",
                mp->mnt_point,
                block_size,
                block_count,
                (uint64_t)block_count*block_size/1000000);
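        require_ok(fx_media_exFAT_format(&ctx.media,
                              filex_port_io,        /* media driver entry */
                              &ctx,                 /* driver info pointer */
                              ctx.buffer,           /* media working memory */
                              sizeof(ctx.buffer),
                              "DISK",               /* volume name */
                              1,                    /* number of FATs */
                              0,                    /* hidden sectors */
                              block_count,          /* total sectors */
                              block_size,           /* bytes per sector */
                              64,                   /* sectors per cluster */
                              0,                    /* volume serial number */
                              8192), bail);         /* boundary unit */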
    }

    require_ok(fx_media_open(&ctx.media,
                             "DISK",
                             filex_port_io,
                             &ctx,
                             ctx.buffer,
                             sizeof(ctx.buffer)), bail);
    require_ok(fx_fault_tolerant_enable(&ctx.media,
                                        ctx.ft_buffer,
                                        sizeof(ctx.ft_buffer)), bail);

Using version v6.2.0 of FileX on an ST MCU.

Expected behavior
A file that is not closed on system restart should either be partially written or not exist. It should definitely be possible to delete the file.

Impact
Showstopper, since that file is now frozen.

Some more information on this after digging around today. The scenario seems to be the following, and it happens every time:

  1. Open a file
  2. Write 1 KB to it
  3. Truncate it and don't close it
  4. Crash (restart system without cleaning up)
  5. Open the file again: the open fails and the file is now frozen

This doesn't happen on FAT partitions.
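Steps 1 to 3 of the scenario above could be sketched roughly as follows. This is not the reporter's actual code: the function name leave_truncated_file_open(), the file name, and the truncation target of 0 bytes are all assumptions (the report does not say what size the file was truncated to). The `#ifndef HAVE_FILEX` branch contains host-only stand-ins, not FileX code, so that the sketch compiles without the library.

```c
#ifdef __has_include
#  if __has_include("fx_api.h")
#    include "fx_api.h"          /* real FileX API when building for the target */
#    define HAVE_FILEX 1
#  endif
#endif

#ifndef HAVE_FILEX
/* Host-only stand-ins (hypothetical, not FileX code) so the sketch compiles. */
typedef unsigned int  UINT;
typedef unsigned long ULONG;
typedef char          CHAR;
typedef void          VOID;
#define FX_SUCCESS         0x00
#define FX_ALREADY_CREATED 0x0B
#define FX_OPEN_FOR_WRITE  1
typedef struct { int unused; } FX_MEDIA;
typedef struct { ULONG size; int is_open; } FX_FILE;

static UINT fx_file_create(FX_MEDIA *m, CHAR *n)
{ (void)m; (void)n; return FX_SUCCESS; }
static UINT fx_file_open(FX_MEDIA *m, FX_FILE *f, CHAR *n, UINT t)
{ (void)m; (void)n; (void)t; f->size = 0; f->is_open = 1; return FX_SUCCESS; }
static UINT fx_file_write(FX_FILE *f, VOID *b, ULONG s)
{ (void)b; f->size += s; return FX_SUCCESS; }
static UINT fx_file_truncate(FX_FILE *f, ULONG s)
{ f->size = s; return FX_SUCCESS; }
#endif

/* Performs steps 1-3 and returns with the file still open.
 * The caller is expected to reset the board afterwards (step 4). */
UINT leave_truncated_file_open(FX_MEDIA *media, FX_FILE *file)
{
    static CHAR name[] = "frozen.bin";   /* hypothetical file name */
    static CHAR data[1024];              /* 1 KB payload, contents irrelevant */
    UINT status;

    status = fx_file_create(media, name);
    if (status != FX_SUCCESS && status != FX_ALREADY_CREATED)
        return status;

    status = fx_file_open(media, file, name, FX_OPEN_FOR_WRITE);  /* 1. open */
    if (status != FX_SUCCESS)
        return status;

    status = fx_file_write(file, data, sizeof(data));             /* 2. write 1 KB */
    if (status != FX_SUCCESS)
        return status;

    /* 3. truncate; the target size of 0 is an assumption */
    status = fx_file_truncate(file, 0);

    /* Deliberately no fx_file_close() or fx_media_flush() here. */
    return status;
}
```

On the target, calling this on an exFAT volume, resetting the MCU, and then trying to open the file again would correspond to steps 4 and 5.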

Hi @emillindq, thank you for reporting this bug. I am able to reproduce it. I am working on a fix and will update you soon.

Hi @emillindq, this is a bug in fx_file_truncate. We will fix it in a future release. For now, you may use fx_file_truncate_release instead to work around this bug.
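A minimal sketch of that workaround, assuming the only change needed is to swap the call: fx_file_truncate_release() takes the same arguments as fx_file_truncate() but also returns the freed clusters to the media. The wrapper name shrink_file() is hypothetical, and the `#ifndef HAVE_FILEX` branch holds host-only stand-ins (not FileX code) so the sketch compiles without the library.

```c
#ifdef __has_include
#  if __has_include("fx_api.h")
#    include "fx_api.h"          /* real FileX API when building for the target */
#    define HAVE_FILEX 1
#  endif
#endif

#ifndef HAVE_FILEX
/* Host-only stand-ins (hypothetical, not FileX code) so the sketch compiles. */
typedef unsigned int  UINT;
typedef unsigned long ULONG;
#define FX_SUCCESS 0x00
typedef struct { ULONG size; ULONG released; } FX_FILE;

static UINT fx_file_truncate_release(FX_FILE *f, ULONG size)
{
    if (size < f->size)
    {
        f->released += f->size - size;   /* pretend the tail was freed */
        f->size = size;
    }
    return FX_SUCCESS;
}
#endif

/* Shrinks an open file to new_size bytes using the suggested call
 * instead of fx_file_truncate(). Returns the FileX status code. */
UINT shrink_file(FX_FILE *file, ULONG new_size)
{
    return fx_file_truncate_release(file, new_size);
}
```

Routing every truncation through one wrapper like this keeps the workaround in a single place, so it is easy to revert once a fixed release is available.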

@xiuwencai thank you for this update!! I want you to know it shows MS is providing strong support for this file system lib, honestly healing the wounds in the FOSS community caused by past business decisions MS has made (even if it's vendor specific). Satya Nadella has my respect in this regard.

Anyway, because of this bug we have switched to FAT32 in this lib. Do you think it's safe to go back to exFAT using the proposed change, or do you have any doubts, given your expert knowledge of the root cause of this bug?

Is there an update on this issue? I'm also using exFAT and notice file corruption under very specific circumstances that I can't exactly replicate, but I wouldn't be surprised if it's truncation-related.

Is the recommendation to not use exFAT until this fix is in main?

Closing, as exFAT was removed in the transfer to Eclipse.