tobozo/ESP32-targz

tarGzExpander corrupts expanded data when tar contains more than one file

martirius opened this issue · 9 comments

Hi @tobozo,
I'm trying to untar a tar.gz file that contains 4 files.
After decompression I see a message on Serial that says "[ERROR]: tar expanding done!", but it is followed by 100% and success.
So I tried reading the expanded files from SPIFFS, but they are corrupted; this is what I found when printing part of one of the files:
document.getElementById("cdl_status").innerHTML = response["cdl_status"] == 1 ? "CDL PAIRED" : "CDL UNPAIRED"; initialized = response["uuid"] != undefined
root.pem000664 001750 001750 00000002260 13714534531 012510 0ustar00pedupedu000000 000000 tyle="margin-left: 50px;"></span> <span style="margin-left: 10px;">WEB</span> <span id="iot_led" style="margin-left: 50px;"></span> <span
How did this part come into the file?

I also noticed that parts of the files are mixed up (I found parts of one file inside another and vice versa).
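For reference, the fragment beginning with `root.pem000664 001750 001750 00000002260 ...` in the excerpt above has the shape of a raw tar (ustar) header: file name, octal mode, uid, gid, size, mtime, checksum, then the `ustar` magic and the user/group names. Below is a minimal sketch, not taken from the library, of how an expanded file could be scanned for such a leaked header. The function names are hypothetical; the offsets are the standard ustar ones (checksum at byte 148, magic at byte 257).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Standard ustar header layout (one 512-byte block).
constexpr std::size_t kTarBlockSize   = 512;
constexpr std::size_t kChecksumOffset = 148;
constexpr std::size_t kChecksumLength = 8;
constexpr std::size_t kMagicOffset    = 257;

// Parses an octal ASCII field; stops at the first non-octal byte (NUL or space).
std::uint64_t parseOctalField(const unsigned char* field, std::size_t len) {
    std::uint64_t value = 0;
    std::size_t i = 0;
    while (i < len && field[i] == ' ') ++i;  // skip leading padding
    for (; i < len && field[i] >= '0' && field[i] <= '7'; ++i) {
        value = value * 8 + static_cast<std::uint64_t>(field[i] - '0');
    }
    return value;
}

// True when the 512 bytes at `block` carry the ustar magic and a valid checksum.
// The checksum is the byte sum of the header with the checksum field read as spaces.
bool looksLikeUstarHeader(const unsigned char* block) {
    if (std::memcmp(block + kMagicOffset, "ustar", 5) != 0) return false;
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < kTarBlockSize; ++i) {
        const bool inChecksum =
            i >= kChecksumOffset && i < kChecksumOffset + kChecksumLength;
        sum += inChecksum ? static_cast<std::uint32_t>(' ') : block[i];
    }
    return sum == parseOctalField(block + kChecksumOffset, kChecksumLength);
}

// Scans byte by byte (a leaked header need not be block-aligned inside a file).
// Returns the offset of the first header found, or -1 if there is none.
std::ptrdiff_t findLeakedTarHeader(const unsigned char* data, std::size_t size) {
    for (std::size_t off = 0; off + kTarBlockSize <= size; ++off) {
        if (looksLikeUstarHeader(data + off)) return static_cast<std::ptrdiff_t>(off);
    }
    return -1;
}
```

A non-negative result for a file that should contain only HTML or CSS would suggest that archive blocks were written into the wrong output file.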

hi @martirius

Can you share your code?

It sounds like one of the two formats (gz or tar) is using an unsupported feature. It would also be interesting to know how the .tar.gz file was produced and how big its contents are.

[edit]

Temporary workaround

Use gzExpander and tarExpander separately

  // direct .tar.gz to file expanding is broken when tar contains multiple files
  if( SPIFFS.exists("/data.tar.gz") ) {
    gzExpander(SPIFFS, "/data.tar.gz", SPIFFS, "/tmp/data.tar");
    tarExpander(SPIFFS, "/tmp/data.tar", SPIFFS, "/your_webroot");
    SPIFFS.remove("/tmp/data.tar"); // remove temporary tar
    SPIFFS.remove("/data.tar.gz"); // only expand once
  }

This is the tar.gz; it contains HTML files, a CSS file and a public PEM file.

There is a webserver on ESP32 and pages are loaded from SPIFFS

In setup() I do this:

  if (tarGzExpander(SPIFFS, "/data.tar.gz", SPIFFS, "/") != 0) {
    Serial.println("Error!");
  } else {
    Serial.println("All good!");
    tarGzListDir(SPIFFS, "/");
  }

Then in the server request handler I do this:

  File home = SPIFFS.open("/home.htm", FILE_READ);
  request->send(200, "text/html", home.readString());
  home.close();

Loading files from SPIFFS worked fine until I tried packing all the files into a tar.gz.

I don't know how the tar.gz file was produced, since I didn't create it.

I've just clicked the link, downloaded and opened the gz file in a text editor, and I can clearly read its contents so if it's gzipped, it doesn't seem to use a strong compression level, or it has been uncompressed on the fly.

So I've taken the file and copied it manually on the SPIFFS partition, then ran the tarGzExpander sketch and got this comforting result:

10:49:15.169 -> [D][esp32-hal-psram.c:47] psramInit(): PSRAM enabled
10:49:15.467 -> targz expander start!
10:49:15.500 -> gzip file detected ! gz size: 4340 bytes, expanded size:22528 bytes
10:49:15.732 -> creating /tmp folder
10:49:15.865 -> setup begin
10:49:15.865 -> setup end
10:49:15.865 -> Progress:
10:49:15.865 -> [0%===========================================
10:49:15.865 ->       filename: root.pem
10:49:15.865 ->       filemode: 0664 (436)
10:49:15.865 ->            uid: 01750 (1000)
10:49:15.865 ->            gid: 01750 (1000)
10:49:15.865 ->       filesize: 02260 (1200)
10:49:15.865 ->          mtime: 013714534531 (1597159769)
10:49:15.865 ->       checksum: 012510 (5448)
10:49:15.898 ->           type: 0
10:49:15.898 ->    link_target: 
10:49:15.898 -> 
10:49:15.898 ->      ustar ind: ustar
10:49:15.898 ->      ustar ver: 00
10:49:15.898 ->      user name: pedu
10:49:15.898 ->     group name: pedu
10:49:15.898 -> device (major): 0
10:49:15.898 -> device (minor): 0
10:49:15.898 -> 
10:49:15.898 ->   data blocks = 3
10:49:15.898 ->   last block portion = 176
10:49:15.898 -> ===========================================
10:49:15.898 -> 
10:49:16.396 -> [D][ESP32-targz.cpp:381] unTarHeaderCallBack(): Creating /tmp folder
10:49:16.496 -> Creating /tmp/root.pem
10:49:16.993 -> 
10:49:17.027 -> ===========================================
10:49:17.027 ->       filename: login.htm
10:49:17.027 ->       filemode: 0664 (436)
10:49:17.027 ->            uid: 01750 (1000)
10:49:17.027 ->            gid: 01750 (1000)
10:49:17.027 ->       filesize: 010143 (4195)
10:49:17.027 ->          mtime: 013707063330 (1595696856)
10:49:17.027 ->       checksum: 012637 (5535)
10:49:17.027 ->           type: 0
10:49:17.027 ->    link_target: 
10:49:17.027 -> 
10:49:17.027 ->      ustar ind: ustar
10:49:17.027 ->      ustar ver: 00
10:49:17.027 ->      user name: pedu
10:49:17.027 ->     group name: pedu
10:49:17.027 -> device (major): 0
10:49:17.027 -> device (minor): 0
10:49:17.027 -> 
10:49:17.027 ->   data blocks = 9
10:49:17.027 ->   last block portion = 99
10:49:17.027 -> ===========================================
10:49:17.027 -> 
10:49:17.524 -> [D][ESP32-targz.cpp:381] unTarHeaderCallBack(): Creating /tmp folder
10:49:17.624 -> Creating /tmp/login.htm
10:49:18.121 -> Z
10:49:18.187 -> ===========================================
10:49:18.187 ->       filename: style.css
10:49:18.187 ->       filemode: 0664 (436)
10:49:18.187 ->            uid: 01750 (1000)
10:49:18.187 ->            gid: 01750 (1000)
10:49:18.187 ->       filesize: 02763 (1523)
10:49:18.187 ->          mtime: 013707063330 (1595696856)
10:49:18.187 ->       checksum: 012700 (5568)
10:49:18.187 ->           type: 0
10:49:18.187 ->    link_target: 
10:49:18.187 -> 
10:49:18.187 ->      ustar ind: ustar
10:49:18.187 ->      ustar ver: 00
10:49:18.187 ->      user name: pedu
10:49:18.187 ->     group name: pedu
10:49:18.187 -> device (major): 0
10:49:18.187 -> device (minor): 0
10:49:18.187 -> 
10:49:18.187 ->   data blocks = 3
10:49:18.187 ->   last block portion = 499
10:49:18.187 -> ===========================================
10:49:18.187 -> 
10:49:18.684 -> [D][ESP32-targz.cpp:381] unTarHeaderCallBack(): Creating /tmp folder
10:49:18.783 -> Creating /tmp/style.css
10:49:19.281 -> Z
10:49:19.314 -> ===========================================
10:49:19.314 ->       filename: home.htm
10:49:19.314 ->       filemode: 0664 (436)
10:49:19.314 ->            uid: 01750 (1000)
10:49:19.314 ->            gid: 01750 (1000)
10:49:19.314 ->       filesize: 026152 (11370)
10:49:19.314 ->          mtime: 013712034122 (1596471378)
10:49:19.314 ->       checksum: 012457 (5423)
10:49:19.347 ->           type: 0
10:49:19.347 ->    link_target: 
10:49:19.347 -> 
10:49:19.347 ->      ustar ind: ustar
10:49:19.347 ->      ustar ver: 00
10:49:19.347 ->      user name: pedu
10:49:19.347 ->     group name: pedu
10:49:19.347 -> device (major): 0
10:49:19.347 -> device (minor): 0
10:49:19.347 -> 
10:49:19.347 ->   data blocks = 23
10:49:19.347 ->   last block portion = 106
10:49:19.347 -> ===========================================
10:49:19.347 -> 
10:49:19.811 -> [D][ESP32-targz.cpp:381] unTarHeaderCallBack(): Creating /tmp folder
10:49:19.944 -> Creating /tmp/home.htm
10:49:20.441 -> ZZZ
10:49:20.507 -> [ERROR]: tar expanding done!
10:49:20.507 -> [D][ESP32-targz.cpp:153] gzProcessTarBuffer(): Failed reading 512 bytes in gzip block #8
10:49:20.507 -> 100%]
10:49:20.507 -> success!
10:49:20.773 -> /tmp/root.pem                        1200 bytes
10:49:20.773 -> /tmp/login.htm                       4195 bytes
10:49:20.773 -> /tmp/style.css                       1523 bytes
10:49:20.773 -> /tmp/home.htm                       11370 bytes

Although the "[ERROR]: tar expanding done!" message at the end is confusing, the reported file sizes all check out.
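The numbers in the log can be cross-checked by hand. A tar entry is one 512-byte header followed by the file data rounded up to whole 512-byte blocks, and the archive ends with two zero-filled blocks. The sketch below (hypothetical helper names, assuming no extra record padding after the end marker) reproduces the "data blocks" and "last block portion" values and the 22528-byte expanded size reported above.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kTarBlockSize = 512;

struct TarEntryLayout {
    std::uint32_t dataBlocks;        // 512-byte blocks needed for the file data
    std::uint32_t lastBlockPortion;  // bytes used in the final data block
                                     // (0 means the last block is full; sketch convention)
};

// Computes how a file of `fileSize` bytes is laid out in 512-byte tar blocks.
TarEntryLayout layoutFor(std::uint32_t fileSize) {
    TarEntryLayout layout;
    layout.dataBlocks = (fileSize + kTarBlockSize - 1) / kTarBlockSize;
    layout.lastBlockPortion = fileSize % kTarBlockSize;
    return layout;
}

// Total archive size: per file one header block plus its data blocks,
// then two zero blocks as end-of-archive marker.
std::uint32_t expectedTarSize(const std::uint32_t* fileSizes, std::size_t count) {
    std::uint32_t blocks = 2;
    for (std::size_t i = 0; i < count; ++i) {
        blocks += 1 + layoutFor(fileSizes[i]).dataBlocks;
    }
    return blocks * kTarBlockSize;
}
```

With the four sizes from the log (1200, 4195, 1523, 11370) this gives 3, 9, 3 and 23 data blocks, with last block portions of 176, 99, 499 and 106 bytes, and (4 + 38 + 2) × 512 = 22528 bytes in total.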


How did your .tar.gz file get onto the SPIFFS? Was it downloaded directly by the ESP32 with an HTTP client, or was it uploaded using Arduino's ESP32 Sketch Data Uploader?

Speculation: because the gzip is produced by a PHP script, the returned compression format may differ (it could depend on the client's "Accept-Encoding" HTTP header value).

So if downloading with an ESP32 HTTP client, the "Accept-Encoding: gzip" header may be required.

If the browser sends it implicitly, maybe the ESP32 HTTPClient doesn't?
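One way to test this kind of speculation is to check the stored bytes themselves. Per the gzip format (RFC 1952), a gzip member starts with the bytes 0x1f 0x8b followed by the compression method (8 for deflate), and ends with a 4-byte little-endian ISIZE field holding the uncompressed size modulo 2^32. The following sketch uses hypothetical function names and is not the library's own code; it assumes the whole file has been read into memory.

```cpp
#include <cstddef>
#include <cstdint>

// True when the buffer starts with the gzip magic and the deflate method byte.
bool hasGzipMagic(const unsigned char* data, std::size_t size) {
    return size >= 3 && data[0] == 0x1f && data[1] == 0x8b && data[2] == 0x08;
}

// Reads the ISIZE trailer (uncompressed size modulo 2^32) into `expandedSize`.
// Returns false when the buffer is too short or does not look like gzip.
// 18 bytes = 10-byte fixed header + 8-byte trailer, the smallest possible member.
bool readGzipExpandedSize(const unsigned char* data, std::size_t size,
                          std::uint32_t& expandedSize) {
    if (size < 18 || !hasGzipMagic(data, size)) return false;
    const unsigned char* isize = data + size - 4;
    expandedSize = static_cast<std::uint32_t>(isize[0])
                 | (static_cast<std::uint32_t>(isize[1]) << 8)
                 | (static_cast<std::uint32_t>(isize[2]) << 16)
                 | (static_cast<std::uint32_t>(isize[3]) << 24);
    return true;
}
```

If the file was transparently decompressed somewhere along the way, `hasGzipMagic` would return false; for a plain tar, the `ustar` marker would appear at byte offset 257 instead.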

Thanks for investigating, but the PHP script doesn't generate a new data.tar.gz every time; it's always the same file that is downloaded.
At the moment the tar.gz is not downloaded by the ESP32 but loaded into SPIFFS with the PlatformIO "Upload File System image" task.

I noticed that you decompressed the tar.gz into the /tmp folder; I didn't do that and decompressed it into the root ("/"). Maybe decompressing into the root is the problem?

Can you try printing the contents of those files on Serial, so we can see whether they are corrupted or not?

you decompressed the tar.gz in the /tmp folder

Archive managers typically use a temporary folder for unpacking; I guess this is to prevent unpacked content from overwriting the archive itself while it's being read.

I've quick-coded a hex viewer and I do indeed see file corruption symptoms.


This sounds like a stream buffer problem. I'm currently investigating, but I'm not confident I'll find a quick solution without help.

Meanwhile you can still use the tar and gz expanders separately as a workaround; this test snippet does not seem to produce corrupted contents:
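The hex viewer mentioned above isn't shown in this thread. As an illustration only, a minimal hex dump helper could look like the sketch below (the function name and output layout are assumptions, not the code that was actually used). It formats a buffer as offset, hex bytes and printable ASCII, which makes stray NUL padding or tar header fields inside a text file easy to spot.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats `size` bytes as lines of "offset  hex bytes  |ascii|".
// Non-printable bytes are shown as '.' in the ASCII column.
std::string hexDump(const unsigned char* data, std::size_t size,
                    std::size_t bytesPerLine = 16) {
    std::string out;
    if (bytesPerLine == 0) return out;  // avoid an endless loop
    char buf[16];
    for (std::size_t off = 0; off < size; off += bytesPerLine) {
        std::snprintf(buf, sizeof buf, "%08zx ", off);
        out += buf;
        std::string ascii;
        for (std::size_t i = 0; i < bytesPerLine; ++i) {
            if (off + i < size) {
                const unsigned char c = data[off + i];
                std::snprintf(buf, sizeof buf, " %02x", static_cast<unsigned>(c));
                out += buf;
                ascii += (c >= 0x20 && c < 0x7f) ? static_cast<char>(c) : '.';
            } else {
                out += "   ";  // pad a short final line so the ASCII column aligns
            }
        }
        out += "  |" + ascii + "|\n";
    }
    return out;
}
```

On the ESP32 the returned string could be passed to `Serial.print()` after reading a chunk of the file into a buffer.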

  // direct .tar.gz to file expanding is broken when tar contains multiple files
  if( SPIFFS.exists("/data.tar.gz") ) {
    gzExpander(SPIFFS, "/data.tar.gz", SPIFFS, "/tmp/data.tar");
    tarExpander(SPIFFS, "/tmp/data.tar", SPIFFS, "/your_webroot");
    SPIFFS.remove("/tmp/data.tar"); // remove temporary tar
    SPIFFS.remove("/data.tar.gz"); // only expand once
  }

Thanks a lot, I will try the proposed temporary workaround.

If I can help solve the issue, let me know.

I still can't wrap my head around this problem, I'm thinking of removing tarGzExpander from the equation.

Of course I could just add a software limitation and emit a warning/error so I can call this bug a feature :-)
However, I'm quite sure this worked as expected at some point in the past, and the fix would be trivial to find if only I had time to look into it.

So I'll rename this issue and leave it open in the meantime.

hey @martirius just a heads up on the code currently available on the noram branch.

tarGzExpander has a fix for gz=>tar=>filesystem decompression without an intermediary file. This version has been tested on an archive of 50000 files of various sizes and path depths, and no error has been found so far.

Now both methods are available

  1. Use intermediary file (low memory requirements but slow + space requirements):
     tarGzExpander(SPIFFS, "/www.tar.gz", SPIFFS, "/www", "/tmp/www.tar" )
  2. Use no intermediary file (high memory requirements but fast):
     tarGzExpander(SPIFFS, "/www.tar.gz", SPIFFS, "/www", nullptr )

With SPIFFS some path depth checks are made (and reported if logging isn't disabled) during decompression without halting the overall process.

There's also a very memory-aggressive mode where gz => tar => filesystem decompression can be achieved with only 516 bytes of RAM, but it's soooo slow 🦥

For the real deal (fast, silent) use this code:

// #define DEST_FS_USES_SD
#define DEST_FS_USES_SPIFFS

// ( .... later in your setup or loop  .... )

// attach FS callbacks to prevent the partition from exploding during decompression
setupFSCallbacks( targzTotalBytesFn, targzFreeBytesFn );

// attach empty callbacks to silence the output (zombie mode)
setProgressCallback( targzNullProgressCallback );
setLoggerCallback( targzNullLoggerCallback );

if( tarGzExpander(SPIFFS, "/www.tar.gz", SPIFFS, "/www", nullptr ) ) {
  Serial.println("Yay!");
} else {
  Serial.printf("tarGzExpander failed with return code #%d\n", tarGzGetError() );
}

please let me know if this worked for you so I can close this issue and produce a new release

happy end of year 🍾

fixed on master, I'll produce a new release soon, closing this as solved