Cleanup intermediate & carved files
martonilles opened this issue · 0 comments
martonilles commented
As part of the extraction process, we carve out part of a file, or sometimes the whole file, to be extracted further. Once extraction is done, we leave the carved files in place. This can significantly increase the output size and leaves behind garbage that the user does not need.
Carved out files
- all successfully extracted chunks could be deleted
- we need to add a flag/option to keep extracted chunks
- keep all unknown chunks
- keep all chunks with null-extractor
- keep all chunks where extraction failed
- to decide whether extraction was successful, the extractor would need to return its success state
- for command-based extraction, we should be able to monitor exit codes and list which exit codes count as success
- maybe in the future we can also compare metadata with the extracted files to determine success
- maybe later we can also delete unknown chunks if they are just padding (we would need padding detection for that)
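The rules above could be sketched roughly as follows. This is only an illustration of the proposed policy, not existing code: `CarvedChunk`, `ExtractionStatus`, `should_delete`, `run_command_extractor`, and the `keep_extracted_chunks` option are all hypothetical names, and the default set of success exit codes (`{0}`) is an assumption.

```python
import subprocess
import sys
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet, Optional, Sequence


class ExtractionStatus(Enum):
    """Hypothetical success state an extractor would report back."""
    SUCCESS = auto()
    FAILED = auto()


@dataclass
class CarvedChunk:
    """Hypothetical record describing one carved-out file."""
    path: str
    is_unknown: bool = False          # no handler recognized this chunk
    has_null_extractor: bool = False  # handler exists but extracts nothing
    status: Optional[ExtractionStatus] = None


def should_delete(chunk: CarvedChunk, keep_extracted_chunks: bool = False) -> bool:
    """Return True only for chunks that were extracted successfully."""
    if keep_extracted_chunks:
        return False  # the user asked to keep everything
    if chunk.is_unknown or chunk.has_null_extractor:
        return False  # nothing was extracted, so the chunk is the only copy
    return chunk.status is ExtractionStatus.SUCCESS


def run_command_extractor(
    command: Sequence[str],
    success_codes: FrozenSet[int] = frozenset({0}),
) -> ExtractionStatus:
    """Run an external extractor and map its exit code to a status."""
    result = subprocess.run(command, capture_output=True, check=False)
    if result.returncode in success_codes:
        return ExtractionStatus.SUCCESS
    return ExtractionStatus.FAILED


# Usage: a stand-in command that exits with status 2, which this
# (hypothetical) extractor lists as one of its success exit codes.
command = [sys.executable, "-c", "raise SystemExit(2)"]
status = run_command_extractor(command, success_codes=frozenset({0, 2}))
chunk = CarvedChunk(path="0-1024.gzip", status=status)
print(should_delete(chunk))
```

Keeping the decision in one small function would make it easy to add the later ideas (metadata comparison, padding detection) as extra conditions without touching the extractors themselves.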
Intermediate files
Intermediate files are files that are not carved-out content but carry no additional information of their own, e.g.:
- A tgz file is first decompressed into a tar file, and the tar file is then extracted. In that case the tar file is an intermediate file.
For the time being we can keep these intermediate files; maybe later we can add logic to clean them up as part of the extraction or in post-processing.