Compression algorithm for newly created sapcar archives

Question


Closed this issue 8 years ago · 13 comments

Hi

This is not really an issue, but I couldn't find your contact e-mail address. I'm currently thinking about reimplementing the decompression algorithms in another language (probably Haskell) so that they are safe from buffer overflows. So which one should I start with? All the archives I've created myself seem to use LZC only. So my question to you is: if I only intend to decompress SAPCAR 2.01 archives, do you think LZC will suffice?

Have you made any attempts at understanding the (pretty... hard to "parse") SAP C files yet? ;)

Any other way to contact you about this?

Thanks

Hans-Christian

Answer 1 · 2016-06-08T13:46:51.000Z

Looking at it more closely, it seems that the values of the 5th byte in the compression header (https://www.coresecurity.com/system/files/publications/2016/05/SAPCarTalk-Slides.pdf, page 6) are swapped: 0x12 seems to mean LZH (your slides say LZC) and vice versa. Am I missing something...?

Answer 2 · 2016-06-08T17:14:45.000Z

Hi Hans-Christian!

This is not really an issue but I didn't find your contact e-mail address.

You will find my contact in the Readme: https://github.com/CoreSecurity/pysap#contact

I'm currently thinking about reimplementing the decompression algorithms in another language (probably Haskell) so that they are safe from buffer overflows. So which one should I start with? All the archives I've created myself seem to use LZC only. So my question to you is: if I only intend to decompress SAPCAR 2.01 archives, do you think LZC will suffice?

My impression is the same: in all the cases I saw, the algorithm used for CAR 2.01 archives was LZC. The same happens with SAP Diag/RFC traffic. I think that starting with this algorithm will cover most cases.

Have you made any attempts at understanding the (pretty... hard to "parse") SAP C files yet? ;)

My initial efforts were focused on porting the compression algorithms from C++ to plain C, mostly to ease the potential integration of the Wireshark plug-in into the Wireshark code base. But I also had in mind the idea of porting them to Python, so we could have a pure-Python pysap library without any native code. I made some progress on the former, but haven't started working on the latter. Any help is obviously welcome :)

Looking at it more closely, it seems that the values of the 5th byte in the compression header (https://www.coresecurity.com/system/files/publications/2016/05/SAPCarTalk-Slides.pdf, page 6) are swapped: 0x12 seems to mean LZH (your slides say LZC) and vice versa. Am I missing something...?

Nice catch! You're right: the correct values are in the code (https://github.com/CoreSecurity/pysap/blob/master/pysap/SAPDiag.py#L502 and https://github.com/CoreSecurity/pysap/blob/master/pysap/SAPCAR.py#L91) but not in the slides. I'll try to publish an updated version of the slides to fix this. Thanks for pointing it out!
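A minimal Python sketch of reading the algorithm byte from the compression header. It assumes an 8-byte header laid out like the fields in the decoder output posted later in this thread: a 4-byte little-endian uncompressed length, the 1-byte algorithm as the 5th byte, a 2-byte magic read big-endian as 0x1F9D (8093), and a 1-byte "special" field. The 0x12 = LZH mapping follows the correction above; 0x10 for LZC is an assumption inferred from "vice versa", so check both values against pysap's SAPCAR.py. `parse_comp_header` and `CompHeader` are hypothetical names, not pysap's API.

```python
import struct
from typing import NamedTuple

# Assumed algorithm byte values: 0x12 = LZH (per this thread),
# 0x10 = LZC (inferred, verify against pysap's SAPCAR.py).
ALGORITHMS = {0x12: "LZH", 0x10: "LZC"}


class CompHeader(NamedTuple):
    uncompressed_length: int
    algorithm: str
    magic: int
    special: int


def parse_comp_header(data: bytes) -> CompHeader:
    """Parse an assumed 8-byte compression header:
    <I length, B algorithm (5th byte), >H magic, B special."""
    if len(data) < 8:
        raise ValueError("compression header needs 8 bytes")
    (length,) = struct.unpack_from("<I", data, 0)
    alg_byte = data[4]  # the 5th byte selects the algorithm
    (magic,) = struct.unpack_from(">H", data, 5)
    special = data[7]
    algorithm = ALGORITHMS.get(alg_byte, "unknown(0x%02x)" % alg_byte)
    return CompHeader(length, algorithm, magic, special)


header = parse_comp_header(bytes([0x00, 0x00, 0x01, 0x00, 0x12, 0x1F, 0x9D, 0x02]))
```

With these example bytes the parsed length is 65536, the magic is 8093 and the special field is 2, which matches the values in the decoder output further down.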

Answer 3 · 2016-06-08T19:43:35.000Z

My impression is the same: in all the cases I saw, the algorithm used for CAR 2.01 archives was LZC. The same happens with SAP Diag/RFC traffic. I think that starting with this algorithm will cover most cases.

Sorry, I had the names swapped not only in the slides but also in my head. When you create a new CAR 2.01 file, the algorithm SAPCAR uses most is LZH, not LZC. The same goes for Diag/RFC traffic.

Answer 4 · 2016-06-09T08:04:55.000Z

Great, thank you! If LZH is the most commonly used algorithm, that's even better, because I've already partly reimplemented it. I'll get back to you on this...

Answer 5 · 2016-06-09T16:08:30.000Z

Great, looking forward to seeing your progress on this!

BTW, I've updated the presentation to fix the values; the new URL is https://www.coresecurity.com/system/files/publications/2016/06/SAPCarTalk-Slides.pdf. Thanks!

Answer 6 · 2016-06-09T20:30:53.000Z

What is the biggest file you have decompressed with your code?

I've had some success at decompressing small sapcar files now. :) Now I've created an archive with a big file (20MB uncompressed, ~7MB compressed). With your code I'm getting the following exception:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    aa=testcar.files["bigfile"].open()
  File "/home/hc/builds/pysap/pysap/SAPCAR.py", line 374, in open
    (_, out_length, out_buffer) = decompress(str(compressed)[4:], exp_length)

My own decoder shows:

CarEntry
{ cfFileType = CarFile
, cfPermissions = 33261
, cfLength = 28374520
, cfTimestamp = 1465503483
, cfFileName = "bigfile\NUL"
, cfCompLen = 22071
, cfCompHdr =
CompHdr
{ chLen = 65536 , chAlg = CompLzc , chMagic = 8093 , chSpe = 2 }
, cfCrc32 = 1073692996
, cfContents = "..."
}

So... the compressed length is a lot smaller than it should be. Also, the "ED" is not present where it should be... I'm guessing big files are split into chunks that need to be decompressed individually until the output reaches the number of bytes specified in the "uncompressed length" header field.
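The chunking hypothesis can be sketched in Python as a loop that decompresses chunks one at a time and stops once the header's uncompressed length is reached. The decompressor is passed in as a function because no LZC/LZH decoder is shown here; `zlib.decompress` in the usage lines is only a stand-in, and `reassemble` is a hypothetical helper, not pysap's API.

```python
import zlib
from typing import Callable, Iterable


def reassemble(chunks: Iterable[bytes],
               decompress_chunk: Callable[[bytes], bytes],
               expected_length: int) -> bytes:
    """Decompress chunks in order until expected_length bytes are produced."""
    out = bytearray()
    for chunk in chunks:
        if len(out) >= expected_length:
            break  # header length reached; ignore any trailing chunks
        out += decompress_chunk(chunk)
    if len(out) != expected_length:
        raise ValueError("decompressed %d bytes, header says %d"
                         % (len(out), expected_length))
    return bytes(out)


# Usage with zlib standing in for LZC/LZH, split into 64 KiB chunks:
data = b"A" * 100000
chunks = [zlib.compress(data[i:i + 65536]) for i in range(0, len(data), 65536)]
restored = reassemble(chunks, zlib.decompress, len(data))
```

Raising on a length mismatch, rather than returning a short buffer, surfaces exactly the kind of discrepancy described here.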

Answer 7 · 2016-06-09T21:44:19.000Z

Looks like large files are compressed in chunks; each chunk is denoted with a "DA" in the header, and no CRC is appended. The last chunk is denoted with an "ED" in the header.
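A sketch of walking the chunk markers in Python. The exact layout is an assumption inferred from this thread, not confirmed: each block starts with the 2-byte ASCII marker "DA" (more blocks follow) or "ED" (last block), followed by a 4-byte little-endian compressed length and that many payload bytes, and only the final "ED" block is followed by a 4-byte CRC. Whether the length field covers the per-chunk compression header is also unverified. `split_blocks` is a hypothetical helper.

```python
import struct
from typing import List, Tuple


def split_blocks(data: bytes) -> Tuple[List[bytes], int]:
    """Split concatenated DA/ED blocks into payloads; return them and the CRC."""
    blocks: List[bytes] = []
    offset = 0
    while offset < len(data):
        block_type = data[offset:offset + 2]
        if block_type not in (b"DA", b"ED"):
            raise ValueError("unknown block type %r at offset %d"
                             % (block_type, offset))
        # The compressed length field tells us where the next block starts.
        (length,) = struct.unpack_from("<I", data, offset + 2)
        start = offset + 6
        payload = data[start:start + length]
        if len(payload) != length:
            raise ValueError("truncated block at offset %d" % offset)
        blocks.append(payload)
        offset = start + length
        if block_type == b"ED":
            # Assumed: only the last block is followed by a 4-byte LE checksum.
            (crc,) = struct.unpack_from("<I", data, offset)
            return blocks, crc
    raise ValueError("no final ED block found")
```

Each payload would then go to the LZC/LZH decoder selected by its own compression header.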

Answer 8 · 2016-06-10T18:34:31.000Z

So... the compressed length is a lot smaller than it should be. Also, the "ED" is not present where it should be... I'm guessing big files are split into chunks that need to be decompressed individually until the output reaches the number of bytes specified in the "uncompressed length" header field.

I had found something similar in my tests, but it wasn't consistent across 2.00 and 2.01.

Looks like large files are compressed in chunks; each chunk is denoted with a "DA" in the header, and no CRC is appended. The last chunk is denoted with an "ED" in the header.

How are you handling the chunk lengths? I recall that in most of my cases the lengths didn't seem to match. I'll check this again as soon as I have some time. Very good feedback BTW :)

Answer 9 · 2016-06-10T22:27:00.000Z

The lengths seem to be fine (for 2.01 at least): I'm using the "compressed length" header to move from one block to the next. The "decompressed length" always seems to be 65536, except for the last block.
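If, as observed here for 2.01, every block decompresses to 65536 bytes except the last, the expected per-block sizes follow directly from the file's uncompressed length, which gives a quick way to spot blocks that come out short. A Python sketch, with `expected_block_lengths` and `mismatched_blocks` as hypothetical helpers; for the 28374520-byte file above it yields 432 full blocks plus a final block of 62968 bytes.

```python
from typing import List

BLOCK_SIZE = 65536  # uncompressed size reported for every block except the last


def expected_block_lengths(total: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Expected uncompressed size of each block for a file of `total` bytes."""
    full, rest = divmod(total, block_size)
    return [block_size] * full + ([rest] if rest else [])


def mismatched_blocks(actual: List[int], total: int) -> List[int]:
    """Indices of blocks whose decompressed size differs from the expected one."""
    expected = expected_block_lengths(total)
    if len(actual) != len(expected):
        raise ValueError("expected %d blocks, got %d"
                         % (len(expected), len(actual)))
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```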

Answer 10 · 2016-06-11T18:29:33.000Z

I'm getting discrepancies for most files, but not all: the reported uncompressed length is 2^16, but the decompressed chunk comes out a few bytes short. The reported length must be correct, however, as the decompressed result is missing a few hundred bytes per block.

Answer 11 · 2016-06-11T19:57:12.000Z

It seems that one SAPCAR block can contain two LZH-compressed chunks. To know how many bytes were consumed when decompressing the first chunk, it looks like you have to tap into the LZH decompression routine.
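One way to handle several compressed streams packed into one block is a decoder that returns both its output and the number of input bytes it consumed, so the caller can resume at the next stream. A Python sketch: zlib is used only as a stand-in because `zlib.decompressobj` exposes leftover input through `unused_data`; an LZH decoder would have to report its consumed byte count in the same way. `Decoder`, `zlib_decoder` and `decode_block` are hypothetical names, not pysap's API.

```python
import zlib
from typing import Callable, List, Tuple

# A decoder returns (output bytes, number of input bytes consumed).
Decoder = Callable[[bytes], Tuple[bytes, int]]


def zlib_decoder(data: bytes) -> Tuple[bytes, int]:
    """Stand-in for an LZH routine: decode one stream, report bytes consumed."""
    d = zlib.decompressobj()
    out = d.decompress(data)
    if not d.eof:
        raise ValueError("incomplete compressed stream")
    # Whatever follows the end of the first stream is left in unused_data.
    return out, len(data) - len(d.unused_data)


def decode_block(block: bytes, decoder: Decoder) -> List[bytes]:
    """Decode every compressed stream stored back to back in one block."""
    pieces: List[bytes] = []
    offset = 0
    while offset < len(block):
        out, consumed = decoder(block[offset:])
        if consumed == 0:
            raise ValueError("decoder made no progress at offset %d" % offset)
        pieces.append(out)
        offset += consumed
    return pieces
```

The key design point is the return signature: a decoder that only returns output bytes cannot tell the caller where the second chunk begins.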

Answer 12 · 2016-06-13T14:29:17.000Z

That's really good feedback! I'll check whether I can arrive at the same result using pysap, but, as you say, I'll probably have to change the decompression API.

Answer 13 · 2016-06-19T14:30:35.000Z

Thanks again for the help! :)