jd-boyd/python-lzo

Incompatibility with LZOP


I am trying to decompress data that was compressed with the LZOP utility, but python-lzo does not seem to understand the file's header. In my testing, the incompatibility goes both ways.

To test this, I took a sample text file and compressed it using LZOP:
lzop -c -5 -o file.lzo file.txt

I then tried to decompress this using python-lzo:
import lzo
with open('file.lzo', 'rb') as file:
    data = file.read()
a = lzo.decompress(data)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
lzo.error: Header error - invalid compressed data

I then tried the reverse:
import lzo
with open('file.txt', 'rb') as file:  # binary mode, so the bytes are compressed unchanged
    data = file.read()
b = lzo.compress(data, 5, 1)  # compression level 5, include header
with open('file.lzo', 'wb') as new_file:
    new_file.write(b)

I then tried to decompress this file with LZOP:
lzop -d file.lzo

lzop: file.lzo: not a lzop file

When I look at the two compressed files using hexdump, I see that the file compressed by python-lzo has a very short header (7 bytes) that matches nothing in the header of the file compressed by LZOP. The LZOP file's header does match the header definition found here - https://gist.github.com/jledet/1333896.

python-lzo header:
00000000 f0 01 f1 66 f2 00 02 |...f...|

LZOP header:
00000000 89 4c 5a 4f 00 0d 0a 1a 0a 10 40 20 a0 09 40 01 |.LZO......@ ..@.|
00000010 05 03 00 00 01 00 00 81 a4 63 e4 37 fd 00 00 00 |.........c.7....|
00000020 00 08 66 69 6c 65 2e 74 78 74 73 21 08 3a 00 04 |..file.txts!.:..|
00000030 00 00 00 01 04 b7 b8 02 e3 a6 00 02 |............|

When I look in the source for python-lzo, I see that it has the code to process the type of header used by LZOP but I can't make it read or write files that have that header.
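
The two hexdumps above start with different bytes, so a quick check of the first bytes is enough to tell which kind of file you have before picking a decompression path. This is only a minimal sketch: `sniff_lzo_container` is a hypothetical helper name, the 9-byte LZOP magic is taken from the dump above, and treating 0xf0/0xf1 as python-lzo's marker byte is an assumption based on the dump and on my reading of the python-lzo source.

```python
# First 9 bytes of the LZOP header shown in the hexdump above.
LZOP_MAGIC = bytes([0x89, 0x4C, 0x5A, 0x4F, 0x00, 0x0D, 0x0A, 0x1A, 0x0A])


def sniff_lzo_container(data: bytes) -> str:
    """Guess which header style a compressed buffer starts with."""
    if data.startswith(LZOP_MAGIC):
        return "lzop"
    # Assumption: python-lzo's own short header begins with 0xf0 or 0xf1.
    if data[:1] in (b"\xf0", b"\xf1"):
        return "python-lzo"
    return "unknown"
```

If this returns "lzop", passing the whole buffer to lzo.decompress() will fail with the header error shown earlier, because the function expects python-lzo's own header.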

For anyone looking at this and hoping for a solution, I was able to decompress a file compressed with LZOP using a combination of python-lzo and some code from python3_lzo_indexer - https://github.com/Orhideous/python3_lzo_indexer.

You have to use a modified form of the code in get_lzo_blocks() to extract a block:

Use read(compressed_blocksize) to read the compressed data block. This block can then be passed to lzo.decompress(block, 0, decompressed_blocksize) to obtain the decompressed block that can be written out or used in whatever way you need.
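
A rough sketch of the whole block walk, in case it helps. It is not the code from python3_lzo_indexer: the function names are hypothetical, and the header layout and flag values are my understanding of the LZOP format (they line up with the hexdump earlier in this issue, where the first block starts at offset 46). Checksums are skipped rather than verified, and multipart archives are not handled. The actual decompression is passed in as a callable so the parsing can be tried without python-lzo installed.

```python
import io
import struct

LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"

# Flag bits as I understand them from lzop; treat these values as assumptions.
F_ADLER32_D = 0x00000001      # checksum of uncompressed block present
F_ADLER32_C = 0x00000002      # checksum of compressed block present
F_H_EXTRA_FIELD = 0x00000040  # header carries an extra field
F_CRC32_D = 0x00000100
F_CRC32_C = 0x00000200
F_H_FILTER = 0x00000800       # header carries a 4-byte filter field


def _read_exact(fh, n):
    data = fh.read(n)
    if len(data) != n:
        raise ValueError("truncated lzop stream")
    return data


def skip_lzop_header(fh):
    """Consume the file header and return the flags word."""
    if _read_exact(fh, 9) != LZOP_MAGIC:
        raise ValueError("not an lzop file")
    version, _lib_version = struct.unpack(">HH", _read_exact(fh, 4))
    newer = version >= 0x0940
    if newer:
        _read_exact(fh, 2)      # version needed to extract
    _read_exact(fh, 1)          # method
    if newer:
        _read_exact(fh, 1)      # level
    (flags,) = struct.unpack(">I", _read_exact(fh, 4))
    if flags & F_H_FILTER:
        _read_exact(fh, 4)
    _read_exact(fh, 8)          # mode, mtime (low word)
    if newer:
        _read_exact(fh, 4)      # mtime (high word)
    (name_len,) = struct.unpack(">B", _read_exact(fh, 1))
    _read_exact(fh, name_len)   # original file name
    _read_exact(fh, 4)          # header checksum (not verified here)
    if flags & F_H_EXTRA_FIELD:
        (extra_len,) = struct.unpack(">I", _read_exact(fh, 4))
        _read_exact(fh, extra_len + 4)  # extra data plus its checksum
    return flags


def iter_lzop_blocks(fh):
    """Yield (block_bytes, decompressed_blocksize) for each block."""
    flags = skip_lzop_header(fh)
    while True:
        (dst_len,) = struct.unpack(">I", _read_exact(fh, 4))
        if dst_len == 0:        # a zero size marks the end of the stream
            return
        (src_len,) = struct.unpack(">I", _read_exact(fh, 4))
        if flags & F_ADLER32_D:
            _read_exact(fh, 4)
        if flags & F_CRC32_D:
            _read_exact(fh, 4)
        if src_len < dst_len:   # compressed-side checksums only if compressed
            if flags & F_ADLER32_C:
                _read_exact(fh, 4)
            if flags & F_CRC32_C:
                _read_exact(fh, 4)
        yield _read_exact(fh, src_len), dst_len


def decompress_lzop(data, decompress_block):
    """Decompress an lzop file held in memory, one block at a time."""
    out = bytearray()
    for block, dst_len in iter_lzop_blocks(io.BytesIO(data)):
        if len(block) == dst_len:
            out += block        # block was stored uncompressed
        else:
            out += decompress_block(block, dst_len)
    return bytes(out)
```

With python-lzo, `decompress_block` would be something like `lambda block, size: lzo.decompress(block, 0, size)`, which is the same call described above.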