ulikunitz/xz

missing match limit, was "lzip"

nitzanb-jfrog opened this issue · 11 comments

is it possible to work with lzip fles which are also known of using LZMA compress algorithm ?

A simple test with lzip and gxz shows that it doesn't work out of the box. I have had only a few minutes to look into it, but since the documentation at http://www.nongnu.org/lzip/safety_of_the_lzip_format.html talks about a CRC-32 checksum, it appears that lzip files are not pure LZMA streams but may embed those streams in a frame to support CRC-32 checksums.

tansy commented

It is possible to make it work as file format is very simple and "static" and yes, lzip is using lzma. The problem is it is not using the same lzma_lzma1_decoder and lzma_dict structures, and probability table layout. So to make it work one had to adapt probability table and "surrounding" structures.

Hi, I reviewed the file format specification in the link I provided in my last comment.

LZIP appears to be a wrapper format for LZMA streams with fixed properties that uses the EOS marker. It should be possible to write an implementation using the lzma package. You could try it yourself. (I'm currently working on a faster reimplementation of the lzma package. LZMA is working again and I'm currently working on LZMA2, so I cannot promise I will look into lzip.)

tansy commented

That's sweet, thanks. Well, I don't know go. I come from C world and come here looking for different answer about xz.
Anyway, I tried to find these probability tables but couldn't, that's one thing, and two there is no Makefile nor INSTALL instruction, so unless one is a prophet there is no easy way to compile it.

If you want to learn Go, you can start here: https://go.dev/doc/

You can install the gxz binary with

$ go install github.com/ulikunitz/xz/cmd/gxz@latest

But this is only a test binary I have implemented for testing. To use the lzma package you would only need to insert

import (
     "github.com/ulikunitz/xz/lzma"
)

in your Go program. The go command will handle the rest. All of this is pretty standard and doesn't need explanation for Go programmers.

The probability tables are in the state structure (lzma/state.go).

tansy commented

If I'm not mistaken this is the table: fastpos_table.c. As I said I don't know go so how does that relate to that? I'm far from understanding it, especially in foreign language (I mean go). And it is foreign to me.

But, regardless, you have no Makefile nor any building scripts that would allow user to compile it. Well, I downloaded go, tried instruction from README and nothing happened. Without that how go you expect anyone to do any changes to the project, which is quite big? There are 79 .go files and not all of them being to the same "subproject". None it's going to investigate which file is what and create building script for such a big project git sake of little change they would like to introduce and test.

PS.
I also made a test and converted lz to lzma and lzma to lz by contriving header and copying lzma stream, without making a footer. And the result surprised me - xz decompressed lzma from lz amd lzip decompressed lzma stream from xz. If those probabilities are indeed different then decompressor would fail at some point, wouldn't it?

Attached converter script: lz-to-lzma.zip

fastpos_table is not a probability table. What it does is explained in fastpos.h. I do the calculation for the match distance in line 78 of distcodec.go, not using a fastpos table. I use a nlz32 function, which computes the number of leading zero bits of a 32-bit unsigned integer. The comment in fastpos.h references the bzr machine operation and what I'm doing is comparable to that approach. Actually I did the implementation simply from the specification, I only looked at the reference implementation that were part of the specification.

The implementation for nlz32 is in bitops.go. In my update I will use the LeadingZeros32 function from the math/bits package of the Go standard library. The package didn't exist when I wrote the code. The function call will be translated by the compiler directly into a machine instruction. This will be faster than any table lookup.

Go doesn't require any build scripts. All what is needed is a go.mod file, the rest is handled by the go command. As I wrote, this is natural for a Go developer. Rust doesn't have build scripts as well, it has the cargo command and the Cargo.toml file.

I have to correct myself. you cannot directly do lzip with my LZMA package, you would need to do something comparable like your lz to LZMA converter, because my lzma reader and writer requires the LZMA header. So I need to export a reader and writer for the raw LZMA stream without the header.