/zopfli

Extending Zopfli functionality with experimental switches, ZIP support, multi-threading, best statistics directory/file based database to reuse best block state for compression resuming or recreating identical deflate stream in seconds, manual block split points and more to improve functionality, compression and speed.

Primary LanguageC++Apache License 2.0Apache-2.0

Zopfli KrzYmod is a Zopfli fork that heavily modifies original Zopfli
project for power-user control and more features.

Features:
- Multi-threading application,
- ZIP container support,
- Separate GZIP with filename stored support,
- Stores file modification time inside archive,
- Multiple files compression with ZIP container,
- Multiple recompressions support after splitting last,
- Best statistics directory/file based database to resume iterating,
- Additional switches to finetune compression and block splitting,
- Ability to use dumb block size splitting,
- Ability to use predefined split points,

Without passing its special commands the program should run as usual.

Description of new commands list in Zopfli KrzYmod:

1. --zip

   Tells Zopfli to use ZIP container instead of the default GZip. This can be useful
   on Microsoft based operating systems that support ZIP better. Note that the
   resulting file will be bigger than GZip.

2. --gzipname

   This will tell Zopfli to include filename inside GZip archive, so then changing
   gz filename will not cause archived file to be renamed. Note that it will produce
   file bigger by (compressed file name length + 1) bytes.

3. --dir

   Tells Zopfli to read directory as input instead of file. Scan it recursively and
   compress all files found. This option works only in conjuction with --zip switch,
   as only Zip from containers supported by Zopfli can hold multiple files.
   The uniqueness of this option is that ZIP file is updated on every successfully
   compressed file. The downside to this is that file IS NOT LOCKED and deleting or
   modyfing it before closing Zopfli may cause unexpected behaviour from corrupted
   ZIP file to Zopfli Crash.
   Note that there are limitations in this mode:
    - empty directories are not added,
    - empty files are not added,
    - directory structure is maintained by files being inside, there are no real
      records of directories preset in ZIP archive (this reduces ZIP size a bit
      but deleting all files inside a directory may cause directory in ZIP file
      to disappear),
    - character encoding may be incorrect if using special letters.
   Usually the workaround for above is to use some 1 byte files in directories that
   are supposed to be empty like empty.dir and to use english alphabet only.
   These issues may or may not be fixed in future releases (depends if there are any
   requests for them to be fixed).


4. --mb#

   This command tells zopfli to how many blocks at maximum split
   stream to. It's limited to 15 in the original Zopfli version
   and considered by program authors to be the best choice.
   However, it was found out that increasing number of blocks that
   stream gets split to produces smaller files.
   In case of too many blocks, stream may be slower to decompress.
   Setting it to 0 will use as many blocks as Zopfli auto block splitting
   model finds optimal.


5. --mui#

   Maximum unsuccessful iterations after last best. This switch is used to
   limit Zopfli work if it can't find another bit reductions after trying
   # iterations. In order for this option to be the most useful --i# parameter
   should be set very high, for example: --i999999 --mui999 will cause Zopfli
   to give up after 999 iterations that proved to be of no use for compressed
   block size bit reduction.
   This parameter is set to 1 when CTRL+C is pressed. Zopfli will still try
   to gain some savings on each block but only if the reduction comes with
   every iteration done, if there is at least one result that is not
   the best, Zopfli will end working on that block.

6. --lazy

   Use lazy matching in LZ77 Greedy. This option has an impact on block
   splitting model and iteration progression. In original version it
   can only be changed before compiling the software and it's considered
   to be another try&error switch.

7. --ohh

   Optimize Huffman Header by Frédéric Kayser. This options changes how
   Huffman trees are encoded in dynamic blocks.
   It records 8 as 4+4 not as 6+single+single and 7 as 4+3 not as 6+single
   as in default Zopfli Huffman tree encoding.
   Enabling this mode will usually improve compression by a few bits per blocks,
   rarery the result will be worse. Sometimes also impacts block splitting model.

8. --brotli

   Use Optmized Huffman for Rle algotihm from Brotli. This impacts block split points
   and compression. May or may not result in smaller files.

9. --rc

   Use descending sorting for lit/len and dist symbols' counts as per GCC 5.3 qsort
   behavior. Impacts compression and block splitting. Default is GCC 4.8 behavior.

10. --all

   Run 16 cominations of --lazy, --ohh, --brotli and --rc per block and pick
   smallest result.

11. --n#

   Use dumb block splitting by telling zopfli to how many blocks split stream to.

12. --b#

   Same as above but instead use size in bytes as delimeter. This and above option
   are most likely useless unless it's required for stream to use certain amount
   of blocks or certain uncompressed block size.

13. --cbs#

   Manually pass block start positions. The list must be passed in hexadecimal format,
   separated with commas. Note that first typed block split is REALLY omitted so can
   be anything or even empty, however, a comma should be passed then.
   Example: ,513,5555,fe89,14532.

14. --cbsfile#

   Same as above but instead of typing manually block start positions a file
   can be prepared holding that data. The contents MUST be the same as it would
   be passed with command line parameter, this means no new lines or any other
   characters than [0-f],[,] should be used. Otherwise undefined behaviour
   may occur.

15. --cbd#

   Save block start positons to a file after compression.

16. --aas

   Enables additional auto splitting mode when custom block start positions are passed
   so Zopfli can still decide between boundaries if the data should be split
   further while preserving start positions that were passed by other command
   line switches. Each data between manually passed block start positions is
   analysed separately as if it was separate stream input.

17. --bsr#

   Block splitting recursion. Changing this options will alter block splitting
   model. The default is 9 and this is another try&error option. Setting this
   value too high will cause Zopfli to spend more time splitting stream to
   blocks, as well as, may cause Zopfli to use fewer blocks without properly
   checking if the decissions are optimal enough.

18. --mls#

   This command alters the GetLengthScore function. Changing this parameter
   alters block splitting decissions as well as initial compression run before
   block iterating starts for which stats from this initial run are taken
   into account resulting in different iteration progression. The default value
   is 1024. When best stats are loaded from file (--statsdb) the initial run
   in dynamic block iterating is skipped so it has no impact on that block.

19. --nosplitlast

   Disables splitting last after compression. This mode is useful when custom split
   points need to stay the same. For example to huffmix results later.

20. --pass#

   Multiple splitting last & recompression attempts to check if resulting deflate stream
   gets smaller. It will run # times or until no size was reduced in the last recompression
   attempt. This mode doesn't work when --nosplitlast is used, or when the compressed input
   is only 1 block long.

21. --si#

   Stats to laststats in weight calculations. Zopfli by default takes 100% of current
   generated statistics with 50% of the ones from previous run before randomizing them.
   With this command You specify the percentage amount of current generated statistics.
   Maximum allowed number is 149, making current stats use 149% and last stats 1%.
   This has an impact on iterating proggress.

22. --cwmc

   Use Complementary-Multiply-With-Carry random number generator instead of the default
   Multiply-With-Carry. Both algorithms are made by G. Marsaglia.
   This provides different iteration progress.
   To read more: https://en.wikipedia.org/wiki/Multiply-with-carry

23. --rm#
   Random number generator modulo. By default Zopfli uses 3. Using different numbers
   (like: 5) can find best stats sooner or better on certain data.

24. --rw# and --rz#

   Initial random W and Z for iterations. These parameters are 1 for W and 2 for Z
   by default and they are locked in original Zopfli. They are used to init
   Multiply-With-Carry random generator before iterations per given block start.
   The random values are 32 bit unsigned integers that are generated when
   zopfli iteration is above 5 and the produced size is the same as the one before it.
   So changing these values will change random numbers being generated and make
   Zopfli find best result on lower or higher iteration than on default values.
   They are useful when re-running Zopfli multiple times with limited number of
   iterations while incrementing/randomizing --rw and/or --rz. The results
   per block can be compared or maybe even mixed together to produce yet
   smaller result with certain tools. Note that during my tests I got further
   1 bit reduction when changing these parameters on --mui999 when previously
   the file was created doing as many as 99,999 unsucessful iteration.

25. --statsdb

   Use Best Statistics / block directory/file based database. Files are stored
   in ZopfliDB directory that has this structure:
   - ZopfliDB main directory,
   - first CRC32 byte hexadecima representation [00-FF] subdirectory,
   - second CRC32 byte hexadecima representation [00-FF] subdirectory,
   - third CRC32 byte hexadecima representation [00-FF] subdirectory,
   - fourth CRC32 byte hexadecima representation [00-FF] subdirectory,
   - filename of [MODE USED 0-F]-[BLOCK SIZE IN BYTES].dat
   Files store some verification information to be portable between x86 and x64
   versions, last iteration processed and best stats data. It's possible to resume
   block iterations from the next iteration block was previously stopped at. It's also
   possible to recreate most condensed deflate stream within seconds if previous runs
   used more iterations that the number zopfli is run with later and block CRC32,
   size and mode used matches with the one stored using file/directory structure.
   This feature replaces restore points as files are smaller and it's possible to
   speed up compression on various streams if above criteria is met.

26. --t#

   Use # threads for compression. Limited by number of blocks in stream as each
   block is sent to a separate thread for compression. Most useful if there are many
   blocks to compress. Will use more memory as each thread needs separate LZ77 block
   store + cache, and main thread needs additional LZ77 temporary stores for
   out-of-order blocks returned by threads that are then processed and merged when they
   follow in-order block returned by given thread.

27. --idle

   Run Zopfli in lowest priority to not slow down other processes.

28. --v#

   This option is formerly known as just -v and it extends control over Zopfli
   verbosity. The default is 2 and it can be set from 0 (quiet) to 5 (most verbose):
   * 0 - quiet mode, don't display anthing except some errors,
   * 1 - program title, percentage progress and file added when --zip & --dir are used,
   * 2 - display block progression, bytes left to compress next to percentage progress,
         as well as, summary after every file being successfully compressed,
   * 3 - display fixed/dynamic block comparison + per block summary,
   * 4 - additionally display block split points, best iterations using separate lines
         and treesize (same functionality as -v in original release),
   * 5 - additionally display current iteration being processed using same line until
         bit reduction occurs, and Restore Points information.
   * 6 - additionally display debug mode of block splitting decissions.


Additionally to mentioned above options KrzYmod Zopfli version also fix few issues found
in original release, for example incorrect Deflate stream size being raported.
Also supports Unix (gzip) and MS-DOS (zip) timestamps inside archives.

Bitcoin: 1KrzY1CwE532e6YjN4aCzgV19gtAzQMatJ
Paypal:  https://www.paypal.me/MrKrzYch00
^ in case You want to reward my work.

====================================
           by Mr_KrzYch00
====================================

Zopfli Compression Algorithm was created by Lode Vandevenne and Jyrki
Alakuijala, based on an algorithm by Jyrki Alakuijala. Further modifications
as described in the document above were done by Mr_KrzYch00.

For more information on Zopfli please refer to:
- README.zopfli
- README.zopflipng
- original zopfli project at https://github.com/google/zopfli
- frkay's fork: https://github.com/frkay/zopfli
- all other forks this fork may include changes from:
  https://github.com/frkay/zopfli/network