enwik8

An attempt to compress the enwik8 file

Primary language: C++. License: MIT.

About

An attempt to compress enwik8, the first 100 MB of an English Wikipedia dump, using LZW (Lempel–Ziv–Welch) and BZip2-like algorithms with variable-length encoding.
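For readers unfamiliar with LZW, here is a minimal sketch of the dictionary-building idea. It is not the code in encoder.cpp: the function names lzw_encode and lzw_decode are hypothetical, and codes are kept as plain integers, whereas a real encoder would then write them to disk with a variable number of bits per code.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: turn a byte string into LZW dictionary codes.
// Codes 0..255 stand for single bytes; new phrases get codes from 256 up.
std::vector<std::uint32_t> lzw_encode(const std::string& input) {
    std::unordered_map<std::string, std::uint32_t> dict;
    for (int b = 0; b < 256; ++b) {
        dict[std::string(1, static_cast<char>(b))] = static_cast<std::uint32_t>(b);
    }
    std::uint32_t next_code = 256;

    std::vector<std::uint32_t> codes;
    std::string phrase;  // longest match found so far
    for (char c : input) {
        std::string extended = phrase + c;
        if (dict.count(extended) != 0) {
            phrase = extended;              // keep growing the match
        } else {
            codes.push_back(dict[phrase]);  // emit the longest known phrase
            dict[extended] = next_code++;   // learn the new phrase
            phrase = std::string(1, c);
        }
    }
    if (!phrase.empty()) codes.push_back(dict[phrase]);
    return codes;
}

// Hypothetical helper: rebuild the byte string from LZW codes.
// The decoder reconstructs the same dictionary one step behind the encoder.
std::string lzw_decode(const std::vector<std::uint32_t>& codes) {
    if (codes.empty()) return "";
    std::vector<std::string> dict(256);
    for (int b = 0; b < 256; ++b) dict[b] = std::string(1, static_cast<char>(b));

    std::string prev = dict.at(codes[0]);  // first code must be a single byte
    std::string out = prev;
    for (std::size_t i = 1; i < codes.size(); ++i) {
        const std::uint32_t code = codes[i];
        std::string entry;
        if (code < dict.size()) {
            entry = dict[code];
        } else if (code == dict.size()) {
            entry = prev + prev[0];  // code refers to the phrase being defined
        } else {
            throw std::invalid_argument("invalid LZW code");
        }
        out += entry;
        dict.push_back(prev + entry[0]);
        prev = entry;
    }
    return out;
}
```

In this sketch the dictionary grows without bound, which is fine for short inputs. For a 100 MB file, an implementation would typically cap or reset the dictionary.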

Results

  • LZW:
    • Compression ratio: 2.905
    • Compressed file size: 32 MB
  • BZip2-Like:
    • Compression ratio: 3.855
    • Compressed file size: 24 MB
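The ratios above are presumably original size divided by compressed size. A small sketch of that arithmetic follows; the helper names compression_ratio and implied_compressed_bytes are hypothetical, and the original size is taken to be 100,000,000 bytes, which is the size of enwik8.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: ratio = original size / compressed size.
double compression_ratio(std::uint64_t original_bytes, std::uint64_t compressed_bytes) {
    if (compressed_bytes == 0) throw std::invalid_argument("compressed size must be non-zero");
    return static_cast<double>(original_bytes) / static_cast<double>(compressed_bytes);
}

// Hypothetical helper: compressed size implied by a reported ratio.
double implied_compressed_bytes(std::uint64_t original_bytes, double ratio) {
    if (ratio <= 0.0) throw std::invalid_argument("ratio must be positive");
    return static_cast<double>(original_bytes) / ratio;
}
```

With these definitions, a ratio of 2.905 implies roughly 34.4 million bytes (about 32.8 MiB), and 3.855 implies roughly 25.9 million bytes (about 24.7 MiB). That matches the listed sizes if "MB" is read as MiB, though the README does not say which unit is meant.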

How to run

  • Compression
    1. Open a terminal in the directory containing the code
    2. Build the executable using the command: g++ -o encoder.exe encoder.cpp
    3. Run the executable: ./encoder.exe
  • Decompression
    1. Open a terminal in the directory containing the code
    2. Build the executable using the command: g++ -o decoder.exe decoder.cpp
    3. Run the executable: ./decoder.exe

To Do

  • A Decoder for the BZip2-Like algorithm
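As a possible starting point for that decoder, here is a sketch of the Burrows–Wheeler transform and its inverse, which is the block-sorting step bzip2 is built around. It assumes the BZip2-like encoder in this repository also uses a BWT stage and stores a primary index, which the README does not confirm. The names bwt_forward and bwt_inverse are hypothetical, and the forward transform uses a naive rotation sort that is only practical for small blocks.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the last column of the sorted rotation matrix
// and the row index (primary index) at which the original string appears.
std::pair<std::string, std::size_t> bwt_forward(const std::string& s) {
    const std::size_t n = s.size();
    std::vector<std::size_t> rot(n);
    std::iota(rot.begin(), rot.end(), std::size_t{0});
    // Naive comparison of rotations, byte by byte (unsigned order).
    std::sort(rot.begin(), rot.end(), [&](std::size_t a, std::size_t b) {
        for (std::size_t k = 0; k < n; ++k) {
            const unsigned char ca = static_cast<unsigned char>(s[(a + k) % n]);
            const unsigned char cb = static_cast<unsigned char>(s[(b + k) % n]);
            if (ca != cb) return ca < cb;
        }
        return false;  // identical rotations
    });
    std::string last(n, '\0');
    std::size_t primary = 0;
    for (std::size_t i = 0; i < n; ++i) {
        last[i] = s[(rot[i] + n - 1) % n];
        if (rot[i] == 0) primary = i;
    }
    return {last, primary};
}

// Hypothetical helper: inverts bwt_forward using the last-to-first mapping.
std::string bwt_inverse(const std::string& last, std::size_t primary) {
    const std::size_t n = last.size();
    if (n == 0) return "";
    // start[c] = number of bytes in `last` that are smaller than c.
    std::array<std::size_t, 256> start{};
    for (unsigned char c : last) ++start[c];
    std::size_t sum = 0;
    for (std::size_t& slot : start) {
        const std::size_t count = slot;
        slot = sum;
        sum += count;
    }
    // lf[i] = row whose first byte is the same occurrence as last[i].
    std::vector<std::size_t> lf(n);
    for (std::size_t i = 0; i < n; ++i) {
        lf[i] = start[static_cast<unsigned char>(last[i])]++;
    }
    // Walk backwards from the primary row, filling the output right to left.
    std::string out(n, '\0');
    std::size_t row = primary;
    for (std::size_t k = n; k-- > 0;) {
        out[k] = last[row];
        row = lf[row];
    }
    return out;
}
```

A full decoder would also need to undo whatever stages follow the transform in the encoder (in bzip2 these are move-to-front, run-length, and Huffman coding) before calling the inverse transform.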