Painless compressed files (gz, bzip2, xz) support for Clojure

squeezer

Seamless support for compressed files.

Usage

Creating a compressed file is dead simple:

(require '[squeezer.core :as sc])
(sc/spit-compr "test.txt.gz" "test 1\ntest 2\ntest 3")

You can examine the file in your favourite shell to see that it works.

> file test.txt.gz
test.txt.gz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT)

You can just as easily slurp the file back:

(require '[squeezer.core :as sc])

(sc/slurp-compr "test.txt.gz")

; "test 1\ntest 2\ntest 3"

The compression algorithm is chosen from the file extension. Three extensions are supported: .gz for gzip, .bz2 for bzip2, and .xz for xz. You can override this behaviour and force a specific algorithm with the keyword :compr.

; Do not do that!!

(sc/spit-compr "test.txt.gz" "test 1\ntest 2\ntest 3" :compr "bzip2")

If you do not believe that this works, ask your favourite shell:

> file test.txt.gz
test.txt.gz: bzip2 compressed data, block size = 900k

Now reading is a pain:

(sc/slurp-compr "test.txt.gz")
; ZipException Not in GZIP format

unless you know the trick:

(sc/slurp-compr "test.txt.gz" :compr "bzip2")
; "test 1\ntest 2\ntest 3"
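
The extension-based selection described above can be pictured as a simple suffix lookup. The following is only a sketch of the idea, not squeezer's actual implementation: `ext->compr` and `guess-compr` are hypothetical names, and of the three values only "bzip2" is confirmed by the examples above as a valid :compr argument ("gzip" and "xz" are assumed).

```clojure
(require '[clojure.string :as str])

;; Hypothetical table, not part of squeezer: extension -> compression name.
;; Only "bzip2" appears in the README; "gzip" and "xz" are assumptions.
(def ext->compr
  {".gz" "gzip", ".bz2" "bzip2", ".xz" "xz"})

(defn guess-compr
  "Returns the compression name for fname, or nil if the extension is unknown."
  [fname]
  (let [lower (str/lower-case fname)]
    (some (fn [[ext compr]]
            (when (str/ends-with? lower ext) compr))
          ext->compr)))

(guess-compr "test.txt.gz")  ; => "gzip"
(guess-compr "notes.txt")    ; => nil
```

A lookup like this only sees the file name, which is why a forced :compr that disagrees with the extension has to be repeated when reading.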

To make sure the right compression algorithm is chosen, you can rely on the MIME type of the file (detected, e.g., by the pantomime library) instead of its extension.
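
One way to act on a detected MIME type is to translate it into a value for :compr. The sketch below is an illustration under assumptions: `mime->compr` and `compr-for-mime` are hypothetical helpers (they belong to neither squeezer nor pantomime), the MIME strings are the ones commonly reported for these formats, and the "gzip" and "xz" names are assumed by analogy with "bzip2".

```clojure
;; Hypothetical table, not part of squeezer or pantomime.
;; gzip is listed under both its current and its older MIME name.
(def mime->compr
  {"application/gzip"    "gzip"
   "application/x-gzip"  "gzip"
   "application/x-bzip2" "bzip2"
   "application/x-xz"    "xz"})

(defn compr-for-mime
  "Returns the compression name for a MIME type string, or nil if unknown."
  [mime]
  (get mime->compr mime))

;; With pantomime on the classpath, the MIME string could come from
;; (pantomime.mime/mime-type-of (java.io.File. "test.txt.gz")).
(compr-for-mime "application/x-bzip2")  ; => "bzip2"
(compr-for-mime "text/plain")           ; => nil
```

The result could then be passed along, as in (sc/slurp-compr "test.txt.gz" :compr "bzip2"), so that a misleading extension no longer matters.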

FAQ

How do I lazily read a compressed CSV file, record by record?

It is easy.

Add clojure-csv and squeezer to your project.clj.

To read the first five records of your big_data.csv.gz, type in your REPL:

(require '[squeezer.core :as sc] '[clojure-csv.core :as csv])

(->> "big_data.csv.gz"
     sc/reader-compr
     csv/parse-csv
     (take 5))
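
When reading lazily, the underlying reader has to stay open until the records you need are realized, and should be closed afterwards. The sketch below shows that pattern using only JDK classes and gzip, so it runs without squeezer or clojure-csv; `first-lines` is a hypothetical helper, and plain lines stand in for parsed CSV records.

```clojure
(require '[clojure.java.io :as io])
(import '(java.util.zip GZIPInputStream GZIPOutputStream))

(defn first-lines
  "Returns the first n lines of the gzip file at path as a vector."
  [path n]
  (with-open [rdr (io/reader (GZIPInputStream. (io/input-stream path)))]
    ;; line-seq is lazy, so realize the lines before the reader is closed
    (vec (take n (line-seq rdr)))))

;; Write a small sample file, then read back only its first two lines.
(def sample-lines
  (let [f (java.io.File/createTempFile "sample" ".csv.gz")]
    (with-open [w (io/writer (GZIPOutputStream. (io/output-stream f)))]
      (.write w "a,1\nb,2\nc,3\n"))
    (first-lines f 2)))

sample-lines  ; => ["a,1" "b,2"]
```

Only as much of the file as is needed for the requested lines gets decompressed, which is the point of reading a large file lazily.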

License

Distributed under the Eclipse Public License, the same as Clojure.