Binsort is a tool to sort files of fixed-length binary records.
Features:
- Can compare only a slice of records when sorting (i.e. it is possible to sort records by comparing only bytes 8-16 of each record)
- Uses external sorting to be able to sort files larger than main memory
- Treats records as abstract arrays of bytes. If the data contains integers, sorting gives expected results if they are encoded with their most significant byte first (i.e. in big endian order)
go get github.com/arnaud-lb/binsort
This should leave a binsort
binary in $GOPATH/bin/
binsort --size|-s record_size [OPTIONS] infile outfile
-s, --size=BYTES
Record size in bytes (mandatory).
-o, --offset=BYTES
Ignore the first bytes in record for sorting. Defaults to 0.
-l, --length=BYTES
Use only length bytes for sorting. Defaults to size-offset.
-b, --block-size=RECORDS
Sort this number of records at once. This determines the
maximum number of records kept in memory at any time. Defaults
to the greatest of 1MB or 4 records. More is faster.
-T, --temporary-directory=DIR
Use DIR as temporary directory instead of the system
default. binsort may need to create files as big as the
input file.
Sort a file composed of 32-bytes records, by comparing only 8 bytes, starting at byte 16 in each record:
binsort --size 32 --offset 16 --length 8 ./input ./output