📜🪓 split large files into smaller ones using a regex

resplit

A command line utility for splitting files based on a regular expression

A reimplementation of GNU csplit: https://man7.org/linux/man-pages/man1/csplit.1.html

Built with ScalaNative, fs2 and cats!

Demo

Streaming "The Adventures of Sherlock Holmes" with wget and splitting it by a chapter regex into separate files

(demo recording: Resplit_Mov_AdobeExpress)

Usage

> resplit --help
Usage: resplit [options] regexToMatch [regexToSub]

Splits a file based on a regex. split files will be prefixed by digits,
and named by the contents of the matched regular expression.

Outputs names of files created to stdout

  regexToMatch             A regular expression to split the file on
  regexToSub               A regular expression substitution expression to use to format the output filenames
  -n, --digits <value>     Number of digits to left-pad the split filenames with
  -d, --directory <value>  Directory to write the split files into
  -f, --file <value>       Read from the specified file instead of stdin
  --suppressMatched        Include the line that matched the regexMatch arg as the first line in the split files
  -s, --quiet              Quiet
  -z, --elide-empty-files  Remove empty output files
  --help                   prints this usage text
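
The help text says each output file gets a zero-padded numeric prefix followed by a name derived from the match. As a rough illustration of that naming scheme only (not how resplit is implemented), the POSIX awk sketch below starts a new file whenever a line matches. The directory `naming_demo`, the sample input, and the hard-coded pattern `dog[0-9]` are assumptions made for this example.

```shell
# Illustrative only: approximates the "<digits>_<name>" naming scheme with awk.
mkdir -p naming_demo

printf '%s\n' cat1 cat2 dog1 cat3 dog2 cat4 |
awk -v dir=naming_demo -v digits=3 '
  BEGIN {
    # Build a format such as "%s/%03d_%s" from the requested digit count.
    fmt = "%s/%0" digits "d_%s"
    n = 0
    out = sprintf(fmt, dir, n, "")   # lines before the first match go to 000_
  }
  match($0, /dog[0-9]/) {
    close(out)
    n++
    name = substr($0, RSTART, 3)     # keep only "dog", like a capture group would
    out = sprintf(fmt, dir, n, name)
  }
  { print > out }                    # the matching line is written to the new file
'

ls naming_demo
```

With this input, `ls` should list `000_`, `001_dog` and `002_dog`.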

Installation

  • Download the latest release for your target platform
    • wget https://github.com/aesakamar/resplit/releases/download/v0.1.1/resplit-macos-latest
  • Grant executable permissions on the downloaded file
    • chmod +x resplit-macos-latest
      
  • Move the executable to a place accessible on your $PATH
    • mv resplit-macos-latest ~/bin/resplit
      

Examples

Input (testfile):

cat1
cat2
cat3
dog1
cat4
cat5
cat6
dog2
cat7
cat8
cat9

> cat testfile | resplit '(dog)\d' '$1'

Output (each file name is followed by that file's contents):

000_

cat1
cat2
cat3

001_dog

dog1
cat4
cat5
cat6

002_dog

dog2
cat7
cat8
cat9
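
Since resplit describes itself as a reimplementation of GNU csplit, it may help to see the closest csplit equivalent of the example above. This is a sketch that assumes GNU coreutils `csplit` is installed (the BSD version shipped with macOS does not accept `-z` or `{*}`); the directory `csplit_demo` and the `part_` prefix are names made up for illustration. csplit names its pieces by prefix and counter only, so there is no counterpart to the `$1` substitution.

```shell
# Recreate the example input in a scratch directory (hypothetical name).
mkdir -p csplit_demo
printf '%s\n' cat1 cat2 cat3 dog1 cat4 cat5 cat6 dog2 cat7 cat8 cat9 > csplit_demo/testfile

# -s: do not print byte counts    -z: drop empty pieces
# -n 3: three-digit suffixes      -f: prefix for the output files
# '{*}' repeats the pattern as many times as possible (GNU extension)
csplit -s -z -n 3 -f csplit_demo/part_ csplit_demo/testfile '/dog[0-9]/' '{*}'

ls csplit_demo
```

This should produce `part_000`, `part_001` and `part_002`, with the same line grouping as the resplit output above: each piece after the first begins with the line that matched.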