harx

HTTP Archive Extractor

Primary language: TypeScript • License: MIT

Extract files from HTTP Archives (.har) with a simple command line interface.


Install

To use the harx command in your terminal, first install it globally.

pnpm i -g harx
yarn global add harx

Usage

As long as $(yarn global bin) or pnpm's global bin directory is in your system's $PATH, you can run harx directly:

USAGE

  $ harx <file> [options]


OPTIONS

  -o, --output  [path]     Output path to extract files (default is ./output) 
  -i, --include [pattern]  RegExp pattern: extract matching files, unless excluded
  -e, --exclude [pattern]  RegExp pattern: skips matching files, unless included

  -n, --no-query           Strip the URL query string from file paths
  -d, --dry-run            Do not persist any lasting changes. For testing.
  -v, --verbose            Be more talkative
  -h, --help               Displays this help page
  --version                Displays the current harx version


EXAMPLES

  # extracts everything to /path/to/output/net
  harx ./net.har --output /path/to/output

  # includes only .html files
  harx archive.har --exclude "*" --include "*.html"

  # excludes only .js files
  harx archive.har --include "*" --exclude "*.js" 

Options

-o, --output  [path]     Output path for extracted files (default is ./output) 
-i, --include [pattern]  RegExp pattern: extract matching files, unless excluded
-e, --exclude [pattern]  RegExp pattern: skips matching files, unless included

-n, --no-query           Strip the URL query string from file paths
-d, --dry-run            Don't persist any lasting changes. For testing.
-v, --verbose            Be more talkative

Note: if you get an error stating that harx cannot be found, or something similar, run pnpx harx or npx harx instead.

Examples

Basic extraction

# extracts HTTP snapshot into "./output/nberlette.github.io" subfolder
# includes all images, fonts, styles, and HTML files that DevTools
# recorded in the Network log before you exported the HAR file.

harx ./nberlette.github.io.har -o ./output --no-query
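
To illustrate what --no-query changes, here is a minimal TypeScript sketch of mapping a recorded URL to a path under the output folder. The helper name urlToOutputPath and the layout it produces (URL path appended to the output directory, a trailing slash treated as index.html) are assumptions for illustration only, not harx's actual implementation.

```typescript
// Hypothetical helper for illustration; not part of harx's API.
function urlToOutputPath(
  url: string,
  outputDir: string,
  stripQuery: boolean,
): string {
  const parsed = new URL(url);
  // Assumption: a directory-style URL ("/" or "/docs/") is saved as index.html.
  const pathname = parsed.pathname.endsWith("/")
    ? parsed.pathname + "index.html"
    : parsed.pathname;
  // Keep the query string in the file name unless asked to strip it.
  const query = stripQuery ? "" : parsed.search;
  // Drop trailing slashes on the output directory before joining.
  return outputDir.replace(/\/+$/, "") + pathname + query;
}

// With stripping (as with --no-query): ./output/css/main.css
console.log(urlToOutputPath("https://example.com/css/main.css?v=3", "./output", true));
// Without stripping: ./output/css/main.css?v=3
console.log(urlToOutputPath("https://example.com/css/main.css?v=3", "./output", false));
```

Stripping the query keeps file names clean and avoids characters such as "?" that some file systems reject, at the cost of collapsing URLs that differ only by query string into one file.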

Includes and Excludes

Excludes all files except .html and .css

harx ./archive.org.har --exclude "*" --include "*.html" --include "*.css"

Includes only .png images, nothing else

harx ./archive.har -o images -e "*" -i "*.png"
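
One reading that is consistent with all of the include/exclude examples above is "the last matching rule wins". The TypeScript sketch below models that reading. The Rule type, the wildcardToRegExp and shouldExtract helpers, and the wildcard-to-RegExp conversion are all assumptions made for illustration; harx's real matching logic may differ.

```typescript
// Hypothetical types and helpers for illustration; harx's internals may differ.
type Rule = { kind: "include" | "exclude"; pattern: string };

// Assumption: patterns such as "*.html" are simple wildcards, where "*"
// matches any run of characters. Everything else is matched literally.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + "$");
}

// Rules are checked in command-line order and the last matching rule wins.
// A file that matches no rule is extracted.
function shouldExtract(path: string, rules: Rule[]): boolean {
  let keep = true;
  for (const rule of rules) {
    if (wildcardToRegExp(rule.pattern).test(path)) {
      keep = rule.kind === "include";
    }
  }
  return keep;
}

// Models: --exclude "*" --include "*.html"
const onlyHtml: Rule[] = [
  { kind: "exclude", pattern: "*" },
  { kind: "include", pattern: "*.html" },
];
console.log(shouldExtract("index.html", onlyHtml)); // true
console.log(shouldExtract("app.js", onlyHtml)); // false
```

Under this model, --include "*" --exclude "*.js" keeps everything except .js files, because the later exclude rule overrides the earlier catch-all include.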

Dry Runs

To see which files would be extracted without writing anything to the file system, combine --dry-run with --verbose:

harx ./archive.har --output output --dry-run --verbose
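
For readers unfamiliar with the HAR format, the TypeScript sketch below shows the kind of information a dry run can report from the archive alone. The interfaces cover only a small subset of the HAR 1.2 fields, and planExtraction is a hypothetical helper, not harx's actual code or output format.

```typescript
// Minimal subset of the HAR 1.2 structure used by this sketch.
interface HarEntry {
  request: { url: string };
  response: { content: { size: number; mimeType: string; text?: string } };
}
interface Har {
  log: { entries: HarEntry[] };
}

// Hypothetical helper: describe what would be written, without touching disk.
// Entries without a captured body (no content.text) have nothing to extract.
function planExtraction(har: Har): string[] {
  return har.log.entries
    .filter((entry) => entry.response.content.text !== undefined)
    .map((entry) => {
      const { size, mimeType } = entry.response.content;
      return `${entry.request.url} (${mimeType}, ${size} bytes)`;
    });
}

// Tiny in-memory archive standing in for a parsed .har file.
const sample: Har = {
  log: {
    entries: [
      {
        request: { url: "https://example.com/index.html" },
        response: { content: { size: 13, mimeType: "text/html", text: "<p>hello</p>\n" } },
      },
      {
        request: { url: "https://example.com/ping" },
        response: { content: { size: 0, mimeType: "x-unknown" } },
      },
    ],
  },
};

for (const line of planExtraction(sample)) console.log(line);
```

Since a .har file is plain JSON, a real archive could be loaded with JSON.parse on the file's contents and passed to the same function.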

License

MIT © 2022 Nicholas Berlette • Inspired by azu