Streaming interface for decoding tarballs on modern JavaScript runtimes (Cloudflare Workers, Deno, etc.).
- Extract tarballs of any size with no filesystem usage and low memory usage.
- Handles many of the tar extensions (ustar, pax, gnu) to enable long file names, attributes, etc.
- Works anywhere WebStreams are supported.
npm install streaming-tarball
Use the extract
function to get a readable stream of tar objects. Each object contains a header and a body (if it's a file).
You can get the full body of the file as text by calling obj.text()
.
import { extract } from 'streaming-tarball';
const response = await fetch('https://github.com/shaunpersad/streaming-tarball/archive/refs/heads/main.tar.gz');
const stream = response.body.pipeThrough(new DecompressionStream('gzip'));
for await (const obj of extract(stream)) {
console.log(
'name:', obj.header.name,
'type:', obj.header.type,
'size:', obj.header.size,
);
console.log('text body:', await obj.text());
}
The file bodies are actually binary streams, so we could've rewritten the above example using the obj.body
stream like this:
import { extract } from 'streaming-tarball';
const response = await fetch('https://github.com/shaunpersad/streaming-tarball/archive/refs/heads/main.tar.gz');
const stream = response.body.pipeThrough(new DecompressionStream('gzip'));
for await (const { header, body } of extract(stream)) {
console.log(
'name:', header.name,
'type:', header.type,
'size:', header.size,
);
if (body) {
const subStream = body.pipeThrough(new TextDecoderStream());
let str = '';
for await (const chunk of subStream) {
str += chunk;
}
console.log('text body:', str);
}
}
Because file bodies are sub-streams of the parent stream, you must consume them all in order for the parent stream to make progress.
There's a discard
helper function on the tar object to help you do that when you aren't otherwise using the body:
import { extract, TAR_OBJECT_TYPE_FILE } from 'streaming-tarball';
const response = await fetch('https://github.com/shaunpersad/streaming-tarball/archive/refs/heads/main.tar.gz');
const stream = response.body.pipeThrough(new DecompressionStream('gzip'));
for await (const obj of extract(stream)) {
if (obj.header.type === TAR_OBJECT_TYPE_FILE && obj.header.size < 100_000) {
console.log(
'file found:', obj.header.name,
'text body:', await obj.text(),
);
} else {
await obj.discard(); // consumes the unused body
}
}
There are many other tar object types, which you can determine by comparing obj.header.type
to the appropriate string value,
which are conveniently exported for you:
import {
TAR_OBJECT_TYPE_BLOCK_SPECIAL,
TAR_OBJECT_TYPE_CHAR_SPECIAL,
TAR_OBJECT_TYPE_CONTIGUOUS,
TAR_OBJECT_TYPE_DIRECTORY,
TAR_OBJECT_TYPE_FIFO,
TAR_OBJECT_TYPE_FILE,
TAR_OBJECT_TYPE_GNU_NEXT_LINK_NAME,
TAR_OBJECT_TYPE_GNU_NEXT_NAME,
TAR_OBJECT_TYPE_HARD_LINK,
TAR_OBJECT_TYPE_PAX_GLOBAL,
TAR_OBJECT_TYPE_PAX_NEXT,
TAR_OBJECT_TYPE_SYM_LINK,
} from 'streaming-tarball';
Note however that you will never see the _GNU_
or _PAX_
object types in practice, because they are consumed and applied to the objects they are targeting.