A modern C++23 library for reading tar archives, designed for memory-constrained embedded Linux systems while providing a clean, type-safe API.
- C++23 Implementation: Leverages modern C++ features for safety and performance
- POSIX ustar Support: Full support for standard POSIX.1-1988 tar format
- GNU tar Extensions: Support for GNU long filenames, link targets, and sparse files
- Memory Efficient: Support for both streaming and memory-mapped access patterns
- Type Safe: Uses `std::expected` for error handling and concepts for type constraints
- Zero-Copy: Memory-mapped files provide zero-copy data access where possible
- Embedded Friendly: No dependencies (aside from the test framework) and predictable memory usage
- Sparse File Support: Efficient handling of sparse files with hole detection
#include <tierone/tar/tar.hpp>
#include <cstdio>      // stderr
#include <filesystem>  // std::filesystem::path
#include <print>       // std::println

// Open a tar archive
auto reader = tierone::tar::open_archive("archive.tar");
if (!reader) {
    std::println(stderr, "Failed to open: {}", reader.error().message());
    return;
}

// Iterate through entries
for (const auto& entry : *reader) {
    // std::filesystem::path has no std::formatter in C++23, so print its string form
    std::println("{} ({} bytes)", entry.path().string(), entry.size());
    if (entry.is_regular_file()) {
        // Read file data
        auto data = entry.read_data();
        if (data) {
            // Process file content
        }
    }
}

// Extract specific entries
for (const auto& entry : *reader) {
    if (entry.path().string().starts_with("docs/")) {
        auto dest_path = std::filesystem::path("extracted") / entry.path();
        if (auto result = entry.extract_to_path(dest_path); !result) {
            std::println(stderr, "Failed to extract: {}", result.error().message());
        }
    }
}

The library supports multiple stream types:
- `file_stream`: Standard file I/O using buffered reads (portable)
- `memory_mapped_stream`: Operates on pre-loaded memory data (portable)
- `mmap_stream`: Zero-copy memory-mapped file access using mmap() (Linux-only)
- C++23 compatible compiler (Clang 19 or newer recommended)
- CMake 3.25+ (for building)
- Catch2 3.6.0 (for building tests, fetched via CMake)
- Linux (tested on Ubuntu 22.04+)
- Other POSIX systems (should work, limited testing)
- Clang 19+: Full support, CI tested
- Clang 20+: Full support, CI tested
- GCC: Untested; GCC 14 and GCC 15 should work
CMake toolchain files are optional but give control over which compiler is used.
cmake -B cmake-build-debug -S . \
    -DCMAKE_TOOLCHAIN_FILE=toolchainfile-amd64-clang20.cmake \
    -DCMAKE_BUILD_TYPE=Debug
cmake --build cmake-build-debug
ctest --test-dir cmake-build-debug

Multiple examples can be found under the examples directory. See examples/README.md.
- Uses `std::expected<T, error>` throughout for composable error handling
- Rich error information with context messages
- No exceptions for predictable behavior
- RAII throughout, no manual memory management
- Configurable buffer sizes for memory-constrained environments
- Support for both streaming (constant memory) and random access patterns
- Strong typing for entry types, permissions, and metadata
- Concepts used for template constraints
- `std::filesystem::path` for proper path handling
- Zero-copy operations where possible
- Memory-mapped I/O for large archives
- Lazy loading of entry data
- Single-pass streaming for memory efficiency
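Single-pass streaming works because the tar layout is predictable: each header occupies one 512-byte block and entry data is padded to a whole number of blocks. The sketch below shows the arithmetic for skipping to the next header without reading the data; the function names are illustrative, and extension records (such as GNU long-name entries) are ordinary entries that follow the same rule.

```cpp
#include <cstdint>

inline constexpr std::uint64_t block_size = 512;

// Bytes an entry's data occupies in the archive once padded to whole blocks.
constexpr std::uint64_t padded_size(std::uint64_t size) {
    return (size + block_size - 1) / block_size * block_size;
}

// Offset of the next header, given the offset of the current header
// and the size recorded in it.
constexpr std::uint64_t next_header_offset(std::uint64_t header_offset,
                                           std::uint64_t size) {
    return header_offset + block_size + padded_size(size);
}
```

For a 700-byte file whose header starts at offset 0, the data is padded to 1024 bytes, so the next header starts at offset 1536.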
- Regular files
- Directories
- Symbolic links
- Hard links
- Character devices
- Block devices
- FIFOs
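The entry types listed above correspond to the `typeflag` byte of the ustar header. A strongly typed mapping could look like the following sketch; the enum and function names are hypothetical and may not match the library's own types.

```cpp
#include <optional>

// Hypothetical strongly typed entry kind.
enum class entry_type {
    regular_file,
    hard_link,
    symbolic_link,
    character_device,
    block_device,
    directory,
    fifo,
};

// Map the ustar typeflag byte to an entry type.
// Returns std::nullopt for flags that are not plain entries (e.g. GNU 'L'/'K').
constexpr std::optional<entry_type> entry_type_from_typeflag(char flag) {
    switch (flag) {
        case '0':
        case '\0': return entry_type::regular_file;  // '\0' is the pre-POSIX form
        case '1':  return entry_type::hard_link;
        case '2':  return entry_type::symbolic_link;
        case '3':  return entry_type::character_device;
        case '4':  return entry_type::block_device;
        case '5':  return entry_type::directory;
        case '6':  return entry_type::fifo;
        default:   return std::nullopt;
    }
}
```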
- Long filenames (>100 characters) via 'L' type entries
- Long link targets (>100 characters) via 'K' type entries
- Automatic detection and processing of GNU tar format
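One common way to detect the format is to inspect the magic and version fields at offset 257 of a header: POSIX ustar stores `ustar\0` followed by `00`, while the older GNU format stores `ustar` followed by two spaces and a NUL. The sketch below assumes this is the only signal used; the library may apply additional checks, and the names here are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

enum class header_format { posix_ustar, gnu, unknown };

inline constexpr std::size_t magic_offset = 257;  // magic (6 bytes) + version (2 bytes)
inline constexpr std::size_t magic_length = 8;

inline constexpr std::string_view posix_tag{"ustar\0" "00", magic_length};
inline constexpr std::string_view gnu_tag{"ustar  \0", magic_length};

// Classify the 8 bytes that hold the magic and version fields.
constexpr header_format classify_magic(std::string_view tag) {
    if (tag == posix_tag) return header_format::posix_ustar;
    if (tag == gnu_tag) return header_format::gnu;
    return header_format::unknown;
}

// Classify a raw header block; anything too short is reported as unknown.
constexpr header_format detect_format(std::string_view header) {
    if (header.size() < magic_offset + magic_length) return header_format::unknown;
    return classify_magic(header.substr(magic_offset, magic_length));
}
```

Headers from pre-POSIX (v7) archives carry no magic and would be reported as `unknown` by this sketch.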
✅ Completed:
- POSIX ustar header parsing and validation
- GNU tar long filename/linkname support (L/K entries)
- Streaming archive reader with iterator support
- Memory-mapped and file-based streams
- Entry extraction to filesystem
- Comprehensive test suite
- Example applications demonstrating both POSIX and GNU formats
- GNU sparse file support (sparse format versions 0.0 and 1.0)
🚧 Future Enhancements:
- Archive writing capabilities
  - Only if there are enough requests; on embedded Linux, the most common use case is reading and extracting.
The library is organized into several key components:
- Core Types (`error.hpp`, `metadata.hpp`): Error handling and data structures
- Streams (`stream.hpp`): Abstract interfaces for data access
- Header Parsing (`header_parser.hpp`): POSIX ustar format parsing
- GNU tar Support (`gnu_tar.hpp`): GNU tar extension handling
- Archive Reader (`archive_reader.hpp`): Main API for reading archives
- Archive Entry (`archive_entry.hpp`): Individual file/directory entries
Licensed under the Apache License, Version 2.0. See LICENSE-2.0.txt for the full license text.
This implementation follows modern C++ best practices and is designed to be a clean, safe alternative to traditional C-based tar libraries like libtar.
We welcome contributions to TierOne Tar! Here's how you can help:
- Fork the repository on GitHub
- Create a feature branch from `master`
- Make your changes with appropriate tests
- Ensure all tests pass and code follows the existing style
- Submit a pull request with a clear description of your changes
- Follow the existing code style and naming conventions
- Add tests for new features or bug fixes
- Update documentation for API changes
- Use modern C++ features appropriately
- Ensure compatibility with supported compilers (Clang 19+)
Please report bugs, feature requests, or questions by opening an issue on GitHub.
Copyright 2025 TierOne Software. All rights reserved.