fast_whitespace_collapse is a high-performance Rust crate for collapsing consecutive spaces and tabs into a single space.
It uses SIMD (u8x16) via the wide crate to process input efficiently.
It automatically falls back to a scalar implementation if SIMD is unavailable.
- Collapses multiple spaces and tabs into a single space.
- Preserves newlines and non-whitespace characters.
- Uses SIMD (u8x16) when supported to process 16 bytes at a time.
- Falls back to a fast scalar implementation if SIMD is unavailable.
- Ensures valid UTF-8 output.
- SIMD requires AVX2, SSE2, or NEON instruction sets.
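
The collapsing rule in the list above can be illustrated with a minimal scalar sketch. This is not the crate's actual code: `collapse_scalar` is a hypothetical name, and the sketch makes no attempt to trim leading or trailing whitespace. It works on bytes, which is safe for UTF-8 because the space and tab bytes never occur inside a multi-byte sequence.

```rust
/// Hypothetical scalar sketch of the collapsing rule (not the crate's implementation).
/// Runs of spaces and tabs become one space; every other byte is copied unchanged.
fn collapse_scalar(input: &str) -> String {
    let mut out: Vec<u8> = Vec::with_capacity(input.len());
    let mut prev_blank = false; // true while we are inside a run of spaces/tabs

    for &b in input.as_bytes() {
        if b == b' ' || b == b'\t' {
            // Emit a single space for the first blank of a run, skip the rest.
            if !prev_blank {
                out.push(b' ');
            }
            prev_blank = true;
        } else {
            // Newlines and all non-whitespace bytes pass through untouched.
            out.push(b);
            prev_blank = false;
        }
    }

    // Only ASCII bytes were dropped or rewritten, so multi-byte sequences are intact.
    String::from_utf8(out).expect("output stays valid UTF-8")
}
```

With this sketch, `collapse_scalar("a \t  b")` yields `"a b"`, and a newline followed by several spaces keeps the newline and one space.
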
Add this to your Cargo.toml:
[dependencies]
fast_whitespace_collapse = "0.1.0"

Or run the following command:
cargo add fast_whitespace_collapse

By default, SIMD acceleration is enabled. You can control it via Cargo features:
cargo build --no-default-features
cargo build --features simd-optimized

use fast_whitespace_collapse::collapse_whitespace;
let input = "This is \t a test.";
let output = collapse_whitespace(input);
assert_eq!(output, "This is a test.");

- Processes text using SIMD (u8x16), handling 16 bytes in parallel.
- Falls back to scalar processing when SIMD is unavailable.
- Handles large inputs efficiently while maintaining valid UTF-8 output.
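
The chunked strategy described above can be approximated with the standard library alone. The sketch below is an assumption about the general shape, not the crate's code: `collapse_chunked` and `push_collapsed` are hypothetical names, and the per-chunk check is a plain loop where a real SIMD version would presumably use a single vector comparison on a `u8x16`.

```rust
/// Hypothetical helper: scalar collapse of `src` into `out`, carrying the
/// "previous byte was a blank" state across calls.
fn push_collapsed(src: &[u8], out: &mut Vec<u8>, prev_blank: &mut bool) {
    for &b in src {
        if b == b' ' || b == b'\t' {
            if !*prev_blank {
                out.push(b' ');
            }
            *prev_blank = true;
        } else {
            out.push(b);
            *prev_blank = false;
        }
    }
}

/// Hypothetical chunked sketch (std only, no SIMD intrinsics).
/// Input is walked in 16-byte chunks; chunks without blanks are copied wholesale.
fn collapse_chunked(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut prev_blank = false;

    let mut chunks = bytes.chunks_exact(16);
    for chunk in &mut chunks {
        let has_blank = chunk.iter().any(|&b| b == b' ' || b == b'\t');
        if has_blank {
            // Slow path: fall back to byte-by-byte handling for this chunk.
            push_collapsed(chunk, &mut out, &mut prev_blank);
        } else {
            // Fast path: nothing to collapse, copy all 16 bytes at once.
            out.extend_from_slice(chunk);
            prev_blank = false;
        }
    }
    // The tail (fewer than 16 bytes) is always handled by the scalar path.
    push_collapsed(chunks.remainder(), &mut out, &mut prev_blank);

    String::from_utf8(out).expect("output stays valid UTF-8")
}
```

Carrying `prev_blank` across chunks matters: a run of spaces that straddles a 16-byte boundary must still collapse to a single space.
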
| Method | Time |
|---|---|
| Regex approach | 11.289 µs |
| collapse crate | 1.2624 µs |
| Iterative approach | 629.60 ns |
| Iterative bytes | 428.00 ns |
| fast_whitespace_collapse crate | 388.73 ns |
fast_whitespace_collapse outperforms the other methods, achieving the lowest execution time.
Benchmark executed on Apple M1 Pro (NEON SIMD enabled).

Run the benchmarks with:
cargo bench

fast_whitespace_collapse supports multiple architectures:
- x86_64: Uses SIMD (SSE2, AVX2) for maximum performance.
- ARM (aarch64, M1/M2/M3): Uses NEON SIMD.
- Other: Falls back to a scalar implementation.
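
One common way to express this kind of per-architecture split in Rust is compile-time selection with `#[cfg(target_arch = ...)]`. The sketch below only illustrates that mechanism; `backend_name` is a hypothetical function, and whether the crate selects its code path this way (rather than leaving it to the wide crate) is an assumption.

```rust
// Hypothetical illustration of compile-time backend selection.
// Exactly one of these definitions is compiled for any given target.

#[cfg(target_arch = "x86_64")]
fn backend_name() -> &'static str {
    "x86_64 SIMD"
}

#[cfg(target_arch = "aarch64")]
fn backend_name() -> &'static str {
    "aarch64 NEON"
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn backend_name() -> &'static str {
    "scalar fallback"
}
```

Because the choice is made at compile time, there is no runtime dispatch cost, and the unused variants are not included in the binary.
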
use fast_whitespace_collapse::collapse_whitespace;
assert_eq!(collapse_whitespace("Hello world"), "Hello world");
assert_eq!(collapse_whitespace(" Trim spaces "), "Trim spaces");
assert_eq!(collapse_whitespace("Tabs\t\tconverted"), "Tabs converted");

assert_eq!(collapse_whitespace("こんにちは 世界"), "こんにちは 世界"); // Japanese
assert_eq!(collapse_whitespace("你好 世界"), "你好 世界"); // Chinese
assert_eq!(collapse_whitespace("😀 😃 😄"), "😀 😃 😄"); // Emojis

assert_eq!(collapse_whitespace("Line1\n Line2\nLine3"), "Line1\n Line2\nLine3");

Run tests with:
cargo test

This project is licensed under the MIT License.