Like Rust's std::path::Path, but system-aware.

camino - UTF-8 paths

This repository contains the source code for camino, an extension of the std::path module that adds new Utf8PathBuf and Utf8Path types.

What is camino?

camino's Utf8PathBuf and Utf8Path types are like the standard library's PathBuf and Path types, except they are guaranteed to only contain UTF-8 encoded data. Therefore, they expose the ability to get their contents as strings, they implement Display, etc.

The std::path types are not guaranteed to be valid UTF-8. This is the right decision for the standard library, since it must be as general as possible. However, on all platforms, non-Unicode paths are vanishingly uncommon for a number of reasons:

  • Unicode won. There are still some legacy codebases that store paths in encodings like Shift JIS, but most have been converted to Unicode at this point.
  • Unicode is the common subset of supported paths across Windows and Unix platforms. (On Windows, Rust stores paths as an extension to UTF-8, and converts them to UTF-16 at Win32 API boundaries.)
  • There are already many systems, such as Cargo, that only support UTF-8 paths. If your own tool interacts with any such system, you can assume that paths are valid UTF-8 without creating any additional burdens on consumers.
  • The "makefile problem" asks: given a Makefile or other metadata file (such as Cargo.toml) that lists the names of other files, how should the names in the Makefile be matched with the ones on disk? This has no general, cross-platform solution in systems that support non-UTF-8 paths. However, restricting paths to UTF-8 eliminates this problem.
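To make the matching difficulty concrete, here is a minimal sketch using only the standard library. It assumes a Unix platform, where file names are arbitrary byte strings; the file names and the function names `utf8_name` and `lossy_names_collide` are made up for illustration. Two distinct on-disk names become indistinguishable once they are converted lossily to text, which is one way a metadata file can fail to identify the file it means.

```rust
use std::path::Path;

/// Returns the path as a `&str` only if it is valid UTF-8.
fn utf8_name(path: &Path) -> Option<&str> {
    path.to_str()
}

/// Builds two different non-UTF-8 file names (illustrative, Unix only) and
/// reports whether they are distinct as paths but identical after a lossy
/// conversion to text.
#[cfg(unix)]
fn lossy_names_collide() -> bool {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;

    // 0xFF and 0xFE can never appear in valid UTF-8.
    let a = Path::new(OsStr::from_bytes(b"report-\xff.txt"));
    let b = Path::new(OsStr::from_bytes(b"report-\xfe.txt"));

    // Neither name can be viewed as a &str...
    let neither_is_utf8 = a.to_str().is_none() && b.to_str().is_none();
    // ...and the lossy conversion replaces each bad byte with U+FFFD,
    // so the two distinct names collapse into the same string.
    neither_is_utf8 && a != b && a.to_string_lossy() == b.to_string_lossy()
}
```

If every path is known to be UTF-8, the text written into the metadata file and the name on disk are the same sequence of bytes, so this ambiguity does not arise.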

Therefore, many programs that want to manipulate paths do assume they contain UTF-8 data, and convert them to strs as necessary. However, because this invariant is not encoded in the Path type, conversions such as path.to_str().unwrap() need to be repeated again and again, creating a frustrating experience.

Instead, camino allows you to check that your paths are UTF-8 once, and then manipulate them as valid UTF-8 from there on, avoiding repeated lossy and confusing conversions.
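The "check once" idea can be sketched with the standard library alone. The `CheckedPath` type below is hypothetical and is not camino's implementation; it only illustrates how moving the UTF-8 check into a constructor lets later code get a `&str` without an `Option` or an `unwrap()`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical wrapper (not camino's actual type): a path whose contents
/// were verified to be UTF-8 when it was constructed.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CheckedPath(String);

impl CheckedPath {
    /// Performs the UTF-8 check exactly once. On failure the original
    /// `PathBuf` is handed back unchanged so the caller can report it.
    fn from_path_buf(path: PathBuf) -> Result<Self, PathBuf> {
        path.into_os_string()
            .into_string()
            .map(CheckedPath)
            .map_err(PathBuf::from)
    }

    /// Infallible: the invariant was established in the constructor.
    fn as_str(&self) -> &str {
        &self.0
    }
}

/// A checked path is still a valid `Path`, so it can be passed to any API
/// that accepts `impl AsRef<Path>`.
impl AsRef<Path> for CheckedPath {
    fn as_ref(&self) -> &Path {
        Path::new(&self.0)
    }
}
```

With a type like this, the fallible conversion happens at the edge of the program (argument parsing, directory listing), and everything downstream works with plain strings.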

Examples

The documentation for Utf8PathBuf and Utf8Path contains several examples.

For examples of how to use camino with other libraries like serde and clap, see the camino-examples directory.

API design

camino is a very thin wrapper around std::path. Utf8Path and Utf8PathBuf are drop-in replacements for Path and PathBuf.

Most APIs are the same, but those at the boundary with str are different. Some examples:

  • Path::to_str() -> Option<&str> has been renamed to Utf8Path::as_str() -> &str.
  • Utf8Path implements Display, and Path::display() has been removed.
  • Iterating over a Utf8Path returns &str, not &OsStr.

Every Utf8Path is a valid Path, so Utf8Path implements AsRef<Path>. Any APIs that accept impl AsRef<Path> will continue to work with Utf8Path instances.
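For comparison, the following sketch shows the std::path side of each boundary listed above, using only the standard library. The function names and the example path are illustrative assumptions. It shows what the Utf8Path variants simplify: an `Option` from `to_str()`, an explicit `display()` adapter for formatting, and `&OsStr` items when iterating.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Accepts anything path-like. A type that implements `AsRef<Path>`
/// can be passed here without conversion.
fn file_stem_of(path: impl AsRef<Path>) -> Option<String> {
    path.as_ref()
        .file_stem()
        .map(|stem| stem.to_string_lossy().into_owned())
}

/// Collects the three std::path behaviours that differ at the `str` boundary.
fn std_boundary_demo() -> (Option<&'static str>, String, Vec<&'static OsStr>) {
    let path = Path::new("docs/guide.md");

    // 1. Conversion to &str is fallible for a plain Path.
    let as_str = path.to_str();

    // 2. Path does not implement Display; formatting goes through display().
    let shown = path.display().to_string();

    // 3. Iterating yields &OsStr components rather than &str.
    let parts: Vec<&OsStr> = path.iter().collect();

    (as_str, shown, parts)
}
```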

Should you use camino?

camino trades off some utility for a great deal of simplicity. Whether camino is appropriate for a project is ultimately a case-by-case decision. Here are some general guidelines that may help.

You should consider using camino if...

  • You're building portable, cross-platform software. While both Unix and Windows platforms support different kinds of non-Unicode paths, Unicode is the common subset that's supported across them.
  • Your system has files that contain the names of other files. If you don't use UTF-8 paths, you will run into the makefile problem described above, which has no general, cross-platform solution.
  • You're interacting with existing systems that already assume UTF-8 paths. In that case you won't be adding any new burdens on downstream consumers.
  • You're building something brand new and are willing to ask your users to rename their paths if necessary. Projects that don't have to worry about legacy compatibility have more flexibility in choosing what paths they support.

In general, using camino is the right choice for most projects.

You should NOT use camino if...

  • You're writing a core system utility. If you're writing, say, an mv or cat replacement, you should not use camino. Instead, use std::path::Path and add extensive tests for non-UTF-8 paths.
  • You have legacy compatibility constraints. For example, Git supports non-UTF-8 paths. If your tool needs to handle arbitrary Git repositories, it should use its own path type that's a wrapper around Vec<u8>.
  • There's some other reason you need to support non-UTF-8 paths. Some tools like disk recovery utilities need to handle potentially corrupt filenames: only being able to handle UTF-8 paths would greatly diminish their utility.
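For the legacy-compatibility case, a byte-oriented path type might look like the sketch below. `RawPath` and its methods are hypothetical names, and the sketch assumes `/` as the only separator; a real tool would need more care around normalization and platform conversion.

```rust
/// Hypothetical path type for tools that must round-trip arbitrary bytes.
/// It makes no claim about encoding: any byte sequence is accepted.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct RawPath(Vec<u8>);

impl RawPath {
    fn from_bytes(bytes: &[u8]) -> Self {
        RawPath(bytes.to_vec())
    }

    /// The exact bytes that were stored, with nothing replaced or lost.
    fn as_bytes(&self) -> &[u8] {
        &self.0
    }

    /// Splits on `/` (an assumption of this sketch) and skips empty pieces.
    fn components(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.0.split(|b| *b == b'/').filter(|c| !c.is_empty())
    }

    /// A UTF-8 view is available only when the bytes happen to be valid.
    fn to_utf8(&self) -> Option<&str> {
        std::str::from_utf8(&self.0).ok()
    }
}
```

The trade-off is the reverse of camino's: nothing is rejected, but every place that needs text has to handle the `None` case.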

Optional features

By default, camino has no dependencies other than std. There are some optional features that enable dependencies:

Rust version support

The minimum supported Rust version (MSRV) for camino with default features is 1.34. This project is tested in CI against the latest stable version of Rust and the MSRV.

  • Stable APIs added in later Rust versions are supported either through conditional compilation in build.rs, or through backfills that also work on older versions.
  • Deprecations are kept in sync with the version of Rust they're added in.
  • Unstable APIs are currently not supported. Please file an issue on GitHub if you need an unstable API.
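As a rough illustration of version-based conditional compilation, a build script could detect the compiler's minor version and emit cfg flags that the library then tests with `#[cfg(...)]`. This is a sketch under stated assumptions and not camino's actual build.rs: the cfg names and version thresholds in `CFG_TABLE` are invented, and the helper names are hypothetical. The `RUSTC` environment variable and the `cargo:rustc-cfg=` output line are standard Cargo build-script mechanisms.

```rust
use std::env;
use std::process::Command;

/// Hypothetical table: (minimum minor version, cfg flag to emit).
const CFG_TABLE: &[(u32, &str)] = &[
    (44, "example_api_since_1_44"),
    (63, "example_api_since_1_63"),
];

/// Extracts the minor version from output such as
/// "rustc 1.63.0 (4b91a6ea7 2022-08-08)".
fn parse_minor(version_output: &str) -> Option<u32> {
    let mut fields = version_output.split_whitespace();
    if fields.next()? != "rustc" {
        return None;
    }
    let mut parts = fields.next()?.split('.');
    if parts.next()? != "1" {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Asks the compiler Cargo selected (via RUSTC) for its version.
fn rustc_minor() -> Option<u32> {
    let rustc = env::var_os("RUSTC").unwrap_or_else(|| "rustc".into());
    let output = Command::new(rustc).arg("--version").output().ok()?;
    parse_minor(std::str::from_utf8(&output.stdout).ok()?)
}

/// Returns the flags whose threshold the given compiler meets.
fn enabled_cfgs(minor: u32) -> Vec<&'static str> {
    CFG_TABLE
        .iter()
        .filter(|(min, _)| minor >= *min)
        .map(|(_, name)| *name)
        .collect()
}

/// Intended to be called from `fn main()` in build.rs.
fn emit_cfgs() {
    if let Some(minor) = rustc_minor() {
        for name in enabled_cfgs(minor) {
            println!("cargo:rustc-cfg={}", name);
        }
    }
}
```

If the version cannot be determined, this sketch emits nothing, so the crate falls back to the code paths that work on the MSRV.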

camino is designed to be a core library and has a conservative MSRV policy. MSRV increases will only happen for a compelling enough reason, and will involve at least a minor version bump.

Optional features may pull in dependencies that require a newer version of Rust.

License

This project is available under the terms of either the Apache 2.0 license or the MIT license.

This project's documentation is adapted from The Rust Programming Language, which is available under the terms of either the Apache 2.0 license or the MIT license.