rust-lang/rust

Tracking Issue for proc_macro::{tracked_env, tracked_path}

m-ou-se opened this issue · 28 comments

Feature gate: #![feature(proc_macro_tracked_env, track_path)]

This is a tracking issue for proc_macro::tracked*, to allow adding files and environment variables to the build system's dependency tracking.

Public API

// proc_macro

mod tracked_env {
    pub fn var<K: AsRef<OsStr> + AsRef<str>>(key: K) -> Result<String, VarError>;
}

mod tracked_path {
    pub fn path<P: AsRef<str>>(path: P);
}
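
For context, here is a minimal sketch of how a proc macro might use these functions on nightly today. The macro name, environment variable, and file path are made up for illustration; only the tracked_env::var and tracked_path::path calls correspond to the API above.

// lib.rs of a proc-macro crate (nightly only)
#![feature(proc_macro_tracked_env, track_path)]

use proc_macro::{tracked_env, tracked_path, TokenStream};

#[proc_macro]
pub fn load_config(_input: TokenStream) -> TokenStream {
    // Registers the env var with the build system's dependency tracking
    // and returns its value in one step.
    let config_dir = tracked_env::var("CONFIG_DIR").unwrap_or_else(|_| ".".to_string());

    // Registers the file so that edits to it trigger recompilation.
    let path = format!("{config_dir}/config.toml");
    tracked_path::path(&path);

    // tracked_path only registers interest; reading still goes through std
    // (whether it should also open the file is one of the open questions below).
    let contents = std::fs::read_to_string(&path).expect("config.toml not found");

    format!("pub const CONFIG: &str = {contents:?};").parse().unwrap()
}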

Steps / History

  • Implementation of tracked_env::var: #74653
  • Implementation of tracked_path::path: #84029
  • Add tracking issue: #99516
  • Change path to accept AsRef<Path> instead of AsRef<str>
  • Design the public API
  • Documentation
  • Final comment period (FCP)
  • Stabilization PR

Unresolved Questions

  • The full API design.
    • The implementation PRs were mostly focused on the implementation details behind the scenes of proc_macro, but there has barely been any discussion of the public interface.

This feature is not in a great state. The original PRs were mostly about the implementation details, with almost no discussion on the public API. There are still many unanswered design questions, such as naming, whether path should return something (a File? a String? a TokenStream? a PathBuf? nothing?), how it should be documented, how these functions should relate to each other, whether they should report errors through panics or through Result, whether it should take AsRef<OsStr> or AsRef<str>, and so on.

Instead of discussing every single detail separately, it'd probably be best to start with a more complete proposal for how these APIs should look, and then discuss that.

(Usually most of this happens before an unstable feature is merged and tracked, but this has already been merged, hence this tracking issue.)

Considering these APIs effectively mirror the cargo:rerun-if-changed and cargo:rerun-if-env-changed functionality from build scripts I think it's reasonable to assume they don't return anything but just register interest.

Right now the main reason you run into this issue is that someone already has functionality in a build script and wants to convert it into a proc_macro. I would also strongly recommend that it not report an error if the file does not exist, as both the absence and the presence of a file are legitimate for proc macros, the same way they are for build scripts.
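
For readers less familiar with the build-script side, these are the directives being mirrored; a minimal build.rs sketch (the file and variable names are illustrative):

// build.rs
fn main() {
    // Rerun this build script when the named path changes.
    println!("cargo:rerun-if-changed=templates/index.html");
    // Rerun when the environment variable is set, unset, or changed.
    println!("cargo:rerun-if-env-changed=CONFIG_DIR");
}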

Considering these APIs effectively mirror the cargo:rerun-if-changed and cargo:rerun-if-env-changed functionality from build scripts I think it's reasonable to assume they don't return anything but just register interest.

I don't think that is a good argument. It is relatively common to forget to register interest with build scripts I believe. In addition it is pretty common to write helper functions that register interest and at the same time read the file/env var to prevent forgetting this. The reason that no such function exists by default for build scripts is because there is no way for cargo to provide a crate to build scripts with said functions. Furthermore returning the value is essential for making proc macros work with virtual filesystems and with compiler session local env vars.
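
To make the coupling argument concrete, here is a sketch of the kind of build-script helper being described (names and error handling are illustrative):

// Helpers for a build script: register interest and read in one call,
// so the two can't get out of sync.
use std::path::Path;

fn tracked_read(path: impl AsRef<Path>) -> std::io::Result<String> {
    let path = path.as_ref();
    println!("cargo:rerun-if-changed={}", path.display());
    std::fs::read_to_string(path)
}

fn tracked_env(key: &str) -> Option<String> {
    println!("cargo:rerun-if-env-changed={key}");
    std::env::var(key).ok()
}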

It is relatively common to forget to register interest with build scripts I believe

Maybe? But that is probably a very different argument to begin with. From where I'm standing, today there is functionality available with build.rs that you cannot have with proc_macro, and the same arguments for using it with build.rs apply equally to proc macros.

In addition it is pretty common to write helper functions that register interest and at the same time read the file/env var to prevent forgetting this.

By coupling these things together, this has a high chance of turning into a never-ending argument, because there will always be disagreement about how this should work. In fact, even today you will find no agreement on how best to use the build script equivalents, and if we had had to come to an agreement on how this should work there, we probably wouldn't have a solution for build scripts today.

If there is a need for such an API it can always be added at a later point.

The reason that no such function exists by default for build scripts is because there is no way for cargo to provide a crate to build scripts with said functions.

Of course there is. It would always have been possible for cargo to ship a utility library that is linked against build scripts the same way as proc_macro exposes itself.

Furthermore returning the value is essential for making proc macros work with virtual filesystems and with compiler session local env vars.

That same argument can be brought up for build scripts.

If there is a need for such an API it can always be added at a later point.

The interest registering version can be trivially implemented in terms of the version that returns the value by discarding the return value, but not the other way around. In addition the interest registering versions may become entirely useless if we start sandboxing proc macros depending on the way we do this.
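
To illustrate the first half of that claim with the existing tracked_env::var (nightly, inside a proc-macro crate with the feature gates above enabled), an interest-only variant is just a wrapper that throws the value away:

fn track_env_only(key: &str) {
    // Registers the env var with dependency tracking; the value is discarded.
    let _ = proc_macro::tracked_env::var(key);
}

The reverse is not possible: a function that only registers interest gives the caller nothing to read back, so a value-returning API cannot be built on top of it.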

That same argument can be brought up for build scripts.

For build scripts it isn't all that important that changing a file in an editor without saving it reruns them. It is essential for proc macros. In addition rust-analyzer is already having a lot of trouble with proc macros going outside of their implied (unenforced) sandbox, while it doesn't have any issues with build scripts as those run in the same environment as with a regular cargo build due to rust-analyzer using cargo for invoking them.

I want to leave another piece of input here that I believe is relevant: rerun-if-changed today can track a directory and will in fact track changes to its contents. My expectation as a user would be that I can replicate the same functionality for proc macros. This is in fact what I care about right now, as I'm trying to build something that bundles folders.

The interest registering version can be trivially implemented in terms of the version that returns the value by discarding the return value, but not the other way around.

If returning contents turns out to be useful, it can always be added as an additional API. I think coupling these things will keep this feature unstable for a lot longer than if those discussions are separated.

est31 commented

I think tracked_path::path should be redesigned to also open read access to the specified path, as suggested by RFC 3200, because the current API leaves that question unresolved, which basically begs for the proc macro to use outside methods to access files, which would be a break of current policy that this is unsupported.

Just a note here that the current implementation does not work with relative paths. It works if I use absolute ones though.

ehuss commented

@wdanilo Can you put together a reproduction? It seems to work fine for me. The path will be relative to cwd where rustc is run, which for Cargo is the workspace root (which matches the way a proc-macro would access relative paths).

@ehuss Not this kind of relative path, sorry for not being clear enough here. In my case it was something along the lines of /some/absolute/path/.../.../../../some/other/file and it did not work until I ran canonicalize on my PathBuf here. Could you tell me if that is enough for you to repro it, or should I try creating a minimal example, please? :)

ehuss commented

Yes, please create a minimal reproduction if you can. Also, please post it as a new issue (and cc me). Tracking issues are not a good way to discuss individual issues.

Can I imitate the behaviors of these two APIs by adding const _: Option<&str> = option_env!("tracked_env"); and const _: &str = include_bytes!("tracked_path"); to returned token streams? Is it guaranteed that the env and the path are tracked even if they do not really make a difference to the program?

Can I imitate the behaviors of these two APIs by adding const _: Option<&str> = option_env!("tracked_env"); and const _: &str = include_bytes!("tracked_path"); to returned token streams? Is it guaranteed that the env and the path are tracked even if they do not really make a difference to the program?

first, include_bytes! doesn't return &str, it returns &[u8; N].

second, afaict this would make rustc recompile the output of the proc-macro, not necessarily rerun the proc-macro itself.

djc commented

Can I imitate the behaviors of these two APIs by adding const _: Option<&str> = option_env!("tracked_env"); and const _: &str = include_bytes!("tracked_path"); to returned token streams? Is it guaranteed that the env and the path are tracked even if they do not really make a difference to the program?

second, afaict this would make rustc recompile the output of the proc-macro, not necessarily rerun the proc-macro itself.

I'm pretty sure the macros get rerun if the file used for include_bytes!() is changed. I've been using this as a trick to avoid having to have build scripts that explicitly rerun the build when template files change in Askama.
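
For illustration, that trick looks roughly like this in a derive macro (loosely modeled on the pattern described, using the quote crate; the hardcoded template path and derive name are made up, and a real macro would typically build an absolute path, e.g. from CARGO_MANIFEST_DIR):

use proc_macro::TokenStream;
use quote::quote;

#[proc_macro_derive(Template)]
pub fn derive_template(_input: TokenStream) -> TokenStream {
    // In a real macro this would come from an attribute on the type.
    let path = "templates/hello.html";

    // Emitting include_bytes! of the template into the output makes the
    // generated code depend on the file, so editing the template rebuilds
    // the crate that uses the derive, without needing a build script.
    quote! {
        const _: &[u8] = include_bytes!(#path);
    }
    .into()
}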

I'm pretty sure the macros get rerun if the file used for include_bytes!() is changed. I've been using this as a trick to avoid having to have build scripts that explicitly rerun the build when template files change in Askama.

I wouldn't depend on that since rustc in incremental mode likes to cache stuff on disk and it's plausible to cache proc-macro outputs.

I'm pretty sure the macros get rerun if the file used for include_bytes!() is changed. I've been using this as a trick to avoid having to have build scripts that explicitly rerun the build when template files change in Askama.

I wouldn't depend on that since rustc in incremental mode likes to cache stuff on disk and it's plausible to cache proc-macro outputs.

I don't think we need to worry about that: include_bytes, via load_binary_file, adds the file to a list of dependencies:

let file = self.new_source_file(path.to_owned().into(), text);

So as I understand it, the file is tracked for changes explicitly, and any relevant caches (if any) should be invalidated automatically. Everything else would break incremental builds fundamentally.

I wouldn't depend on that since rustc in incremental mode likes to cache stuff on disk and it's plausible to cache proc-macro outputs.

I don't think we need to worry about that: include_bytes, via load_binary_file, adds the file to a list of dependencies:

So as I understand it, the file is tracked for changes explicitly, and any relevant caches (if any) should be invalidated automatically.

the problem isn't that include_bytes won't be rerun, but that whatever macro expanded to include_bytes may not be rerun, since the macro that produced the include_bytes didn't tell the compiler that it itself depends on the file, rather than merely on whatever macros are in its output. The compiler is allowed to cache the intermediate output of running the proc macro and just rerun include_bytes itself, instead of being required to rerun both the proc-macro and include_bytes. I don't know if the compiler currently behaves that way, but I see no reason it shouldn't be allowed to.

the problem isn't that include_bytes won't be rerun, but that whatever macro expanded to include_bytes may not be rerun, since the macro that produced the include_bytes didn't tell the compiler that it itself depends on the file, rather than merely on whatever macros are in its output.

Ok, that may be true, I have no idea how the compiler tracks which files affect the results of a proc macro. Wouldn't that affect cargo:rerun-if-changed in the same way? Or does that implicitly invalidate all cached proc macro outputs whenever any of the files change?

Apart from tracked_path, rustc does not track which files affect the proc macro output at all. It currently reruns all proc macros every time rustc runs, though in the future it will probably start to cache proc macros in the incremental compilation cache. And cargo reruns rustc only when any of the files mentioned in the dep info file change; tracked_path adds an entry for the given file to the dep info file.

djc commented

though in the future it will probably start to cache proc macros in the incremental compilation cache.

Do you know if this is tracked somewhere? Would be great to have.

I don't know of a tracking issue for this, but #125356 (caching decl macros) would be one step towards it. There have been discussions in a variety of places, including the internals forum and Zulip.

Hi, I have some time, and wanted to see if I can help out on this effort to cache proc macro expansions :)

Having seen https://www.coderemote.dev/blog/faster-rust-compiler-macro-expansion-caching/ (their changes are closed source though), which found that up to 40% of incremental build times are sometimes due to proc macro expansion, I also did some profiling on a project of mine that uses bevy (which has a lot of custom derives), and found that proc macro expansion also makes up about 40% of my incremental build time.

Additionally, around 20% of the time was spent in codegen afterwards, which I thought might also go down a bit with proc macro caching, but I have not measured this(!). I think the 40% is worth it anyway, even if no additional (codegen) benefits show up in the end.

Ongoing Progress I found:

  • #87173 is an open rename/refactoring by @drahnr to improve the public facing API. It seems like it mostly needs a (final?) review and maybe a rebase, but is mostly good.

Possible Progress I see:

I have not yet seen a way to "opt into" this dependency tracking feature for proc macros (please let me know if I missed it). Cargo has the common cargo::rerun-if-changed=build.rs pattern for build scripts to signal that only the build script itself is relevant for change detection (link to cargo docs). I think a similar mechanism might be needed for proc macros to opt into tracking/caching, because I think we can't change the behavior of existing/unadapted macros.

Maybe something like proc_macro::track_input_tokens() (name up for bikeshedding) could serve that purpose? In that case, could that also be a first step for stabilization? I.e., as a first step we allow proc macros to opt in to signaling that they only depend on their input tokens (and their own definition, of course), and anything that relies on files or env vars will simply not opt in for now. This would make progress possible and would allow questions such as #99515 (comment) to be solved independently, hopefully making it easier to arrive at an ergonomic solution and API.
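
Purely to make the proposal concrete, a sketch of how that hypothetical opt-in might look from a macro author's side (this API does not exist; the name is taken from the suggestion above):

use proc_macro::TokenStream;

#[proc_macro_derive(MyTrait)]
pub fn derive_my_trait(input: TokenStream) -> TokenStream {
    // Hypothetical call: declare that this expansion depends only on `input`
    // (and the macro's own definition), so the compiler may cache the result.
    proc_macro::track_input_tokens();

    // ... from here on, a pure function of `input` ...
    TokenStream::new()
}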

Given that this might substantially improve build times for incremental builds, I think it would make sense to try and drive a path forwards here, and would be happy to help/implement something for that :)

I think tracked_path::path should be redesigned to also open read access to the specified path, as suggested by RFC 3200, because the current API leaves that question unresolved, which basically begs for the proc macro to use outside methods to access files, which would be a break of current policy that this is unsupported.

What about macros like include_dir! that would want to enumerate folders and track the folder content for new files?

Ok, taking a closer look at proc-macro caching, it seems like #125356 (mentioned in #99515 (comment)) definitely makes sense as a first step, since the basic machinery for caching decl macros will probably also be relevant for proc-macro caching.

So I would be happy to help with that as well, but I'm not sure about the PR status, so I'll inquire there 👍

@Skgland that's actually a good question. It's a limitation of the entire tracked_path feature, I'd say: right now it assumes that it gets files passed to it, instead of arbitrary paths that are either files or directories. The correct behaviour when passed a directory would be, at each build, to walk that directory and all of its children, and if any child has been modified, or the list of children is different, mark it as dirty.
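
A sketch of the kind of recursive check just described (purely illustrative; not how rustc or cargo implement it): take a snapshot of the directory on each build and compare it with the previous one; any added, removed, or modified child marks the directory as dirty.

use std::path::{Path, PathBuf};
use std::time::SystemTime;

// Collect every entry under `dir` together with its modification time.
fn snapshot(dir: &Path) -> std::io::Result<Vec<(PathBuf, SystemTime)>> {
    let mut out = Vec::new();
    for entry in std::fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        out.push((path.clone(), entry.metadata()?.modified()?));
        if path.is_dir() {
            out.extend(snapshot(&path)?);
        }
    }
    out.sort();
    Ok(out)
}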

The issue is .d files, which probably don't support this. So integration into build systems like make or others would be poor.

Maybe we should rename path to file to make it clear that we only support files?

In classical unix terminology, directories are special types of "file"s, but we already have the is_file function in our API (where the "file" is an abbreviation for "regular file") so there is precedent already.

Based on rust-lang/rfcs#3200 (comment) I would have thought directories are handled as expected.

FWIW, the tracked_path API now supports tracking any changes within a directory. The change happened alongside the build script rerun-if getting that functionality, to replace the older, mostly useless behavior of watching the directory entry itself for (i.e. metadata) changes.

Also, isn't this already something cargo needs to track? E.g. cargo build needs to know if new build targets are added, i.e. a src/main.rs is added to a previously library-only package, or a new file appears in src/bin.
Though that's probably a concern at a different level.

Edit:

I think that comment was referring to rust-lang/cargo#8973, though not sure how that affects tracked_path as I don't see a corresponding PR for rust itself.

not sure how that affects tracked_path as I don't see a corresponding PR for rust itself.

rustc outputs .d files (write_out_deps function in compiler/rustc_interface/src/passes.rs), which then get read by cargo.

Cargo then calls the mtime_recursive function:

https://github.com/rust-lang/cargo/blob/11dccd1df09b0f50706d72f6e886b67f1f00b732/src/cargo/core/compiler/fingerprint/mod.rs#L1928

So yeah, you are right, directories are handled (and handled recursively).

This would need to be properly handled by rust as well in case proc macro outputs are cached in the future (#99515 (comment)), so that the cache is invalidated if the directory content changes. It doesn't help if cargo reruns rustc because it detects the change, but rustc itself doesn't detect the change and reuses a cached result.

I think this might be a new case for rustc to handle, as currently I don't know of anything that enumerates folders. There is module_name.rs vs. module_name/mod.rs, but everything I can think of results in a finite list of files that are checked, rather than enumerating a folder.