
A compiler cache for MSVC, much like ccache for gcc


clcache.py - a compiler cache for Microsoft Visual Studio

clcache.py is a little Python script which attempts to avoid unnecessary recompilation by reusing previously cached object files if possible. It is meant to be called instead of the original 'cl.exe' executable. The script analyses the command line to decide whether source code is to be compiled. If so, a cache will be queried for a previously stored object file.

If the script is called in an unsupported way (e.g. if the compiler is called for linking), the script will simply relay the invocation to the real 'cl.exe' program.

Installation

Please see the README for instructions on how to install clcache and different approaches on how to integrate it into a build system.

Options

--help

Print usage information

-s

Print some statistics about the cache (cache hits, cache misses, cache size etc.)

-c

Clean the cache: trim the cache size to 90% of its maximum by removing the oldest objects.
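The trimming described for -c can be sketched roughly as follows. This is a minimal illustration and not clcache's actual code: the function name trim_cache is hypothetical, and using the file modification time as the measure of "oldest" is an assumption.

```python
import os

def trim_cache(cache_dir, max_size, target_ratio=0.9):
    """Delete the oldest files until the cache is at most target_ratio * max_size.

    Assumption: "oldest" is judged by modification time.
    Returns the remaining size in bytes and the list of removed paths.
    """
    entries = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            stat = os.stat(path)
            entries.append((stat.st_mtime, stat.st_size, path))

    total = sum(size for _mtime, size, _path in entries)
    target = max_size * target_ratio
    removed = []
    # Sorting the tuples puts the smallest (oldest) mtime first.
    for _mtime, size, path in sorted(entries):
        if total <= target:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return total, removed
```

With a maximum size of 100 bytes, the sketch keeps deleting the oldest files until at most 90 bytes remain.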

-C

Clear the cache: remove all cached objects, but keep the cache statistics (hits, misses, etc.).

-z

Reset the cache statistics, i.e. the number of cache hits, cache misses etc. This doesn’t actually clear the cache, so the number of cached objects and the cache size will remain unchanged.

-M <size>

Sets the maximum size of the cache in bytes. The default value is 1073741824 (1 GiB).
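Since -M expects a byte count, a small helper can compute the value to pass. This is only an illustrative sketch; the function name gib_to_bytes is hypothetical and not part of clcache.

```python
def gib_to_bytes(gib):
    """Convert a size in GiB to bytes (1 GiB = 1024**3 bytes)."""
    return int(gib * 1024 ** 3)

# Value to pass to -M for a 4 GiB cache.
print(gib_to_bytes(4))
```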

compiler

Optionally, the full path to the compiler can be given as the first argument on the command line, in the style of ccache, instead of using the CLCACHE_CL environment variable or searching the path for cl.exe.

Environment Variables

CLCACHE_DIR

If set, points to the directory within which all the cached object files should be stored. This defaults to %HOME%\clcache.

CLCACHE_CL

Can be set to the actual 'cl.exe' executable to use. If this variable is not set, the 'clcache.py' script will scan the directories listed in the %PATH% environment variable for 'cl.exe'. If the variable holds just a file name (as opposed to an absolute path), 'clcache.py' will likewise scan the directories listed in %PATH% to compute the absolute path.
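The lookup order described for CLCACHE_CL could look roughly like this. It is a sketch under the assumption that a plain PATH search is all that is needed; find_compiler is a hypothetical name and not clcache's own function.

```python
import os
import shutil

def find_compiler(environ=None):
    """Resolve the real compiler from CLCACHE_CL, falling back to a PATH search."""
    environ = os.environ if environ is None else environ
    candidate = environ.get("CLCACHE_CL", "cl.exe")
    if os.path.isabs(candidate):
        # An absolute path is used as given.
        return candidate
    # A bare file name (or no setting at all) is looked up in PATH.
    # shutil.which returns None if nothing is found.
    return shutil.which(candidate, path=environ.get("PATH", ""))
```

Passing the environment as a parameter keeps the sketch easy to try out without changing the real process environment.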

CLCACHE_LOG

If this variable is set, a bit of diagnostic information is printed which can help with debugging cache problems.

CLCACHE_DISABLE

Setting this variable will disable 'clcache.py' completely. The script will relay all calls to the real compiler.

CLCACHE_HARDLINK

If this variable is set, cached object files won’t be copied to their final location. Instead, hard links pointing to the cached object files will be created. This is more efficient (faster, and uses less disk space) but doesn’t work if the cache directory is on a different drive than the build directory.
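The difference between copying and hard-linking can be sketched as below. The function name restore_object and its parameters are hypothetical; the real script may handle errors differently.

```python
import os
import shutil

def restore_object(cached_path, output_path, use_hardlink):
    """Place a cached object file at output_path by hard link or by copy."""
    if os.path.exists(output_path):
        os.remove(output_path)
    if use_hardlink:
        # Raises OSError if both paths are not on the same drive/volume.
        os.link(cached_path, output_path)
    else:
        shutil.copyfile(cached_path, output_path)
```

A hard link makes both paths refer to the same file on disk, which is why no extra space is used and why it cannot span drives.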

CLCACHE_COMPRESS

If true, clcache will compress object files it puts in the cache. If the cache was filled without compression it can’t be used with compression and vice versa (i.e. you have to clear the cache when changing this setting). The default is false.

CLCACHE_COMPRESSLEVEL

This setting determines the level at which clcache will compress object files. It only has effect if compression is enabled. The value defaults to 6, and must be no lower than 1 (fastest, worst compression) and no higher than 9 (slowest, best compression).
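How the two compression settings interact might be sketched like this. The sketch uses Python's zlib module as a stand-in; the actual compression format used by clcache, the function names, and treating "variable is set" as "true" are all assumptions.

```python
import zlib

def compression_level(environ):
    """Read CLCACHE_COMPRESSLEVEL, defaulting to 6 and enforcing the range 1..9."""
    level = int(environ.get("CLCACHE_COMPRESSLEVEL", "6"))
    if not 1 <= level <= 9:
        raise ValueError("CLCACHE_COMPRESSLEVEL must be between 1 and 9")
    return level

def compress_object(data, environ):
    """Compress object file bytes only if compression is enabled."""
    if "CLCACHE_COMPRESS" not in environ:
        return data  # compression disabled: the level has no effect
    return zlib.compress(data, compression_level(environ))
```

Because compressed and uncompressed entries are stored differently, a cache filled in one mode cannot be read in the other, which is why the cache must be cleared when changing the setting.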

CLCACHE_NODIRECT

Disable direct mode. If this variable is set, clcache will always run the preprocessor on the source file and will hash the preprocessor output to get the cache key. Use this if you experience problems with direct mode or if you need built-in macros like __TIME__ to work correctly.

CLCACHE_BASEDIR

Only has an effect when direct mode is on. Set this to the path of the root directory of your project. This allows clcache to cache relative paths, so if you move your project to a different directory, clcache will produce cache hits as before.
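The idea behind CLCACHE_BASEDIR can be illustrated by replacing the project root in each path with a fixed marker before the path is stored. The marker string and the function name below are hypothetical, and the sketch uses ntpath so that Windows path rules apply on any platform.

```python
import ntpath

PLACEHOLDER = "<BASEDIR>"  # hypothetical marker, not necessarily what clcache uses

def collapse_basedir(path, basedir):
    """Replace a leading basedir in path with a marker (case-insensitive)."""
    norm_path = ntpath.normcase(ntpath.normpath(path))
    norm_base = ntpath.normcase(ntpath.normpath(basedir))
    if norm_path == norm_base or norm_path.startswith(norm_base + "\\"):
        return PLACEHOLDER + norm_path[len(norm_base):]
    # Paths outside the project root are kept (normalized) as they are.
    return norm_path

# The same header in two checkouts maps to the same stored path.
print(collapse_basedir(r"C:\work\proj\src\main.h", r"C:\work\proj"))
print(collapse_basedir(r"D:\other\proj\src\main.h", r"D:\other\proj"))
```

Since both checkouts produce the same stored path, a cache entry created in one location can be found again after the project has moved.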

CLCACHE_OBJECT_CACHE_TIMEOUT_MS

Overrides the default ObjectCacheLock timeout (the default is 10 * 1000 ms, i.e. 10 seconds). The ObjectCacheLock is used to give the clcache script exclusive access to the cache. You may want to override this variable if you are getting ObjectCacheLockExceptions with return code 258 (which is the WAIT_TIMEOUT return code).

CLCACHE_PROFILE

If this variable is set, clcache will generate profiling information about how the runtime is spent in the clcache code. For each invocation, clcache will generate a file with a name similar to 'clcache-<hashsum>.prof'. You can aggregate these files and generate a report by running the 'showprofilereport.py' script.

CLCACHE_SERVER

Setting this environment variable will make clcache use (and expect) a running clcachesrv.py script which takes care of caching file hashes. This greatly improves performance of cache hits, but only has an effect in direct mode (i.e. when CLCACHE_NODIRECT is not set).

CLCACHE_MEMCACHED

This variable can be used to make clcache use a memcached (https://memcached.org/) backend for saving and restoring cached data. The variable is assumed to hold the host and port information of the memcached server, e.g. 127.0.0.1:11211.

Known limitations

How clcache works

clcache.py was designed to intercept calls to the actual cl.exe compiler binary. Once an invocation has been intercepted, the command line is analyzed to determine whether it just compiles a single source file into an object file. This means that all of the following requirements on the command line must be met:

  • The /link switch must not be present

  • The /c switch must be present

  • The /Zi switch must not be present (/Z7 is okay though)

If multiple source files are given on the command line, clcache.py will invoke itself multiple times while respecting an optional /MP switch.
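The three requirements on the command line listed above can be expressed as a small check. This is a simplified sketch: the function name can_use_cache is hypothetical, and the real command line analysis handles many more cases (response files, multiple source files and so on).

```python
def can_use_cache(args):
    """Return True if the command line only compiles to an object file.

    Switch names are case-sensitive, so /c and /C are different switches.
    """
    switches = [arg[1:] for arg in args if arg.startswith(("/", "-"))]
    if "link" in switches:
        return False  # the compiler is asked to link
    if "c" not in switches:
        return False  # not a compile-only invocation
    if "Zi" in switches:
        return False  # /Zi is not supported; /Z7 is fine
    return True
```

For example, the arguments /c /Z7 main.cpp pass the check, while /c /Zi main.cpp would be relayed to the real compiler.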

If all the above requirements are met, clcache forwards the call to the preprocessor by replacing /c with /EP in the command line and then invoking it. This will cause the complete preprocessed source code to be printed. clcache then generates a hash sum out of:

  • The complete preprocessed source code

  • The `normalized' command line

  • The file size of the compiler binary

  • The modification time of the compiler binary

The `normalized' command line is the given command line minus all switches which either don’t influence the generated object file (such as /Fo) or which have already been covered otherwise. For instance, all switches which merely influence the preprocessor can be skipped since their effect is already implicitly contained in the preprocessed source code.
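A rough sketch of how such a key could be computed is shown below. The hash algorithm (SHA-256), the function names and the short list of ignored switches are assumptions chosen for illustration; the list is deliberately incomplete.

```python
import hashlib
import os

# Illustrative only: /Fo names the output file; /D, /I and /U merely
# influence the preprocessor. The real list of ignored switches is longer.
IGNORED_PREFIXES = ("Fo", "D", "I", "U")

def normalize_command_line(args):
    """Drop switches whose effect is irrelevant or already covered."""
    kept = []
    for arg in args:
        is_switch = arg.startswith(("/", "-"))
        if is_switch and arg[1:].startswith(IGNORED_PREFIXES):
            continue
        kept.append(arg)
    return kept

def compute_cache_key(preprocessed_source, args, compiler_path):
    """Hash the preprocessed source, normalized switches and compiler identity."""
    stat = os.stat(compiler_path)
    parts = [
        preprocessed_source,  # bytes printed by the preprocessor
        " ".join(normalize_command_line(args)).encode("utf-8"),
        str(stat.st_size).encode("ascii"),      # compiler file size
        str(stat.st_mtime_ns).encode("ascii"),  # compiler modification time
    ]
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part)
        digest.update(b"\0")  # separator so adjacent parts cannot blend
    return digest.hexdigest()
```

In this sketch, two invocations that differ only in /Fo or /D switches yield the same key as long as the preprocessed source is identical, while changing an optimization switch such as /O2 yields a different key.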

Once the hash sum is computed, it is used as a key (actually, a directory name) in the cache (which is a directory itself). If the cache entry exists already, it is supposed to contain a file with the stdout output of the compiler as well as the previously generated object file. clcache will copy the previously generated object file to the designated output path and then print the contents of the stdout text file. That way, the script behaves as if the actual compiler was invoked.

If the hash sum is not yet used in the cache, clcache will forward the invocation to the actual compiler. Once the real compiler successfully finished its work, the generated object file (as well as the output printed by the compiler) is copied to the cache.
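The hit and miss paths described above can be sketched with a directory per key. The file names inside the cache entry, the function name and the run_compiler callback are hypothetical; locking, statistics and error handling are left out.

```python
import os
import shutil
import sys

def compile_with_cache(cache_dir, key, output_path, run_compiler):
    """Return True on a cache hit, False on a miss.

    run_compiler is a callable that invokes the real compiler, writes
    output_path and returns (exit_code, stdout_text).
    """
    entry = os.path.join(cache_dir, key)
    cached_object = os.path.join(entry, "object")
    cached_stdout = os.path.join(entry, "stdout.txt")

    if os.path.isdir(entry):
        # Hit: restore the object file and replay the compiler output.
        shutil.copyfile(cached_object, output_path)
        with open(cached_stdout) as f:
            sys.stdout.write(f.read())
        return True

    # Miss: run the real compiler and store its results on success.
    exit_code, stdout_text = run_compiler()
    sys.stdout.write(stdout_text)
    if exit_code == 0:
        os.makedirs(entry)
        shutil.copyfile(output_path, cached_object)
        with open(cached_stdout, "w") as f:
            f.write(stdout_text)
    return False
```

Replaying the stored output on a hit is what makes the script behave as if the actual compiler had been invoked.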

Caveats

No cache hits when building via Visual Studio IDE or MSBuild

Various people (see e.g. GitHub issues #33 or #135) reported that they do not see any cache hits when running clcache via the MSBuild tool, which is the build tool executed by the Visual Studio IDE. The symptom is that cleaning a project and then rebuilding it (or doing a clean rebuild) does not cause any cache hits even though nothing changed.

The reason for this is that the CL Task used by MSBuild has a feature which makes it track all files written while executing a task, and when cleaning the project all those files are deleted. Alas, this also causes any cached files created by clcache to be tracked and hence deleted. The documentation explains:

[..] TLogWriteFiles - Optional ITaskItem[] parameter. Specifies an array of items that represent the write file tracking logs. A write-file tracking log (.tlog) contains the names of the output files that are written by a task, and is used by the project build system to support incremental builds. For more information, see the TrackerLogDirectory and TrackFileAccess parameters in this table. [..]

TrackFileAccess - Optional Boolean parameter. If true, tracks file access patterns. For more information, see the TLogReadFiles and TLogWriteFiles parameters in this table.

To fix this, open the .vcxproj file of your project and extend (or add) the Globals property group such that the TrackFileAccess parameter is set to false:

<PropertyGroup Label="Globals">
  …
  <TrackFileAccess>false</TrackFileAccess>
</PropertyGroup>

If you don’t want to modify these properties in your .vcxproj file, you can pass them on the command line when invoking MSBuild directly. Other useful properties in combination with clcache are /p:CLToolExe=clcache.exe and /p:CLToolPath=c:\path\to\the\clcache:

msbuild.exe /p:TrackFileAccess=false /p:CLToolExe=clcache.exe /p:CLToolPath=c:\path\to\the\clcache

Race conditions when writing to .tlog files

The file tracking functionality of Visual Studio mentioned earlier can cause a different symptom which causes an error message to be written to the standard output looking like

FileTracker : error FTK1011: could not create the new file tracking log file: […​].1.tlog. The file exists.

This appears to be a known defect in MSBuild; the workaround is to disable file access tracking as described above.

clcachesrv prevents deletion of directories containing include files for which hash sums are cached

Because of the way the clcachesrv server process caches hash sums of include files, the directories containing such include files can no longer be deleted: clcachesrv monitors the file system to watch those files for changes (in order to invalidate the cached hash sum). See this comment for some internal details on what’s going on.

To work around this problem, an --exclude argument can be passed to clcachesrv to instruct it not to bother caching the hash sums of files in certain paths. The argument takes a regular expression (hence, special characters need to be escaped) and is used like

$ python clcachesrv.py --exclude \\build\\

Usually, there is no benefit in caching hash sums of files in build directories - instead, just the include files of standard libraries (e.g. the C++ library or common 3rd party libraries) need to be considered.
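Why the backslashes in the --exclude pattern need escaping can be seen in a small sketch. The function name is_excluded is hypothetical, and how clcachesrv applies the expression internally (for instance whether it ignores case) is not covered here.

```python
import re

def is_excluded(path, exclude_pattern):
    """Return True if the regular expression matches anywhere in the path."""
    return re.search(exclude_pattern, path) is not None

# In a regular expression, \\ stands for one literal backslash,
# so this pattern matches the path component \build\.
pattern = r"\\build\\"

print(is_excluded(r"C:\proj\build\generated\config.h", pattern))
print(is_excluded(r"C:\Program Files\VC\include\vector", pattern))
```

Without the escaping, a sequence such as \b would be read as a regular expression token (a word boundary) rather than as a backslash followed by the letter b.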

Changes to INCLUDE and LIBPATH environment variables are not detected

If the INCLUDE (for #include statements) or LIBPATH (for #using statements) environment variables are changed between compilations, clcache will not notice and may erroneously return a cached object file that was compiled with different settings. The most likely reason for a change in these variables is switching between different installations of Visual Studio.

Workarounds include:

  • clearing the cache when changing the variables

  • setting CLCACHE_NODIRECT. This will force clcache to run the preprocessor and base the caching on its output. The preprocessor will respond correctly to changes in INCLUDE. Note that this only handles changes to INCLUDE (but if your code doesn’t use #using, that is all you care about).

License Terms

The source code of this project is - unless explicitly noted otherwise in the respective files - subject to the BSD 3-Clause License.

Credits

clcache.py was written by Frerich Raabe with a lot of help from Slava Chigrin, Simon Warta, Tim Blechmann, Tilo Wiedera and other contributors.

This program was heavily inspired by ccache, a compiler cache for the GNU Compiler Collection.