A clang-based helper tool to rewrite a header-based C++ project into a module-based one.
The tool provides three styles to rewrite the project:
- Rewrite a header-based project so that it provides module interfaces while the project itself stays header-based. This is known as the header wrappers style.
- Rewrite a header-based project into a module-based one, but still provide the original headers. This style is helpful when we decide to develop new features in modules and want to keep the old interfaces.
- Rewrite a header-based project into a module-based one completely, without keeping the headers. This may be wanted if no users of the project are expected to use headers anymore.
Note that this tool is not expected to run continuously like clang-format. It is only expected to save some trivial work when refactoring a project.
Small examples can be found in clang/test/ClangModulesConverter, and a real-world example can be found in Header Wrapper for async_simple.
This project lives in a truncated clang/LLVM source tree. We can build it in a similar way to building clang from source.
An example may be:
cmake -DLLVM_ENABLE_PROJECTS="clang;" -DCMAKE_BUILD_TYPE=Release ../llvm
make clang-modules-converter -j
The core ability of the tool is to find the preamble section of each file in the project. The preamble section is the part of the file that introduces its dependencies. For example,
// my_lib.h
// Start of preamble
#include <iostream>
#include <string>
#ifdef USE_BOOST
#include <boost/...>
#endif
#include "local_headers.h"
// End of preamble
namespace my_lib {
...
}
Then we can manipulate the preamble section:
// my_lib.h
#ifndef MY_LIB_USE_MODULES
#include <iostream>
#include <string>
#ifdef USE_BOOST
#include <boost/...>
#endif
#include "local_headers.h"
#endif // MY_LIB_USE_MODULES
namespace my_lib {
...
}
So the header will only contain its own body when MY_LIB_USE_MODULES is defined. Then we can export its body in the module interface:
export module my_lib;
import std; // we detected use of std module
import boost; // we detected use of boost module
#define MY_LIB_USE_MODULES
export extern "C++" {
#include "my_lib.h"
}
Similarly, for source files, once the preamble is detected, we can replace the includes in the preamble with imports of the corresponding modules.
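As a rough illustration of the preamble idea, the following C++ sketch finds the preamble with a plain line scan and wraps it in the controlling macro. It is only a toy under stated assumptions: the tool itself is clang-based and preprocesses the project rather than scanning raw lines, and the helper names findPreambleEnd and wrapPreamble are hypothetical, not part of the tool.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: drops leading spaces and tabs.
static std::string ltrim(const std::string &s) {
  const std::size_t i = s.find_first_not_of(" \t");
  return i == std::string::npos ? std::string() : s.substr(i);
}

static bool startsWith(const std::string &s, const std::string &prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Returns one past the last line of the preamble, or 0 if no preamble is
// found. Only blank lines, // comments, #include and conditional directives
// are accepted; anything else ends the scan.
std::size_t findPreambleEnd(const std::vector<std::string> &lines) {
  std::size_t end = 0;
  int depth = 0; // nesting depth of #if/#ifdef/#ifndef blocks
  for (std::size_t i = 0; i < lines.size(); ++i) {
    const std::string line = ltrim(lines[i]);
    if (line.empty() || startsWith(line, "//"))
      continue; // neither extends nor ends the preamble
    if (startsWith(line, "#include")) {
      if (depth == 0)
        end = i + 1;
    } else if (startsWith(line, "#if")) {
      ++depth;
    } else if (startsWith(line, "#endif")) {
      if (depth == 0)
        break; // closes a block opened elsewhere
      if (--depth == 0)
        end = i + 1; // a complete conditional block joins the preamble
    } else if (startsWith(line, "#else") || startsWith(line, "#elif")) {
      if (depth == 0)
        break;
    } else {
      break; // first declaration or other directive
    }
  }
  return end;
}

// Wraps the detected preamble in "#ifndef <macro> ... #endif // <macro>".
// Leading comments and blank lines stay above the #ifndef.
std::vector<std::string> wrapPreamble(const std::vector<std::string> &lines,
                                      const std::string &macro) {
  assert(!macro.empty() && "the controlling macro needs a name");
  const std::size_t end = findPreambleEnd(lines);
  if (end == 0)
    return lines; // nothing to wrap
  std::size_t begin = 0;
  while (begin < end) {
    const std::string line = ltrim(lines[begin]);
    if (!line.empty() && !startsWith(line, "//"))
      break;
    ++begin;
  }
  const auto first = lines.begin();
  std::vector<std::string> out(first, first + static_cast<std::ptrdiff_t>(begin));
  out.push_back("#ifndef " + macro);
  out.insert(out.end(), first + static_cast<std::ptrdiff_t>(begin),
             first + static_cast<std::ptrdiff_t>(end));
  out.push_back("#endif // " + macro);
  out.insert(out.end(), first + static_cast<std::ptrdiff_t>(end), lines.end());
  return out;
}
```

Calling wrapPreamble on the lines of my_lib.h with MY_LIB_USE_MODULES as the macro gives the guarded layout shown above. A line scan like this cannot see macros, include guards, or block comments, which is one reason a preprocessor-based approach is needed in practice.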
The tool accepts only one argument, --config=, which gives the path of the config file.
The tool needs a YAML config file to work. The fields of the config file are:
- root_dir. Optional. The root of the project. If not provided, the path of the config file is considered to be the root of the project.
- modules. A list of modules we wish to generate. The fields of each module are:
  - name. The name of the module to be generated.
  - path. The path to the module interface unit to be generated.
  - headers. A string or a list of strings describing the paths of the headers which we wish to rewrite and whose contents go into the module interface.
  - excluded_headers. A string or a list of strings describing the paths of headers that shouldn't be in the module interface.
  - srcs. A string or a list of strings describing the source files which will be rewritten into module implementation units for the module.
  - excluded_srcs. A string or a list of strings describing the paths of source files which should not become module implementation units.
  - prefix_map_of_module_units_for_headers. A string or a list of strings describing the map for the prefix of the path of the generated module units. Each string should contain a : symbol. A path that starts with the LHS of the : will be mapped to the RHS of the :. This is not meaningful in header wrappers mode. Without the field, the module units for the headers are generated in the same directory. But if the headers live in the include directory and we wish the generated module units to live in the module directory, we can set the value of prefix_map_of_module_units_for_headers to include:module.
- third_party_modules. A list of third-party (existing) modules. The fields are:
  - name. The name of the third-party module.
  - headers. A string or a list of strings. Each string is a regex describing the included text.
- srcs_to_rewrite. A string or a list of strings describing the paths of the sources (generally files ending with .cpp and .cc, not headers) we wish to rewrite to use modules. The * and ** wildcards are supported.
- srcs_excluded_to_rewrite. A string or a list of strings describing the paths of sources we don't want to rewrite. All matched sources won't be rewritten.
- mode. Required. The working mode of the tool. Supported values are:
  - header-wrapper. The tool will try to rewrite the headers and generate the corresponding modules.
  - rewrite-headers-to-module-units. The tool will try to rewrite the headers to module units. Then it is the job of users to control the visibility of these module units.
  - rewrite-headers-to-partitions. The tool will try to rewrite the headers to module partitions of the corresponding module.
- controlling_macro. A string naming the modules controlling macro. We use this macro in headers to detect whether we're in modules.
- remain_headers. A boolean value. By default true. If it is false and the mode is not header-wrapper, the tool will try to embed the body of the headers into the corresponding module units and remove unused headers.
- keep_traditional_abi. A boolean value. By default true. If it is false, the tool will try to generate code within modules directly, which will use the modules ABI.
- is_std_module_available. Required. A boolean value telling whether the std module is available in the environment. If not, the tool will try to generate a std module for you, since the std module is pretty important in the ecosystem of an ideal modular world.
- std_module_path. A string giving the path of the generated std module. Only meaningful when is_std_module_available is false.
- compilation_database. A string giving the path to the compile commands of the project. The tool will use the information to preprocess the whole project.
- default_compile_commands. A string giving the default commands to use if compilation_database is not available or some files cannot be found in compilation_database.
An example may be:
modules:
  - name: my_module
    path: modules/async_simple.cppm
    headers: includes/**/*.h
    excluded_headers:
      - "**/test/**"
third_party_modules:
  - name: boost
    headers: boost
mode: header-wrapper
controlling_macro: MY_LIB_USE_MODULES
is_std_module_available: false
srcs_to_rewrite: "**/srcs/*.cc"
srcs_excluded_to_rewrite: "**/test/**"
compilation_database: build/compile_commands.json
std_module_path: third_party/std.cppm
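To make the prefix_map_of_module_units_for_headers field described above more concrete, here is a minimal C++ sketch of how one LHS:RHS entry could be applied to a header path. The function name applyPrefixMap is hypothetical, and the plain string-prefix comparison is an assumption; the tool may normalize paths or match whole path components instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: applies one "LHS:RHS" mapping to a path.
// Paths that do not start with LHS are returned unchanged.
std::string applyPrefixMap(const std::string &path, const std::string &mapping) {
  const std::string::size_type colon = mapping.find(':');
  assert(colon != std::string::npos && "each mapping must contain a ':'");
  const std::string from = mapping.substr(0, colon);
  const std::string to = mapping.substr(colon + 1);
  if (path.compare(0, from.size(), from) != 0)
    return path; // prefix does not match, keep the original location
  return to + path.substr(from.size());
}
```

With the mapping include:module, the path include/my_lib/a.h becomes module/my_lib/a.h, while src/a.h is left alone. The sketch only relocates the directory prefix; choosing the file name and extension of the generated module unit is outside its scope.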
A version control tool (e.g., git) is expected to be in use when running this tool.
A real world example can be found in Header Wrapper for async_simple.
Abstractly, the workflow of the tool may be:
- Build the project first and get the compilation commands.
- Write the config file according to the information of the project.
- Invoke clang-modules-converter to rewrite the project initially.
- Write the modules-related parts in the build system.
- Try to build the modules and fix bugs (highly likely).
- Refine and polish the generated code.
The error types I've seen include:
- The use of macros.
- Many kinds of errors due to implicit or missing includes.
For macros, this tool will try to recognize the macros used and emit a warning diagnostic message for them, like:
// There unhandled macro uses found in the body:
// 'assert' defined in /usr/include/assert.h:50:10
We hope such information can ease the process of rewriting.
As an example of implicit or missing includes,
// a.h
inline int a() { ... }
// b.h
// missing including a.h !
inline int b() { return a(); }
We forgot to include a.h in b.h, but we didn't notice because we always included a.h before including b.h. The tool may then get invalid dependency information and generate code that includes b.h before a.h.
And also,
// a.h
#include "third_party1/..." // third_party1 includes third_party2
inline int a() {
third_party2 xxx;
}
In this case, since a.h doesn't include headers from third_party2 directly, the tool may not generate code to import third_party2.
This tool lives in the clang/LLVM source tree because I hope to merge it into clang/LLVM, and the current form looks easier to merge.
The std module is pretty important in the ideal modular ecosystem. In the ideal modular world, every declaration should live in a single module file. Otherwise we may not get the best compile-time performance. But this may not be achievable without the std module. So this tool tries to generate the std module when it is not available, to emphasize the importance of the std module.
The header defining the used macros may not be intended to be included directly. For example, we may have an interface header interface.h whose contents are implemented in several "implementation" headers. The intention is for users to include interface.h instead of the actual file that defines the macros.
The other point is that it is better to split out headers that define macros only, instead of copying the headers directly. For example,
// Common.h
// Sections to define macros
#define ...
#define ...
// Sections to define common functions and classes.
...
Then it will be better to split a CommonMacro.h out of the Common.h file, so that users who depend only on the macros don't need to include Common.h and pull in unnecessary entities.
The tool will generate the std module on demand. It will scan the use of std headers in your project and generate a mock std module based on the use of std headers. In other words, if you only use <vector>, the generated std module may not contain <set>.
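The on-demand idea can be pictured with a small C++ sketch that collects which std headers a piece of source text includes. It is an illustration only: the function name collectUsedStdHeaders and the caller-supplied list of known std header names are assumptions, and the tool itself gets this information from preprocessing rather than from a regex over raw text.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: returns the std headers that `source` includes with
// angle brackets. `knownStdHeaders` is supplied by the caller because this
// sketch has no built-in list of standard headers.
std::set<std::string>
collectUsedStdHeaders(const std::string &source,
                      const std::set<std::string> &knownStdHeaders) {
  assert(!knownStdHeaders.empty() && "nothing could ever be reported");
  static const std::regex includeRe(R"(^\s*#\s*include\s*<([^>]+)>)");
  std::set<std::string> used;
  std::istringstream in(source);
  std::string line;
  while (std::getline(in, line)) {
    std::smatch match;
    if (!std::regex_search(line, match, includeRe))
      continue; // not an angle-bracket include
    const std::string header = match[1].str();
    if (knownStdHeaders.count(header) != 0)
      used.insert(header); // only these would feed the generated std module
  }
  return used;
}
```

For a file that includes only <vector> and <string>, the result contains those two names and nothing else, which mirrors why a generated std module may lack <set>.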
The key reason to do this is that only the standard library vendors themselves can know what they provide, even if we have a specification. If we generated the std module based on the specification only, we would get many errors like missing header <stacktrace>, missing header <span>, ...
Another point is compatibility. The C++ ecosystem allows us to mix standard libraries and compilers, e.g., we are allowed to use clang with libstdc++. But because they are different projects, there might be some gaps in supporting new features, e.g., there was a time when clang couldn't compile the <concepts> header provided by libstdc++.
We can adjust it ourselves.
If the tool fails to handle some std headers, we can add them ourselves in the same way as a third-party module:
third_party_modules:
- name: std
headers: "^new_std_header$"
Note that we need to add ^ and $, otherwise it may match local/new_std_header, which may not be wanted.
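The effect of the anchors can be checked with a short C++ sketch. It uses std::regex_search as a stand-in, on the assumption that the tool matches the regex anywhere inside the included text (which is what the note above implies); the helper name matchesIncludedText is hypothetical, and the tool's own regex engine may differ in details.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches somewhere in the text
// that appears between the include delimiters.
bool matchesIncludedText(const std::string &pattern, const std::string &included) {
  assert(!pattern.empty() && "an empty pattern would match every include");
  return std::regex_search(included, std::regex(pattern));
}
```

Under this assumption, the unanchored pattern new_std_header matches both new_std_header and local/new_std_header, while ^new_std_header$ matches only the former. The same reasoning explains why the pattern boost in the earlier config example matches includes such as boost/asio.hpp.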
And if the tool doesn't export the needed names, we can update the generated std module directly, or update the tool itself.
When developing this tool, I intentionally tried to make it portable. But I admit I have never tested or built it on Windows, so there might be some mismatches. Feedback and contributions are highly welcome.
If we don't want the tool to rewrite certain headers, we can put the following text in these headers:
#ifndef ASYNC_SIMPLE_USE_MODULES
#endif // ASYNC_SIMPLE_USE_MODULES
This works because the tool won't rewrite headers that were already converted. The rationale is that if the tool detects such a pattern, we assume the header was edited or verified by users, and we should respect the users' decisions.
It is more natural to rewrite the headers that make up a module interface into partitions. The code looks prettier, and we get a language-level feature to control the visibility.
But the downside is that all the interface partition units must be transitively imported by the primary module interface, and all the implementation module units import the primary module interface. So all the implementation module units need to wait for all the interfaces to be compiled before starting their own compilations.
If we don't like this, we can rewrite the headers to different modules and control the visibility in the build system to make sure users can only use the modules we intend to export.
All the tests are in clang/test/ClangModulesConverter. The implementation code is in clang/tools/clang-modules-converter. The main file is clang/tools/clang-modules-converter/ModulesConverter.cpp.
The tool has 4 functional modules:
- YAML config file parsing. Implemented in clang/tools/clang-modules-converter/ConverterConfig.cpp.
- Interesting file recognizing. Implemented in clang/tools/clang-modules-converter/InterestingFile.cpp.
- (Pre)Processing the targeted project to get the information. Implemented in clang/tools/clang-modules-converter/ProcessInterestingFiles.cpp.
- Rewriting. Implemented in Rewriter.cpp.
  - Additionally, the code to generate the std module is in clang/tools/clang-modules-converter/StdModuleGenerator.cpp.