retain, used and perhaps wildcard-matching using retain-symbols-file
fwsGonzo opened this issue · 13 comments
Hey, I recently tried to use mold for a peculiar build target that requires __attribute__((retain)) support. Without it, I thought perhaps I could add wildcard support to retain-symbols-file, but I see that get_symbol instantiates new symbols, so I am a bit unsure how to proceed.
What is the best way forward in order to support retaining a bunch of similarly named symbols? Is wildcard matching with retain-symbols-file realistic?
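For illustration only, wildcard matching against a list of symbol patterns could be as simple as a glob test on each candidate name. The sketch below uses POSIX fnmatch(); the helper name symbol_matches_any and the example patterns are hypothetical and are not taken from mold's source, which may handle symbol lookup quite differently.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if `name` matches at least one
 * shell-style glob pattern (e.g. "std*") in `patterns`. */
static bool symbol_matches_any(const char *name,
                               const char *const *patterns,
                               size_t npatterns)
{
    assert(name != NULL && patterns != NULL);
    for (size_t i = 0; i < npatterns; i++) {
        /* fnmatch() returns 0 when the string matches the pattern. */
        if (fnmatch(patterns[i], name, 0) == 0)
            return true;
    }
    return false;
}
```

With patterns such as "std*", a name like stdBuildRope would match without having to be listed individually.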
I see that there's a retain mechanism in the codebase for symbols as well, but looking at my RISC-V object files, I'm not sure how the retain attribute is actually stored. (Later on I am realizing it's a section attribute)
8828: 0000000000000000 712 FUNC GLOBAL DEFAULT 190 stdBuildRope
This function has used and retain.
I looked into it, and it seems that SHF_GNU_RETAIN is a section attribute that prevents linker GC. However, with GNU ld it also seems to prevent retain-symbols-file from discarding the symbols. I guess I am relying on that.
The attribute seems to have the value 0x200000.
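As a rough sketch of what checking for that flag looks like, the helper below tests the retain bit in a 64-bit ELF section header. It assumes a Linux system that provides <elf.h>; the fallback definition uses the 0x200000 value mentioned above in case the header predates the flag. The function name section_is_retained is made up for this example.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Older <elf.h> versions may not define the flag, so fall back to the
 * value quoted in this thread (bit 21). */
#ifndef SHF_GNU_RETAIN
#define SHF_GNU_RETAIN 0x200000
#endif

/* Hypothetical helper: true if the section header carries the retain flag. */
static bool section_is_retained(const Elf64_Shdr *shdr)
{
    assert(shdr != NULL);
    return (shdr->sh_flags & SHF_GNU_RETAIN) != 0;
}
```

Because the flag lives in the section header's sh_flags, it describes the whole section rather than any individual symbol inside it.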
__attribute__((retain)) marks a section so that the section will not be garbage-collected by the linker's --gc-sections. On the other hand, --retain-symbols-file makes the linker keep the specified symbols in the symbol table. They serve different purposes. Could you explain a little bit more about what you are trying to achieve?
I have a bunch of symbols in my executable that are attribute used, retain in order to not prune them when stripping. This at least works with ld, but I'm not sure exactly why. So, I am just trying to come up with other solutions to retain these symbols automatically. There are quite a few of them, and only an automated solution will work.
My CMake build uses gc-sections, stripping and retain-symbols-file + a few --undefined= for some assembly functions that wouldn't stick otherwise.
I see that --undefined=symbol does work, at least, and I do have a symbol file. It's just not scalable to write down all the symbols one by one. One thing I noticed, unrelated to this issue, is that I had to change -Wl,-u,symbol to -Wl,--undefined=symbol.
What confuses me is that retain is a symbol attribute, yet it's supposed to be applied to a section?
"in order to not prune them when stripping"
By stripping, do you mean the strip command or the --gc-sections linker option?
__attribute__((retain)) does not work on symbols themselves; it marks the section containing the specified symbol so that the section is kept during --gc-sections.
I took a look at the LLVM lld source code, and indeed the behavior implemented in lld seems different from what I implemented in mold. So mold's --retain-symbols-file may be misbehaving. Let me take a further look and get back to you.
Actually it looks like lld's behavior is incompatible with GNU ld, so I filed it as llvm/llvm-project#91055.
That's not directly related to your request, I guess, though.
I think this might be it, and that you hit the nail on the head. I am building static executables that I use as a low-latency scripting backend for a game server and client. All in all, it is a gargantuan task, from creating the emulator to custom run-times and build systems, all the way down to keeping assembly and extern symbols. In your issue you describe ld retaining symbols by storing them in .symtab, and that is indeed where I look for the symbols that I want to retain.
So, just to reiterate: Using ld, when I mark a symbol as used, retain, it will still appear in .symtab despite -Wl,-x,-S and even with --gc-sections and --retain-symbols-file.
I don't exactly know the reasoning behind it, but perhaps it's only for static executables? Either way, it solves my problem of being able to make public functions directly in the code, that cannot be stripped.
EDIT: I will test without retain, and see what happens.
I tested it, and it is indeed __attribute__((retain)) alone that somehow keeps the symbols from getting stripped. It seems that used is for preventing the compiler from optimizing out the function.
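A minimal sketch of how the two attributes divide the work, assuming a reasonably recent GCC or Clang and a binutils version that understands SHF_GNU_RETAIN (older toolchains may warn and ignore retain). The function name is hypothetical. Because the function is static and never called, the compiler would normally be free to drop it; used asks the compiler to emit it anyway, and retain asks for its section to be flagged so that --gc-sections keeps it.

```c
/* Hypothetical entry point that is only ever looked up by name from
 * outside the program, so nothing in the C code references it. */
__attribute__((used, retain))
static long script_entry_point(void)
{
    return 42;
}
```

Neither attribute by itself says anything about whether the symbol name survives in .symtab; that is decided by the strip-related options discussed in this thread.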
I'm not sure if I understand your problem correctly. I can do the following:
- Keep sections marked with __attribute__((retain)) even with --gc-sections (that's what mold is already doing), and
- keep symbols in the sections marked with __attribute__((retain)) in .symtab even with --strip (this is new).
Is this what you want?
I think so - but I can try it out. Where would I make this change in the sources to try it? If it matches ld's behavior, it's likely it solves my problem.
There are always subtle differences among different linkers, so "just match ld's behavior" isn't feasible, just as "just create a browser that matches Chrome's behavior" isn't feasible. I'd like to know exactly how you want mold to behave in order to match GNU ld's behavior.
Minimal example:
long testPruned(void)
{
    return 42;
}
__attribute__((retain))
long testRetained(void)
{
    return 42;
}
void _start() {}
Then compile and link:
gcc-12 -static -O2 -nostdlib -ffunction-sections -fdata-sections -c test.c -o test.o
gcc-12 -static -O2 -nostdlib -Wl,-gc-sections test.o -o test.elf
Only the retained function remains:
$ nm test.elf
0000000000404000 R __bss_start
0000000000404000 R _edata
0000000000404000 R _end
0000000000401010 T _start
0000000000401000 T testRetained
I also tested this with GCC-13 and GCC-14 (with the matching binutils).
I'm not sure if it can be simplified even further, but this at least shows that the retain attribute must do something to either move a symbol into a separate retained section, or give the symbol a flag. I'm guessing the former?
I was wrong about the retained symbol file. --retain-symbols-file overrides -S and -x and will only leave the symbols listed in the file, except relocations.
It says it will leave undefined symbols, but that is not the case.
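As a toy model of the filtering described in this comment (only names listed in the file survive), the sketch below compacts an array of symbol names in place. It only illustrates the observed outcome and is not how GNU ld or mold implements the option; the function name and the flat array of strings are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model: keep only the entries of `symtab` whose name also appears
 * in `listed`, preserving order. Returns the number of names kept. */
static size_t filter_symtab(const char **symtab, size_t nsyms,
                            const char *const *listed, size_t nlisted)
{
    assert(symtab != NULL && listed != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < nsyms; i++) {
        for (size_t j = 0; j < nlisted; j++) {
            if (strcmp(symtab[i], listed[j]) == 0) {
                symtab[kept++] = symtab[i]; /* compact in place */
                break;
            }
        }
    }
    return kept;
}
```

Under this model, a symbol such as testRetained disappears from the table unless it is also named in the file, regardless of any section flag.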
We do support the RETAIN section flag. Please try your test case with the mold linker. So far, this bug doesn't seem to be actionable.
Yep, it was the retain-symbols-file that got overridden on GNU ld, but not mold. I don't really need it so as far as I'm concerned it's all good, and I can use mold now! Thanks, and sorry for the inconvenience. Somehow I thought retain would also apply to the symbols file.