emscripten-core/emscripten

Linker reports symbol multiply defined for non-header includes

sbalko opened this issue · 6 comments

Hi,

I tried to compile a large C application comprising multiple object files, which I ultimately want to link into a single JavaScript file. However, in the final JavaScript generation/LLVM bitcode linking stage, I get the infamous "symbol multiply defined" error for a variable (here: mysymbol) that is defined in one source file (here: definition.c) and also pulled into another source file (here: usage.c) via #include. Using the --remove-duplicates switch does not help and is apparently deprecated anyway. The two source files are:

#include <stdio.h>

const int mysymbol[3] = {1, 2, 3};

void myfunc() {
    printf("%i\n", mysymbol[0]);
}

and

#include <stdio.h>

#include "definition.c"

int main(int argc, char **argv) {
    printf("%i\n", mysymbol[0]);
}

I then compile the two source files without error:

emcc definition.c -o definition.o.bc
emcc usage.c -o usage.o.bc

However, generating JavaScript from both of them together fails with the "symbol multiply defined" error:

emcc definition.o.bc usage.o.bc -o test.js

On a side note: I agree that including non-header files is bad practice. In fact, merely declaring mysymbol in an additional header file definition.h, which is included in both other files, solves the linker's colliding-symbol problem for good. So I guess the root cause really is the non-header include (which is valid C nonetheless).
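A minimal sketch of that header-based variant, shown as one listing with comments marking where each file would begin. The file name definition.h comes from the description above; the helper first_symbol is hypothetical and only stands in for the code that would live in usage.c.

```c
#include <stdio.h>

/* --- definition.h: declarations only, no storage is allocated here --- */
extern const int mysymbol[3];
void myfunc(void);

/* --- definition.c: the single definition of each symbol --- */
const int mysymbol[3] = {1, 2, 3};

void myfunc(void) {
    printf("%i\n", mysymbol[0]);
}

/* --- usage.c: would #include "definition.h" instead of "definition.c" --- */
/* first_symbol is a hypothetical stand-in for the code in usage.c. */
int first_symbol(void) {
    return mysymbol[0];
}
```

With this split, only definition.c contributes a definition of mysymbol and myfunc, so linking both objects together should no longer produce a clash.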

Any ideas?

Thanks,
Soeren

Since you're including definition.c into usage.c, why do you also need to link it in? You should be able to just do:

emcc usage.o.bc -o test.js

no?

Yes, agreed. But mind that this example is a stripped-down version of a large existing C code base (ffmpeg, for that matter) which uses this pattern (in a slightly more complicated manner with several indirections). I'm happy to explain the big picture of how files from different dynamic libraries in ffmpeg depend on one another. My intention was to first build those dynamic libs into bitcode files and then link them into a single JavaScript file (i.e., not make use of Emscripten's dynamic loading at runtime), as this would require the least surgery to ffmpeg's native build system. Alternatively, I could try simply linking all the object files from all the dynamic libraries. But as I said, this is a lot more effort and hard to repeat if ffmpeg adds new files over time.

Since the --remove-duplicates switch is defunct, I naturally assumed the linker was able to deal with doubly-defined symbols (which mysymbol is, of course), no?

Cheers,
Soeren

This doesn't work in gcc either, I get

$ g++ def.c main.c 
/tmp/ccOp5Ybz.o: In function `myfunc()':
main.c:(.text+0x0): multiple definition of `myfunc()'
/tmp/ccv0INf8.o:def.c:(.text+0x0): first defined here
collect2: ld returned 1 exit status

Yes, confirmed. Actually, what I am really trying to accomplish is to link multiple dynamic libraries (i.e., a number of *.so files consisting of LLVM bitcode) into a single large JavaScript file. These dynamic libraries apparently each define the same symbol, which causes the linker to complain about doubly defined symbols.

The original ffmpeg project does not try to link the object files out of which the dynamic libraries are individually composed. However, from what I understood by reading the emscripten documentation, emscripten tries to be smart about multiply defined symbols and collapses/ignores them somehow. Apparently, I misunderstood the concept of what is going on and cannot expect emscripten to be able to handle this sort of situation. Or can I?

Emscripten does want you to build it all into one file. If the project really assumes separate compilation, this might be hard: you need to find the shared symbols and manually de-duplicate them, that is, get it to build natively as a single file first. The only alternative to that is to get proper dynamic linking in emscripten, which we do not support yet; it's a hard problem.

juj commented

Let me close this, since I think the original test case is incorrect code. Replacing const int mysymbol[3] = {1, 2, 3}; with static const int mysymbol[3] = {1, 2, 3}; should make it compile.
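A minimal sketch of that suggestion, under the assumption that definition.c stays textually included into usage.c. Marking myfunc as static as well is an addition not stated above; it is included here because the g++ output earlier in the thread reports myfunc as the clashing symbol. The helper usage_first is hypothetical.

```c
#include <stdio.h>

/* definition.c with internal linkage: every translation unit that
   includes this file gets its own private copy, and nothing is
   exported to the linker under the name mysymbol. */
static const int mysymbol[3] = {1, 2, 3};

/* Assumption: myfunc gets the same treatment, since the #include
   also copies its definition into usage.c. */
static void myfunc(void) {
    printf("%i\n", mysymbol[0]);
}

/* usage.c side: hypothetical helper that uses the local copies. */
int usage_first(void) {
    myfunc();
    return mysymbol[0];
}
```

The trade-off is that each object file carries its own copy of the array, which is usually acceptable for small constant tables.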

Alternatively, linking via .a instead of via .bc might resolve the issue, though I'm not sure. Perhaps that was the discrepancy all along between native and Emscripten (native used .a linking, but Emscripten used .bc).