SRI-CSL/gllvm

Unsound bitcode collection when a single file is compiled multiple times

woodruffw opened this issue · 4 comments

GLLVM doesn't currently distinguish between multiple compilations of the same input file in a single build. For example, imagine the following:

all: foo.exe foo.patched.exe

%.exe: $(SRC_DIR)/%.c
	mkdir -p $(dir $@)
	$(CC) $(CFLAGS) -o $@ $^

%.patched.exe: $(SRC_DIR)/%.c
	mkdir -p $(dir $@)
	$(CC) $(CFLAGS) -DPATCHED=1 -o $@ $^

When make all is run, foo.c is compiled twice: once with -DPATCHED=1 and once without.

However, GLLVM produces only one .foo.c.{o,bc} tuple, meaning that the bitcode collected by get-bc for both foo.exe and foo.patched.exe is the same (whichever target make ran last).

I think the solution here is to rewrite GLLVM's object and bitcode file emission to use content-addressed filenames, rather than path-computed filenames.
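A minimal sketch of the content-addressed idea follows. It is written in Python for brevity, although GLLVM itself is written in Go. The helper name `content_addressed_name` and the use of a truncated SHA-256 digest of the artifact bytes are assumptions for illustration, not GLLVM's API. Because the name comes from the bytes rather than the source path, the unpatched and patched bitcode end up in different files.

```python
import hashlib
from pathlib import PurePosixPath


def content_addressed_name(src: str, data: bytes, ext: str) -> str:
    """Hypothetical helper: derive an artifact filename from its contents.

    The source basename is kept only as a human-readable prefix; the
    (truncated) SHA-256 digest of the artifact bytes makes the name unique.
    """
    digest = hashlib.sha256(data).hexdigest()[:16]
    base = PurePosixPath(src).name
    return f".{base}.{digest}.{ext}"


# Stand-ins for the bitcode produced by the two compilations of foo.c.
plain_bc = b"bitcode for foo.c"
patched_bc = b"bitcode for foo.c with -DPATCHED=1"

plain_name = content_addressed_name("src/foo.c", plain_bc, "bc")
patched_name = content_addressed_name("src/foo.c", patched_bc, "bc")
print(plain_name)
print(patched_name)
```

Identical contents always map to the same name, so rebuilding an unchanged file is still idempotent. In GLLVM the object file records the location of its bitcode in a dedicated section, so the hashed bitcode name would also have to be what gets embedded there; that detail is outside this sketch.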


To be more precise: GLLVM actually produces two tuples, but clobbers the first (the foo.exe one) with the second (foo.patched.exe).
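The clobbering can be illustrated with a toy model in Python. Here a dict stands in for the build directory, and the hypothetical `path_computed_names` derives artifact names from the source path alone, mimicking the `.foo.c.{o,bc}` names reported in this issue. It is a simplification for illustration, not GLLVM's actual naming code.

```python
from pathlib import PurePosixPath


def path_computed_names(src: str) -> tuple[str, str]:
    """Hypothetical: artifact names depend only on the source file's name."""
    base = PurePosixPath(src).name
    return f".{base}.o", f".{base}.bc"


build_dir: dict[str, bytes] = {}  # filename -> file contents


def compile_source(src: str, cflags: list[str]) -> None:
    """Simulate one compiler invocation writing its object and bitcode."""
    obj, bc = path_computed_names(src)
    tag = " ".join(cflags)
    build_dir[obj] = f"object for {src} [{tag}]".encode()
    build_dir[bc] = f"bitcode for {src} [{tag}]".encode()


compile_source("src/foo.c", [])               # the foo.exe build
compile_source("src/foo.c", ["-DPATCHED=1"])  # the foo.patched.exe build

print(sorted(build_dir))
print(build_dir[".foo.c.bc"])
```

Both invocations write to the same two keys, so in this model only the second (patched) build's artifacts survive, and anything that later looks up `.foo.c.bc` for either executable sees the patched bitcode.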

I wonder if we can please all of the build systems all of the time. If the output were called foo_patched rather than foo.patched, we'd be OK, right?


I'm not 100% sure. I think the collision comes from the source files: GLLVM special-cases the "single" compilation mode and will clobber .foo.c.o and .foo.c.bc regardless of the output target.
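Why renaming the output would not help can be checked with a small hypothetical model of that single compile-and-link mode. The function `single_mode_artifacts` is an illustrative assumption, not GLLVM code: it accepts the output target to mirror the compiler invocation, but, as described in the reply, derives the artifact names from the source file only.

```python
from pathlib import PurePosixPath


def single_mode_artifacts(src: str, output: str) -> tuple[str, str]:
    """Hypothetical model: `output` is accepted but does not affect the names."""
    base = PurePosixPath(src).name
    return f".{base}.o", f".{base}.bc"


# Every candidate output name maps to the same pair of artifacts.
for out in ("foo.exe", "foo.patched.exe", "foo_patched"):
    print(out, "->", single_mode_artifacts("src/foo.c", out))
```

Under this model, choosing `foo_patched` over `foo.patched` changes nothing, because the output name never reaches the naming step.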

Yeah, you are right.