SRI-CSL/OCCAM

undefined references to function names for libc specialization

huzaifahnadeem opened this issue · 2 comments

I was trying to use OCCAM on some programs; the goal was to debloat them together with a statically linked libc. For this, I pulled the Docker container as instructed in this repo. For the library, I used musllvm from SRI. I tried this on several programs, with varying degrees of success. I describe the steps I took below, and I am attaching a tar file that demonstrates what I did: [demonstration.tar.gz](https://github.com/SRI-CSL/OCCAM/files/7427462/demonstration.tar.gz)

gzip 1.2.4:

I first tried to debloat gzip, and the process went smoothly. In the manifest, I listed "libc.a.bc" from musllvm under the "modules" key and "crt1.o" and "libc.a" under the "native_libs" key. As expected, a debloated binary was generated without issues. The attached tar file demonstration.tar.gz contains a folder gzip with the .bc file of gzip, the aforementioned files, and a build script that demonstrates that it worked. The issues started with the other programs I wanted to debloat.

bzip2 1.0.8:

This is the same version of bzip2 as the one in this repo under /occam/examples/linux/bzip2. I used that example, with the only difference being that in the manifest file, as for gzip, I added "libc.a.bc" under the "modules" key and "crt1.o" and "libc.a" under the "native_libs" key. However, at the very end, OCCAM is unable to generate the final binary, and the last line of the output of the build.sh script is
cp: cannot stat './slash_specialized/bzip2_slashed': No such file or directory
suggesting that the binary was not generated. I looked into it further and found that the very last step of the whole process is that OCCAM runs the following command:

clang++ ./slash_specialized/amalgamation.bc -o bzip2_slashed ./libc.a ./crt1.o -static -nostdlib

This command returns an error. If I run this command manually after the build.sh script gives the error, the output for this command is as follows:

./crt1.o: In function `_start_c':
crt1.c:(.text._start_c+0x1a): undefined reference to `__libc_start_main'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

The output suggests that there is some issue with symbol resolution, and rearranging the arguments of the command as follows:

clang++ -static -nostdlib ./slash_specialized/amalgamation.bc ./crt1.o ./libc.a -o bzip2_slashed

works, and a binary is generated without errors. The rearrangement probably works because the linker resolves archive members in command-line order: crt1.o, which provides the entry symbol _start and references __libc_start_main, has to come before libc.a so that the archive member defining __libc_start_main gets pulled in. This is potentially a bug in OCCAM: the order of the arguments in the final link command that it runs internally affects the result. To reproduce this, the attached demonstration.tar.gz contains a folder bzip2; running 'make' and './build.sh' there does not produce a binary, but running './reArrangedCommand.sh' afterwards does.
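The reordering can also be expressed as a small shell helper. This is only a sketch: `order_link_inputs` is a hypothetical name, not part of OCCAM, and it assumes that simply moving every `*.a` archive behind the bitcode and object files is enough, as it was for bzip2 here.

```shell
#!/bin/sh
# Hypothetical helper (not part of OCCAM): print the link inputs one per line,
# keeping bitcode/object files in their original order and moving static
# archives (*.a) to the end, so that the references made by crt1.o are
# already known when the linker scans libc.a.
order_link_inputs() {
    # First pass: everything that is not an archive.
    for f in "$@"; do
        case "$f" in
            *.a) ;;
            *) printf '%s\n' "$f" ;;
        esac
    done
    # Second pass: the archives, in their original relative order.
    for f in "$@"; do
        case "$f" in
            *.a) printf '%s\n' "$f" ;;
        esac
    done
}

# Inputs in the order of the failing command; prints amalgamation.bc, crt1.o, libc.a.
order_link_inputs ./slash_specialized/amalgamation.bc ./libc.a ./crt1.o

# Possible use (assumes paths without spaces):
#   clang++ -static -nostdlib $(order_link_inputs ...) -o bzip2_slashed
```

The printed order matches the rearranged command that produced a binary above.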

Anyhow, this fixed the issue with bzip2. With other programs, however, I was not successful even with this trick. I next tried to debloat mkdir from coreutils 8.32, but I have not been able to make that work. The details are as follows.

mkdir:

The main issue comes here. I tried to debloat mkdir with a build.sh script containing a manifest set up the same way as for gzip above, i.e. with "libc.a.bc" from musllvm under the "modules" key and "crt1.o" and "libc.a" under the "native_libs" key. The script outputs

cp: cannot stat 'slash/mkdir_spec': No such file or directory

suggesting that the binary was not generated. I then tried to run the final compilation command that occam runs which in this case is:

clang++ ./slash/mkdir.a.i.p.i.h.x.bc ./slash/libc.a.i.p.i.h.x.bc -o mkdir_spec ./crt1.o ./libc.a -static -nostdlib

The linker reports the following errors:

/tmp/mkdir-bafca3.o: In function `__unnamed_1':
llvm-link:(.text+0x3a): undefined reference to `program_name'
llvm-link:(.text+0x4e): undefined reference to `__fprintf_chk'
llvm-link:(.text+0x73): undefined reference to `program_name'
llvm-link:(.text+0x82): undefined reference to `__printf_chk'
llvm-link:(.text+0x182): undefined reference to `program_name'
llvm-link:(.text+0x18c): undefined reference to `last_component' 
... 

I also tried the trick of rearranging the arguments, as I did with bzip2 earlier, i.e. running the command

clang++ -static -nostdlib ./slash/mkdir.a.i.p.i.h.x.bc ./slash/libc.a.i.p.i.h.x.bc ./crt1.o ./libc.a -o mkdir_spec

The error is exactly the same as with not rearranging the command.

I looked into the names in these undefined references. At least some of them, e.g. `__fprintf_chk', come from the standard GNU libc (glibc), but I cannot understand why they show up, because I am explicitly linking a static libc (musllvm), so this shouldn't be an issue.
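One way to narrow this down might be to compare the symbols the specialized bitcode still needs against the symbols the static libc.a actually defines. The sketch below assumes `llvm-nm` (or `nm`) is available in the container; `symbol_names` and `missing_symbols` are hypothetical helper names, and the file names in the comments are taken from the commands above.

```shell
#!/bin/sh
# Reduce nm-style output on stdin to bare symbol names. Lines with fewer than
# two fields (blank lines, "member.o:" archive headers) are skipped.
symbol_names() {
    awk 'NF >= 2 { print $NF }' | sort -u
}

# Print every name listed in file $1 (needed symbols) that does not appear
# in file $2 (defined symbols).
missing_symbols() {
    awk -v defs="$2" 'FILENAME == defs { seen[$0] = 1; next } !($0 in seen)' "$2" "$1"
}

# Assumed workflow, not run here:
#   llvm-nm --undefined-only ./slash/mkdir.a.i.p.i.h.x.bc | symbol_names > undefined.txt
#   llvm-nm --defined-only ./libc.a | symbol_names > defined.txt
#   missing_symbols undefined.txt defined.txt
```

Any name this prints is needed by the program but not provided by libc.a, so it would have to come from another library or object file on the link line.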

I am hoping someone can help me with this, because I hit exactly the same issue as with mkdir on the other programs I tried to debloat: rm, chmod, date, uniq, grep, tar, etc.

Hi @huzaifahnadeem ,
Sorry for the long delay.
I'm not an expert in the Clang linker. I looked at other examples and this manifest seems to work (i.e., linker doesn't complain) for me:

{ "main" : "mkdir.bc"
, "binary" : "mkdir_spec"
, "modules" : ["libc.a.bc"]
, "native_libs" : ["libc.a"]
, "ldflags" : ["-lz","-lselinux"]
, "static_args" : [ ]
, "dynamic_args" : "2"
, "name" : "mkdir"
}

I didn't need the --amalgamate option.

Hello. I'm sorry for taking so long to get back to this. I tried the manifest file that you provided, and it did not work for me either. OCCAM does print statistics showing how many functions were removed; however, it is unable to generate a final binary, same as I described in the original post. Moreover, I think your manifest file is missing two required flags, -nostdlib and -static, which I think one needs to add when linking a static library as I am trying to do. Further, to provide more information about my environment: I'm using a Docker container set up by following the instructions on this repo's main page. Inside the container, I'm following the same Makefile and build script pattern as in the samples I provided above, which in turn follow the pattern of the examples provided in this repo.
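For reference, a manifest with the two flags added might look roughly like the sketch below, written out from a shell script. Two things here are assumptions rather than anything confirmed in this thread: that "ldflags" is the right key for -static and -nostdlib, and that "crt1.o" should be listed before "libc.a" under "native_libs". Whether "-lz" and "-lselinux" are still needed alongside them is also left open.

```shell
#!/bin/sh
# Sketch only: write a candidate manifest for mkdir. Placing -static and
# -nostdlib under "ldflags", and listing crt1.o before libc.a, are assumptions.
cat > mkdir.manifest <<'EOF'
{ "main" : "mkdir.bc"
, "binary" : "mkdir_spec"
, "modules" : ["libc.a.bc"]
, "native_libs" : ["crt1.o", "libc.a"]
, "ldflags" : ["-static", "-nostdlib"]
, "static_args" : [ ]
, "dynamic_args" : "2"
, "name" : "mkdir"
}
EOF
# The manifest would then be passed to slash, as in the repo's example build scripts.
```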

Moreover, about the examples: the repo provides two examples for library specialization, https://github.com/SRI-CSL/OCCAM/tree/master/examples/linux/musl_nweb and https://github.com/SRI-CSL/OCCAM/tree/master/examples/linux/musl_time (interestingly, these do not seem to work either). If you could try running them yourself, that might help with this issue. Thank you!