- Dependencies
- Building it
- What is it?
- GCC Compatibility
- Padding
- Environment Setup
- Testing
- DWARF Support
- vxWorks Support
libdwarf-dev
libelf-dev
libsqlite3-dev
C++14
Catch2
g++>=5.4.0
gcovr
libarchive-zip-perl (needed for unit test verification of crc32)
- Clone the repo
git clone https://github.com/WindhoverLabs/juicer.git
cd juicer
- Update the git submodules:
git submodule update --init
- Right now the working branch is develop, so switch to it:
git checkout develop
- Our build system has a few build recipes. If all you want is to get juicer up and running, run:
make
This will build the executable for you, which you'll find at build/juicer.
If you would like to run unit tests, you can do that too:
make run-tests
NOTE: Make sure you have all the dependencies mentioned above. If you are missing any of those dependencies, juicer will not build.
juicer extracts structs, arrays, enumerations and intrinsic types from executable ELF files and stores them in a SQLite database. Support for everything else is planned for the future, but it is not a priority at the moment.
Imagine we wrote some elf_file.cpp that looks like this.
#include "stdint.h"
typedef struct
{
int32_t width = 101;
uint16_t stuff;
uint16_t padding1;
int32_t length;
uint16_t more_stuff;
uint16_t padding2;
float floating_stuff;
float matrix3D[2][4][4];
float matrix1D[2];
}Square;
Square sq = {};
juicer uses DWARF debug information to extract all the information. Because of this, you must pass the -g flag to gcc when compiling your source code:
g++ -std=c++14 elf_file.cpp -g -c -o elf_file
Assuming you've built juicer successfully, you can give this binary file to juicer:
./build/juicer --input path_to_file --mode SQLITE --output build/new_db.sqlite -v4
This tells juicer to squeeze and extract as much as possible out of the binary at path_to_file and write all of that data to the SQLite database at build/new_db.sqlite. The -v4 option sets verbosity level 4, which is the highest level and outputs every message from the log.
After juicer is done, you will find a database populated with data about our binary file at build/new_db.sqlite. The database should have the following schemas:
id* | name | checksum | date | little_endian |
---|---|---|---|---|
INTEGER | TEXT | INTEGER | DATETIME | BOOLEAN |
symbol* | value* | name |
---|---|---|
INTEGER | INTEGER | TEXT |
field* | bit_size | bit_offset |
---|---|---|
INTEGER | INTEGER | INTEGER |
id* | name | symbol+ | byte_offset | type+ | little_endian | bit_size | bit_offset |
---|---|---|---|---|---|---|---|
INTEGER | TEXT | INTEGER | INTEGER | INTEGER | BOOLEAN | INTEGER | INTEGER |
id* | field_id+ | dim_order | upper_bound |
---|---|---|---|
INTEGER | INTEGER | TEXT | INTEGER |
id* | elf+ | name | byte_size |
---|---|---|---|
INTEGER | INTEGER | TEXT | INTEGER |
In our specific example, the symbols and fields tables are the ones we are interested in.
NOTE: Notice the bit_size and bit_offset columns in the fields table; these values are used for struct members that are bit-packed. When the fields are not bit-packed, bit_size and bit_offset are set to 0.
As you can see in the image above, our Square struct that we defined in our source file is in row 15!
You might ask where its members are; that's what the fields table is for.
As you can see, we have a few fields that match our Square struct's id, which is 15. Those fields belong to our struct Square. Also note the type column; this tells us the particular type of a field.
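To make the byte_offset column of the fields table more concrete, here is a minimal sketch that lists the byte offset the compiler assigns to each member of Square. The MemberOffset type and the squareMemberOffsets and byteOffsetOf helpers are hypothetical names made up for this illustration; they are not part of juicer, and the exact offsets depend on your compiler and target ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Same members as the Square struct from elf_file.cpp above.
struct Square
{
    int32_t  width = 101;
    uint16_t stuff;
    uint16_t padding1;
    int32_t  length;
    uint16_t more_stuff;
    uint16_t padding2;
    float    floating_stuff;
    float    matrix3D[2][4][4];
    float    matrix1D[2];
};

// Hypothetical record type for this sketch; it is not part of juicer.
struct MemberOffset
{
    std::string name;
    std::size_t byteOffset;
};

// Lists each member of Square with the byte offset chosen by the compiler.
inline std::vector<MemberOffset> squareMemberOffsets()
{
    return {
        {"width",          offsetof(Square, width)},
        {"stuff",          offsetof(Square, stuff)},
        {"padding1",       offsetof(Square, padding1)},
        {"length",         offsetof(Square, length)},
        {"more_stuff",     offsetof(Square, more_stuff)},
        {"padding2",       offsetof(Square, padding2)},
        {"floating_stuff", offsetof(Square, floating_stuff)},
        {"matrix3D",       offsetof(Square, matrix3D)},
        {"matrix1D",       offsetof(Square, matrix1D)},
    };
}

// Looks up one member by name; asserts if the name is unknown.
inline std::size_t byteOffsetOf(const std::string& name)
{
    for (const MemberOffset& member : squareMemberOffsets())
    {
        if (member.name == name)
        {
            return member.byteOffset;
        }
    }
    assert(false && "unknown member name");
    return 0;
}
```

On a typical ABI where int32_t and float are 4 bytes wide and 4-byte aligned, byteOffsetOf("matrix3D") evaluates to 20. These are the kind of values you would expect to see in the byte_offset column for the corresponding fields.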
What about our matrix arrays such as matrix3D and matrix1D? That's what the dimension_lists table is for.
Notice how the three records in dimension_lists have a field_id of 8. If we look at our fields table, we notice that matrix3D has an id of 8 as well. The dimension_lists table tells us that the field with id 8 is a 3-dimensional array: the first dimension has an upper bound of 1 (inclusive; size 2); the second one (which has a dim_order of 1) has an upper bound of 3; the third one has an upper bound of 3. These are the dimensions of matrix3D. This design is modeled after the DWARF4 and XTCE standards. Hopefully this schema is clear enough.
This is how juicer stores data in the database.
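The mapping from a declared array to dimension_lists rows can be sketched as follows. This is only an illustration of the inclusive upper-bound convention described above; DimensionRecord, toDimensionRecords and elementCount are hypothetical names and do not reflect how juicer is implemented internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record mirroring the dim_order and upper_bound columns.
struct DimensionRecord
{
    std::size_t dimOrder;    // 0 for the first (outermost) dimension
    std::size_t upperBound;  // inclusive upper bound, i.e. extent - 1
};

// Converts declared extents, e.g. {2, 4, 4} for float matrix3D[2][4][4],
// into one record per dimension.
inline std::vector<DimensionRecord> toDimensionRecords(const std::vector<std::size_t>& extents)
{
    std::vector<DimensionRecord> records;
    for (std::size_t i = 0; i < extents.size(); ++i)
    {
        assert(extents[i] > 0);  // a zero-sized dimension has no valid upper bound
        records.push_back({i, extents[i] - 1});
    }
    return records;
}

// Recovers the total number of elements from the inclusive upper bounds.
inline std::size_t elementCount(const std::vector<DimensionRecord>& records)
{
    std::size_t count = 1;
    for (const DimensionRecord& record : records)
    {
        count *= record.upperBound + 1;
    }
    return count;
}
```

For matrix3D, toDimensionRecords({2, 4, 4}) yields upper bounds 1, 3 and 3 with dim_order 0, 1 and 2, which matches the three rows discussed above, and elementCount gives back 32 elements.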
NOTE: It is absolutely fine to run juicer multiple times on different binary files against the same database. In fact, juicer has been designed with this in mind so that users can run it multiple times against any code base, no matter how large.
Since juicer reads ELF files, the compiler you use and the specific Linux version can affect the behavior of the libelf libraries. Because of this, we have tested juicer on the platforms listed in the table below.
Ubuntu Version | GCC Version(s) |
---|---|
Ubuntu 16.04.7 LTS | gcc 5.4.0, gcc 6.5.0 |
Ubuntu 18.04.5 LTS | gcc 7.5.0, gcc 8.4.0 |
Ubuntu 20.04.1 LTS | gcc 7.5.0, gcc 8.4.0, gcc 9.3.0 |
Compilers, and sometimes programmers, insert padding into C structures. juicer captures this padding in the database as well. Padding fields are named in the "_spare[N]" fashion, where N distinguishes different fields. For example, a struct that has three fields of padding will have _spare0, _spare1 and _spare2. Padding that is inserted at the end of the struct has a field with the name _padding_end. Hopefully this naming scheme makes sense.
When juicer finds padding, a new type is created for the number of bytes of padding that are found. For instance, if there are 3 bytes of padding, then a type _padding24 will be created. The 24 is the size of the padding in bits. Please note that if more padding is found elsewhere and the number of bytes is also 3, then the existing _padding24 type will be reused for that field to avoid over-populating the database with unnecessary data.
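The naming scheme above can be sketched in a few lines. The helpers paddingTypeName, spareFieldName and paddingBetween, as well as the PaddedSample struct, are hypothetical and only mimic the names described in this section; they are not juicer's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Builds the padding type name from a size in bytes, e.g. 3 bytes -> "_padding24".
inline std::string paddingTypeName(std::size_t paddingBytes)
{
    assert(paddingBytes > 0);  // a padding type only makes sense for a non-empty gap
    return "_padding" + std::to_string(paddingBytes * 8);
}

// Builds the name of the N-th padding field inside a struct, e.g. "_spare0".
inline std::string spareFieldName(std::size_t index)
{
    return "_spare" + std::to_string(index);
}

// Example struct (an assumption for this sketch): most ABIs insert 3 bytes
// of padding after 'flag' so that 'value' is 4-byte aligned.
struct PaddedSample
{
    char    flag;
    int32_t value;
};

// Number of unused bytes between the end of one member and the start of the next.
inline std::size_t paddingBetween(std::size_t previousOffset,
                                  std::size_t previousSize,
                                  std::size_t nextOffset)
{
    return nextOffset - (previousOffset + previousSize);
}
```

For PaddedSample on a typical ABI, paddingBetween(offsetof(PaddedSample, flag), sizeof(char), offsetof(PaddedSample, value)) is 3, so the gap would be described by a field named _spare0 of type _padding24.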
Most of this project is written in C++14, with some parts in C because of libraries like sqlite3 and libdwarf.
- Eclipse CDT 9.7.0.201903092251
- C++14
- Ubuntu 16.04.6 LTS
We currently use Eclipse to maintain this project, and we are trying not to make that a painful dependency that developers have to deal with. This is why the Eclipse project is under juicer/juicer. We know it's not the best solution, but we'll try to make this a little more modular in the future so that it is usable in other IDEs and environments.
For now, if you want to get started on Juicer, your best bet is using Eclipse. Hopefully we'll find a more modular non-Eclipse way of doing this.
By default this version of Eclipse uses gdb 7.11.1, but this version does not have support for inspecting smart pointers. You need to set up lldb, LLVM's debugger.
- sudo apt-get install lldb
- Now in Eclipse go to Help->Install New Software... and install LLDB Debugger
- Then right-click on your project and go to Juicer->Debug As->Debug Configurations
- Click on the select other... option at the bottom of the window
- Set LLDB as your default debugger
And you are all set!
It is often useful to be able to step through code inside of libdwarf, which juicer heavily depends on. This can be done by installing libdwarf1-dbgsym with the following commands:
sudo apt install ubuntu-dbgsym-keyring
echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-proposed main restricted universe multiverse" | \
sudo tee -a /etc/apt/sources.list.d/ddebs.list
sudo apt-get install libdwarf1-dbgsym
We currently use the Catch2 framework for our unit testing. It is integrated into the repo as a submodule. You can find all of our tests under the Testing folder. Just like Juicer, it is an Eclipse project. So, if you already use Eclipse, then just import the project and you can test away!
Catch2 makes it easy to write readable tests that are simple to maintain. Below is an example from Testing/TestElfFile.cpp that tests the correctness of an ElfFile's name:
#include "catch.hpp"
#include "ElfFile.h"
TEST_CASE("Test name correctness", "[ElfFile]")
{
ElfFile elffy{};
std::string elffyName{"elffy.o"};
elffy.setName(elffyName);
REQUIRE(elffy.getName() == elffyName);
}
This test, called "Test name correctness", constructs a very simple ElfFile object and checks that when we set a name with setName we get the exact same name back when we call getName. The REQUIRE macro is the actual assertion that verifies this. If the assertion fails (meaning the expression passed to REQUIRE is false), the current test case stops executing at that point. Catch2 has more interesting features, like the CHECK macro and syntax like WHEN for behavior-driven development, that you can read all about in the Catch2 documentation.
Note that we don't define a main function here, so we define one very easily in Testing/main.cpp:
/*This tells Catch to provide a main() - only do this in one cpp file*/
#define CATCH_CONFIG_MAIN
/**This disables coloring output so that Eclipse CDT(Version: 9.7.0.201903092251)
*will be able to render it. If you really like colored output, you'll have to use
*something else other than Eclipse's console(such as GNOME shell) to run the tests
*and comment out the CATCH_CONFIG_COLOUR_NONE macro.
*/
#define CATCH_CONFIG_COLOUR_NONE
#include "catch.hpp"
Yes, that's it! Catch2 will read the CATCH_CONFIG_MAIN macro and generate a main function for you. The CATCH_CONFIG_COLOUR_NONE macro is not necessary to run Catch2, but if you run into problems where the output does not render properly because it is colored (like in Eclipse), then you might find this macro useful.
Now all you have to do is build your project on Eclipse(or from the terminal) and then run all of your tests.
You can run your tests like this:
make run-tests
To run a specific test:
cd build
./juicer-ut "[ElfFile]"
Notice the tag "[ElfFile]" which was defined for the test case above.
make coverage
This will run all unit tests on juicer and generate a test coverage report for you. After make is done, the test coverage report can be found at build/coverage/index.html.
At the moment juicer follows the DWARF4 specification, which is the default for the gcc versions listed in the compatibility table above. If this changes, then this document will be updated accordingly.
As juicer evolves, DWARF support will grow and evolve as well. At the moment, we don't adhere strictly to a particular DWARF version, as we add support for the things that we need for our code base, which is airliner. In other words, we mostly support C code, or C++ code without any cutting-edge/modern features. For example, features such as templates or namespaces are not supported. If juicer finds these things in your ELF files, it will simply ignore them. To get a more concrete idea of what we do support in DWARF, take a look at the table below, which records all the DWARF tags we support.
Name | Description |
---|---|
DW_TAG_base_type | This is the tag that represents intrinsic types such as int and char . |
DW_TAG_typedef | This is the tag that represents anything that is typedef'd in code such as typedef struct{...}. At the moment, types such as typedef int16 my_int do not work. We will investigate this issue in the future; however, it is not a priority at the moment. |
DW_TAG_structure_type | This is the tag that represents structs such as struct Square{ int width; int length; }; |
DW_TAG_array_type | This is the tag that represents statically allocated arrays such as int flat_array[] = {1,2,3,4,5,6};. Note that this does not include dynamic arrays such as those allocated by malloc or new calls. |
DW_TAG_pointer_type | This is the tag that represents pointers in code such as int* ptr = nullptr |
DW_TAG_enumeration_type | This is the tag that represents enumerations such as enum Color{RED,BLUE,YELLOW}; |
DW_TAG_const_type | This is the tag that represents C/C++ const qualified type such as sizetype , which is used by containers(like std::vector) in the STL C++ library. |
For more details on the DWARF debugging format, see the DWARF standard.
DWARF version 4 has this to say about void pointers:
The interpretation of this debugging information entry is intentionally left flexible to allow it to be interpreted appropriately in different languages. For example, in C and C++ the language implementation can provide an unspecified type entry with the name “void” which can be referenced by the type attribute of pointer types and typedef declarations for 'void' (see Sections 0 and 5.3, respectively) -- Section 5.2 of DWARF4
juicer behaves accordingly. If a pointer does not have a type (meaning it does not have a DW_AT_type attribute), then it is assumed that the pointer in question is of the void* type.
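The rule above can be illustrated with a small sketch. PointerEntry and pointerTypeName are hypothetical, simplified stand-ins invented for this example; the real code works on libdwarf debugging information entries, and this sketch does not call libdwarf at all.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for a DW_TAG_pointer_type entry.
struct PointerEntry
{
    bool        hasTypeAttribute;  // true when the entry carries DW_AT_type
    std::string pointeeTypeName;   // only meaningful when hasTypeAttribute is true
};

// Returns the name of the pointer type, falling back to void* when the
// entry has no DW_AT_type attribute.
inline std::string pointerTypeName(const PointerEntry& entry)
{
    if (!entry.hasTypeAttribute)
    {
        return "void*";
    }
    assert(!entry.pointeeTypeName.empty());  // a typed pointer must name its pointee
    return entry.pointeeTypeName + "*";
}
```

With this sketch, an entry without DW_AT_type resolves to void*, while an entry whose pointee is int resolves to int*.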
At the moment vxWorks support is a work in progress. Support is currently not tested, so it lives on its own [branch](https://github.com/WindhoverLabs/juicer/tree/vxWorks).
catchsegv ./juicer-ut "[main_test#3]"
addr2line -e ./juicer-ut 0x19646c
Documentation updated on September 29, 2021