This package attempts to find memory dependences dynamically and efficiently, based on the pairwise method and the paper *SD3: A Scalable Approach to Dynamic Data-Dependence Profiling*.
There are different versions of the parallelism detection implemented in different branches of this repository:
- `hash_bit_compress`: an efficient method based on storing access information in bit arrays. See my thesis for more details.
- `naive_method_real`: an inefficient method, for comparison against the efficient methods. A rough sketch of the naive pairwise idea follows this list.
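For orientation only, here is a minimal C++ sketch of what naive pairwise dependence checking looks like: every recorded access is compared against every other access, and a dependence is reported whenever two accesses touch the same address and at least one of them is a write. The types and names are illustrative assumptions, not code taken from this repository.

```cpp
// Hypothetical sketch of the naive pairwise approach. Every access is
// compared against every later access in the trace; names are illustrative.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

struct Access {
    uint64_t addr;      // memory address touched
    bool     is_write;  // read or write
    int      pc;        // site/instruction that performed the access
};

// Report a dependence for each pair of accesses to the same address
// where at least one of the two is a write (RAW, WAR, or WAW).
void report_dependences(const std::vector<Access>& trace) {
    for (std::size_t i = 0; i < trace.size(); ++i) {
        for (std::size_t j = i + 1; j < trace.size(); ++j) {
            const Access& a = trace[i];
            const Access& b = trace[j];
            if (a.addr == b.addr && (a.is_write || b.is_write)) {
                std::cout << "dependence between pc " << a.pc
                          << " and pc " << b.pc << '\n';
            }
        }
    }
}
```

This is quadratic in the number of accesses, which is exactly the cost that SD3-style techniques are designed to avoid.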
The code is built with CMake. A simple CMake build puts the binaries in the `bin` directory. For example, using make on Linux:

```
cmake . -DCMAKE_BUILD_TYPE=Release
make
```
In `test-scripts`, there are two important files:
- `conflicts.py`, which runs through loops, accesses data, and writes the resulting traces to files in `test-data/conflict/`.
- `test_runner.py`, which runs the files in `test-data/conflict/` through the data-dependence algorithm and puts the output in files in the `test-data/conflict-out/` folder.
So to run this test completely, do

```
cd test-scripts
python3 conflicts.py
python3 test_runner.py
```

and then check `git status` to make sure that the output files generated by `test_runner.py` are the same as the versions already in git.
The `unit_test` executable is a simple custom unit test framework that gives error messages about unit tests that do not pass.
Other tests, not really appropriate for unit testing, are `stride_detector_test` and `merge_test`. These take input files that list memory accesses and print out how they processed them. The files in `single_pc_test_data` are for this; the files in `test-data/OldTestInputs` are also somewhat appropriate as input.
Since I had much less time than the original SD3 developers, I changed a number of the basic data structures to make implementation easier, while keeping asymptotic efficiency.
The major changes are:
- Instead of using an interval tree, I sort lists of the items that would have been put in the tree, and use a linear-time algorithm to compare the lists (see the first sketch after this list). I then ensured that this routine would not run too many times, measured in terms of the number of times it looks at a particular memory access.
- Another change is the data structure for merging strides. The SD3 authors use an interval tree with caching, which works most of the time, while I use a hash table and a binary search tree that merges all mergeable lists efficiently (see the second sketch after this list, and `src/ConflictData.h` for more details).
- I also did not implement a number of other optimizations mentioned in the paper, like Dynamic Site ID.
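To make the first change concrete, here is a minimal C++ sketch of comparing two sorted lists of address ranges with a merge-style sweep instead of interval-tree queries. It assumes each list's ranges are internally disjoint (as they would be after merging); the names and types are illustrative, not the repository's actual code.

```cpp
// Hypothetical sketch: find all overlapping pairs between two lists of
// address ranges by sorting both lists and sweeping them in tandem.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// An address range covered by an access (or a merged stride).
struct Range {
    uint64_t low;
    uint64_t high;  // inclusive
};

// Reports every pair (i, j) where a[i] overlaps b[j]. After sorting,
// a two-pointer sweep visits each range a bounded number of times
// instead of issuing one tree query per range. Assumes the ranges
// within each list are disjoint.
std::vector<std::pair<std::size_t, std::size_t>>
find_overlaps(std::vector<Range> a, std::vector<Range> b) {
    auto by_low = [](const Range& x, const Range& y) { return x.low < y.low; };
    std::sort(a.begin(), a.end(), by_low);
    std::sort(b.begin(), b.end(), by_low);

    std::vector<std::pair<std::size_t, std::size_t>> overlaps;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        // Two ranges overlap if each starts no later than the other ends.
        if (a[i].low <= b[j].high && b[j].low <= a[i].high)
            overlaps.emplace_back(i, j);
        // Advance whichever range ends first; it cannot overlap anything
        // that starts later in the other list.
        if (a[i].high < b[j].high) ++i; else ++j;
    }
    return overlaps;
}
```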
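And for the second change, a minimal sketch of the general BST-merging technique: disjoint address ranges are kept in a `std::map` (a balanced binary search tree) keyed by their low address, and inserting a new range absorbs every neighbour it touches. This only illustrates the idea under those assumptions; the hash table layer and the real data structure are in `src/ConflictData.h`.

```cpp
// Hypothetical sketch of BST-based range merging, not the repository's
// actual implementation. Disjoint ranges are stored as low -> high in a
// std::map; insertion merges any stored range the new one overlaps or abuts.
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <map>

class RangeSet {
    std::map<uint64_t, uint64_t> ranges_;  // low -> high (inclusive), disjoint

public:
    void insert(uint64_t low, uint64_t high) {
        // Find the first stored range starting after `low`, then step back
        // to see whether the previous range already touches `low`.
        auto it = ranges_.upper_bound(low);
        if (it != ranges_.begin()) {
            auto prev = std::prev(it);
            if (prev->second + 1 >= low) {  // overlapping or adjacent
                low  = prev->first;
                high = std::max(high, prev->second);
                it = ranges_.erase(prev);
            }
        }
        // Absorb every following range that the grown range still touches.
        while (it != ranges_.end() && it->first <= high + 1) {
            high = std::max(high, it->second);
            it = ranges_.erase(it);
        }
        ranges_[low] = high;
    }
};
```

Each insertion does O(log n) work plus one erase per absorbed range, and since a range can only be absorbed once, the merging cost amortizes across all insertions.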