SD3-implementation

Implementation of an efficient dynamic parallelism detector for my senior thesis.

Dynamic data-dependence analysis

This package efficiently finds memory dependences at runtime, based on the pairwise method and the paper SD3: A Scalable Approach to Dynamic Data-Dependence Profiling.
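
For intuition, here is a minimal sketch of the pairwise idea: every new access is compared against the recorded history of accesses to the same address. The type and function names below are hypothetical, not this repository's interfaces.

#include <cstdint>
#include <iostream>
#include <map>
#include <vector>

struct Access {
    uint64_t pc;        // instruction that performed the access
    uint64_t iteration; // loop iteration in which it occurred
    bool     isWrite;
};

// History of accesses, keyed by memory address.
std::map<uint64_t, std::vector<Access>> history;

void record(uint64_t addr, Access a) {
    // Compare the new access pairwise against every earlier access to the
    // same address; a conflict needs at least one write and different
    // iterations (a loop-carried dependence).
    for (const Access& prev : history[addr])
        if (prev.iteration != a.iteration && (prev.isWrite || a.isWrite))
            std::cout << "dependence: pc " << prev.pc << " -> pc " << a.pc << "\n";
    history[addr].push_back(a);
}

int main() {
    record(0x1000, {0x400a, 0, true});  // iteration 0 writes address 0x1000
    record(0x1000, {0x400b, 1, false}); // iteration 1 reads it: flow dependence
}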

Different versions

Different versions of the parallelism detector are implemented in separate branches of this repository:

  • hash_bit_compress: An efficient method based on storing access information in bit arrays; see the sketch after this list and my thesis for more details
  • naive_method_real: An inefficient baseline method, included for comparison against the efficient methods
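
For a rough idea of the bit-array approach, here is a minimal sketch in which one iteration's reads and another's writes are recorded as bits, so a single bitwise AND exposes conflicts. The byte-per-bit granularity, sizes, and names are assumptions for illustration; the real layout is described in my thesis.

#include <bitset>
#include <cstddef>
#include <iostream>

constexpr std::size_t kPageBits = 4096; // one bit per byte of a 4 KiB page (assumed granularity)

int main() {
    std::bitset<kPageBits> writtenBefore; // bytes written in earlier iterations
    std::bitset<kPageBits> readNow;       // bytes read in the current iteration

    writtenBefore.set(0x10); // an earlier iteration wrote offset 0x10
    readNow.set(0x10);       // the current iteration reads the same offset
    readNow.set(0x20);       // and an unrelated offset

    // One bitwise AND exposes every byte with a write-then-read conflict.
    std::bitset<kPageBits> conflicts = writtenBefore & readNow;
    std::cout << "conflicting bytes: " << conflicts.count() << "\n"; // prints 1
}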

Building the code

The code is built with CMake.

A simple CMake build puts the binaries in the bin directory. For example, using make on Linux:

cmake . -DCMAKE_BUILD_TYPE=Release
make

Running tests

Main regression/end-to-end test

There are two important files in test-scripts:

  • conflicts.py, which runs through loops, accesses data, and writes the resulting traces to files in test-data/conflict/
  • test_runner.py, which runs the files in test-data/conflict/ through the data-dependence algorithm and writes its output to files in test-data/conflict-out/

To run this test completely:

cd test-scripts
python3 conflicts.py
python3 test_runner.py

and then check

git status

to verify that the output files generated by test_runner.py match the versions already checked into git.

Unit tests

The unit_test executable is built on a simple custom unit-test framework that prints an error message for each failing test.
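
A framework of this kind typically boils down to a check macro that reports the file, line, and failed condition. The sketch below is hypothetical and only illustrates the idea; the actual framework lives in this repository's test code.

#include <iostream>

static int failures = 0;

#define CHECK(cond)                                          \
    do {                                                     \
        if (!(cond)) {                                       \
            std::cerr << __FILE__ << ":" << __LINE__         \
                      << ": check failed: " #cond "\n";      \
            ++failures;                                      \
        }                                                    \
    } while (0)

int main() {
    CHECK(1 + 1 == 2); // passes silently
    CHECK(2 + 2 == 5); // prints the file, line, and failed condition
    return failures == 0 ? 0 : 1;
}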

Other tests

Other tests, which are not well suited to unit testing, are stride_detector_test and merge_test.

These take input files containing lists of memory accesses and print out how they processed them. The files in single_pc_test_data are intended for this purpose; the files in test-data/OldTestInputs are also somewhat suitable as input.

Changes from SD3

Since I had much less time than the original SD3 developers, I changed a number of the basic data structures to make implementation easier while preserving asymptotic efficiency.

The major changes are:

  • Instead of using an interval tree, I sort lists of the items that would have been put in the tree and use a linear-time algorithm to compare the lists (a sketch follows this list). I then ensured that this routine does not run too many times, measured by the number of times it looks at a particular memory access.
  • Another change is the data structure for merging strides. The SD3 authors use an interval tree with caching, which works most of the time, while I use a hash table and a binary search tree that merge all mergeable lists efficiently (a second sketch follows this list). See src/ConflictData.h for more details.
  • I also did not implement a number of other optimizations mentioned in the paper, such as Dynamic Site ID.
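
As a rough illustration of the first change, here is a minimal sketch of comparing two sorted interval lists in a single merge-style pass. It assumes the intervals within each list are disjoint (as they would be after merging); all names here are illustrative, not this repository's.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

struct Interval { uint64_t start, end; }; // half-open: [start, end)

void reportOverlaps(std::vector<Interval> a, std::vector<Interval> b) {
    auto byStart = [](const Interval& x, const Interval& y) { return x.start < y.start; };
    std::sort(a.begin(), a.end(), byStart);
    std::sort(b.begin(), b.end(), byStart);

    // Every step advances exactly one cursor, so the comparison is linear
    // in the total number of intervals.
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].start < b[j].end && b[j].start < a[i].end)
            std::cout << "overlap: [" << a[i].start << "," << a[i].end << ") and ["
                      << b[j].start << "," << b[j].end << ")\n";
        // The interval that ends first cannot overlap anything later in
        // the other (disjoint, sorted) list, so drop it.
        if (a[i].end < b[j].end) ++i; else ++j;
    }
}

int main() {
    reportOverlaps({{0, 8}, {16, 24}}, {{4, 12}, {32, 40}}); // reports one overlap
}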
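
For the second change, here is a hedged sketch of merging address ranges inside a binary search tree (std::map keyed by range start); presumably the hash table maps each group of strides to a structure like this, but see src/ConflictData.h for the real code. Everything below is illustrative.

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <map>

// Map from range start to range end; invariant: stored ranges are disjoint.
using Ranges = std::map<uint64_t, uint64_t>;

void insertAndMerge(Ranges& r, uint64_t start, uint64_t end) {
    // Find the first stored range that could touch [start, end).
    auto it = r.upper_bound(start);
    if (it != r.begin() && std::prev(it)->second >= start)
        --it; // the preceding range reaches the new one; merge from there
    // Absorb every range that overlaps or abuts the new one.
    while (it != r.end() && it->first <= end) {
        start = std::min(start, it->first);
        end   = std::max(end, it->second);
        it = r.erase(it);
    }
    r[start] = end;
}

int main() {
    Ranges r;
    insertAndMerge(r, 0, 8);
    insertAndMerge(r, 16, 24);
    insertAndMerge(r, 8, 16); // bridges the two existing ranges
    for (const auto& [s, e] : r)
        std::cout << "[" << s << "," << e << ")\n"; // prints [0,24)
}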