Kaiser-Yang/matrix-calculation-accelerator

Implementations of all multi-thread calculations

Closed this issue · 0 comments

This issue is for all the Implementations of multi-thread calculations.

Requirements

  • Try your best to make all the threads's calculation quantities are similar
  • The result must be same with single thread calculation's
  • Implementations should be efficient
  • Unit test should show the time of multi-thread version and single thread version
  • The calculation quantity of a sub-thread is no less than _limit (defined in src/mca.cpp), and you can use limit() to get the value
  • The users should feel the function is same with single thread version
  • You should try your best to avoid any synchronizations

A Simple Example

Let me show you a simple example.

Function fill is to fill elements of a matrix with a new value. For example, a.fill(3) can make all the elements of a be 3. Let's consider how to implement the multi-thread version.

First you should find the single thread veresion of fill, luckily, std has provided one: std::fill which takes three parameters: the first element iterator (inclusive), the last element iterator (exclusive), and the value you want to fill. So I can use std::fill(a.dataPtr(), a.dataPtr() + a.size(), 3) to set all the elements of a be 3 with single thread.

Second, you should make sure every thread participating calculation has similar calculation quantity. This is how I calculate:

if (size() <= pos) { return; }
// single mode
if (threadNum() == 0 || limit() > size() - pos) {
    std::fill(dataPtr() + pos, dataPtr() + size(), value);
    return;
}

// threadCalculation and taskNum
// res.first is threadCalculation
// res.seconde is taskNum
auto res = threadCalculationTaskNum(size());
}

As you can see, I first check thread pool's size and limit() so that sometimes calculate with single thread directly. Then I use threadCalculationTaskNum(size()) to calculate the calculation quantity of every thread, and how many tasks we should aplly.

Once get the calculation quantity rightly, the next is just to add tasks to the thread pool:

 // the return value of every task, use this to make sure every task is finished
std::vector<std::future<void>> returnValue(res.second - 1);
// assign task for every sub-thread
for (size_t i = 0; i < res.second - 1; i++) {
    returnValue[i] = threadPool().addTask(
        [this, start = i * res.first + pos, end = (i + 1) * res.first + pos, &value]() {
            std::fill(dataPtr() + start, dataPtr() + end, value);
        });
}
// let main thread calculate too
std::fill(dataPtr() + (res.second - 1) * res.first, dataPtr() + size(), value);

The thread pool is easy to use, and what you need is to know how to write lambda functions in C++, there is a official document of lambda functions in C++: lambda (don't forget we use C++17 to build our project). Because every thread is operating on the different parts of the matrix, so there is no need to synchronize.

At last, we should make sure all the threads finished before we return. The threadPool().addTask(...) will return an object of std::future (I have stored them in a std::vector), you just need call the object's get(), the main thread will wait for the threads:

// make sure all the sub threads are finished
for (auto &item : returnValue) { item.get(); }

After this, you are supposed to write a unit test, for this function, here is what I have written:

using namespace std::chrono;
class TestMatrixMultiThread : public testing::Test {
public:
    std::default_random_engine generator;
    Shape shape{9000, 9000};
    Matrix<int>::ElementType value = generator();
    Matrix<int> singleThread = Matrix<>(shape, value);
    Matrix<int> multiThread  = Matrix<>(shape, value);


protected:
    void SetUp() override { generator.seed(time(nullptr)); }

    void TearDown() override {}
};
TEST_F(TestMatrixMultiThread, fill) {
    // first use fill to get the single mode time
    auto startTime = high_resolution_clock::now();
    std::fill(singleThread.dataPtr(), singleThread.dataPtr() + singleThread.size(), value);
    auto endTime       = high_resolution_clock::now();
    auto executionTime = duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
    // record time in gtest
    testing::Test::RecordProperty("SingleTime", executionTime);

    init();

    // the expected time in multi thread
    testing::Test::RecordProperty("BaseTime", executionTime / threadNum());

    // get the multi-thread mode time
    startTime = high_resolution_clock::now();
    multiThread.fill(value);
    endTime       = high_resolution_clock::now();
    executionTime = duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
    // record multi-thread time in gtest
    testing::Test::RecordProperty("MultiTime", executionTime);

    // TODO this should be updated with Matrix::operator== with multi-thread
    // make sure they are equal
    // ASSERT_TRUE(equalSingleThread(singleThread, multiThread, 0, 0, singleThread.getShape()));
}

As you can see, I add a TODO in this, and comment the ASSERT_TRUE, because compare the large matrix with single thread is time consuming ( I just run this once to make sure the test pass), when the comparison with multi-thread finished, I should update this with the multi thread version.

I use std::chrono::high_resolution_clock::now() and testing::Test::RecordProperty() to calculate and store the execution time. You can record time as I do. And when you run ctest --output-on-failure (I have taught you how to run the unit tests), all the test's properties will be stored in a xml file in build/test, and they look like this:

image

Their name is combined with your test's two parameter by a dot, so you can easily find your test's output. In this example, I can use cat TestMatrixMultiThread.fill.xml to see my test unit's output:

image

You can see the time in single thread version and multi thread version is similar, which is weird. But don't worry, may be this matrix is too small, and the std::fill can not take full advantage of cpu cache. Maybe in the calculation will be better. I tried a larger matrix in a server, the multi thread quicker:

image

See the example in the pull request: src, test (these two are out of date, see the files of develop branch directly).

Tasks List

@Kaiser-Yang

// TODO
template <class T1, class T2>
Matrix<std::common_type_t<T1, T2>> operator*(const Matrix<T1> &a, const Matrix<T2> &b) {}

// TODO
template <class T1, class T2>
void operator*=(Matrix<T1> &a, const Matrix<T2> &b) {}

// TODO
template <class ELEMENT_TYPE>
template <class O>
void Matrix<ELEMENT_TYPE>::pow(size_t exponent, Matrix<O> &output) {}
template <class ELEMENT_TYPE>
void Matrix<ELEMENT_TYPE>::pow(size_t exponent) {}

@11231f

// TODO
template <class ELEMENT_TYPE>
template <class O>
void Matrix<ELEMENT_TYPE>::transpose(Matrix<O> &output) {}
template <class ELEMENT_TYPE>
void Matrix<ELEMENT_TYPE>::transpose() {}

// TODO
template <class ELEMENT_TYPE>
inline bool Matrix<ELEMENT_TYPE>::symmetric() const {}
template <class ELEMENT_TYPE>
inline bool Matrix<ELEMENT_TYPE>::antisymmetric() const {}

@Kaiser-Yang

// TODO
template <class T1, class T2>
bool operator==(const Matrix<T1> &a, const Matrix<T2> &b) {}

// TODO
template <class T1, class T2>
bool operator!=(const Matrix<T1> &a, const Matrix<T2> &b) {}

// TODO
template <class T1, class T2>
bool operator<(const Matrix<T1> &a, const Matrix<T2> &b) {}

@Kaiser-Yang

// TODO
template <class T1, class T2>
bool operator<=(const Matrix<T1> &a, const Matrix<T2> &b) {}

// TODO
template <class Number, class T>
Matrix<std::common_type_t<T, Number>> operator-(const Number &number, const Matrix<T> &a) {}

// TODO
template <class T, class Number>
Matrix<std::common_type_t<T, Number>> operator/(const Matrix<T> &a, const Number &number) {}

// TODO
template <class Number, class T>
void operator-=(const Number &number, Matrix<T> &a) {}

// TODO
template <class T, class Number>
void operator/=(Matrix<T> &a, const Number &number) {}

@Kaiser-Yang

// TODO
template <class T1, class T2>
bool operator>(const Matrix<T1> &a, const Matrix<T2> &b) {}

// TODO
template <class T, class Number>
Matrix<std::common_type_t<T, Number>> operator*(const Matrix<T> &a,
                                                const Number &number) {}

// TODO
template <class Number, class T>
Matrix<std::common_type_t<T, Number>> operator*(const Number &number, const Matrix<T> &a) {}

// TODO
template <class T, class Number>
void operator*=(Matrix<T> &a, const Number &number) {}

// TODO
template <class Number, class T>
void operator*=(const Number &number, Matrix<T> &a) {}

@Quentin9922

// TODO
template <class T1, class T2>
bool operator>=(const Matrix<T1> &a, const Matrix<T2> &b) {}

// TODO
template <class T, class Number>
Matrix<std::common_type_t<T, Number>> operator-(const Matrix<T> &a, const Number &number) {}

// TODO
template <class Number, class T>
Matrix<std::common_type_t<T, Number>> operator/(const Number &number, const Matrix<T> &a) {}

// TODO
template <class T, class Number>
void operator-=(Matrix<T> &a, const Number &number) {}

// TODO
template <class Number, class T>
void operator/=(const Number &number, Matrix<T> &a) {}

@luckygalaxy666

// TODO
template <class T, class Number>
Matrix<std::common_type_t<T, Number>> operator+(const Matrix<T> &a, const Number &number) {}

// TODO
template <class Number, class T>
Matrix<std::common_type_t<T, Number>> operator+(const Number &number, const Matrix<T> &a) {}

// TODO
template <class Number, class T>
void operator+=(const Number &number, Matrix<T> &a) {}

// TODO
template <class T, class Number>
void operator+=(Matrix<T> &a, const Number &number) {}

// TODO
template <class ELEMENT_TYPE>
template <class Number, class O>
void Matrix<ELEMENT_TYPE>::powNumber(Number &&number, const Matrix<O> &output) {}
template <class ELEMENT_TYPE>
template <class Number>
void Matrix<ELEMENT_TYPE>::powNumber(const Number &number) {}

@Kaiser-Yang

// TODO
template <class T1, class T2>
Matrix<std::common_type_t<T1, T2>> operator-(const Matrix<T1> &a, const Matrix<T2> &b) {}

// TODO
template <class T1, class T2>
void operator-=(Matrix<T1> &a, const Matrix<T2> &b) {}


// TODO
template <class ELEMENT_TYPE>
template <class Number, class O>
void Matrix<ELEMENT_TYPE>::numberPow(Number &&number, const Matrix<O> &output) {}
template <class ELEMENT_TYPE>
template <class Number>
void Matrix<ELEMENT_TYPE>::numberPow(const Number &number) {}

// TODO
template <class T1, class T2>
Matrix<std::common_type_t<T1, T2>> operator+(const Matrix<T1> &a, const Matrix<T2> &b) {}

// TODO
template <class T1, class T2>
void operator+=(Matrix<T1> &a, const Matrix<T2> &b) {}

@Kaiser-Yang : all the rest TODOs.

NOTE1: When you finish one, do not forget to remove the TODO
NOTE2: When you create a pull request, you should find another reviewer (NOT @Kaiser-Yang ), let him or her give you a review, then you shoule find @Kaiser-Yang to review, @virtuosoo can let me review directly

What Should I Do When I was chosen as a reviewer

First, you should figure out what the pull request is for. To achieve this, you usually just need see the title and the description, when you find it is difficult to figure out what the pull request is for, you can leave a comment to let the creator retitle and add more description.

Second, you should check if the pull request's destination branch is right (all the pull requests should merge into develop rather than any other one). If the pull request is not merging into develop, you can close it and let the creator create a new one merging into develop.

Third, you should check if the commit messages meet the rules we've required (If you don't know what the rules, see the README.md). If there is any commit message which does not meet the rules, you can let the creator modify their commit messages (After modifying a commit message by git commit --amend or git rebase -i, the creator usually need use git push -f to upload their updates).

At last, you are supposed to check the codes. If you find the codes are not resonable, you can leave your reviews at curtain lines to let the creator modify.

When all your requests are resolved by the creator, you can leave a approve, and then let @Kaiser-Yang check. If the first three steps are not well-doen, I'll blame you. But if the last step is not well-done by you, I'll not blame you.

Update

I've done the tasks, you can see how I achieve those: here.