aaArmNeonOptimization
Environment
Hisi 3519A-> (2 x A53)
Build And Run
- Remove all localized settings
run command
export LC_ALL=C
- Make
run command
make clean; make -j4
Speed Test
1. Box Filter Algorithm
Image Resolution | Radius | Optimization Algorithm | Loop Count | Time | 核心数 |
---|---|---|---|---|---|
4032x3024 | 3 | Origin Algorithm | 10 | 2313.40ms | 1xA53 |
4032x3024 | 3 | RowAndCol Split | 10 | 784.98ms | 1xA53 |
4032x3024 | 3 | RowAndCol Split && Reduce Repeated Computations | 10 | 2496.55ms | 1xA53 |
4032x3024 | 3 | Reduce Cache Miss | 10 | 302.00ms | 1xA53 |
4032x3024 | 3 | Neon Intrinsics | 10 | 188.37ms | 1xA53 |
4032x3024 | 3 | Neon Assembly | 10 | 187.7ms | 1xA53 |
4032x3024 | 3 | Neon Assembly+pld | 10 | 158.70ms | 1xA53 |
4032x3024 | 3 | Neon Assembly+Diff Predeal | 10 | 181.40ms | 1xA53 |
4032x3024 | 3 | Neon AssemblyV2 | 10 | 145.92ms | 1xA53 |
4032x3024 | 3 | NCNN Origin | 10 | 281.26ms | 1xA53 |
4032x3024 | 3 | NCNN Neon Intrinsics | 10 | 236.82ms | 1xA53 |
4032x3024 | 3 | NCNN Neon Assembly | 10 | 68.54ms | 1xA53 |
4032x3024 | 3 | NCNN Neon AssemblyV2 | 10 | 61.63ms | 1xA53 |