eliaskosunen/scnlib

Quadratic Complexity Issue with scnlib's scan Function for Long Texts

ldeng-ustc opened this issue · 5 comments

I recently encountered a potential performance issue while using the scnlib library. When processing long strings with the scn::scan function, I noticed that the processing time seems to be proportional to the square of the length of the input range.

I wrote a test program that generates some random numbers and parses them with the scn::scan function. I then recorded the time required to process strings of different lengths and plotted the results. The plot shows that as the length of the string increases, the processing time exhibits a quadratic growth trend.

This issue is particularly evident when processing long texts: even if only one number is scanned, efficiency is low. I believe this might be an area for improvement.

Some other issues, such as #45 and #68, have also mentioned similar performance problems, but they did not explicitly attribute them to Ω(n^2) time complexity.

Test Result:

[Plot: scan time vs. input length, showing quadratic growth]

My test code:

#include <random>
#include <limits>
#include <chrono>
#include <cstring>  // memset
#include <cstdio>   // fflush, stdout
#include "fmt/format.h"
#include "scn/scn.h"

using namespace std;

class RAII_Timer {
public:
    RAII_Timer(const char* name): name(name), start(std::chrono::high_resolution_clock::now()) {}

    ~RAII_Timer() {
        auto end = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double, std::milli> elapsed = end - start;
        fmt::print("{}: {}ms\n", name, elapsed.count());
    }

private:
    const char* name;
    std::chrono::high_resolution_clock::time_point start;
};

// Fill a heap buffer with space-separated random int64_t values.
char* generate_random_numbers(size_t bytes) {
    RAII_Timer timer("generate bytes");
    char *buffer = new char[bytes + 100];  // slack so the last number can spill past `bytes`
    memset(buffer, -1, bytes + 100);
    std::mt19937 gen(0);
    const int64_t i64min = std::numeric_limits<int64_t>::min();
    const int64_t i64max = std::numeric_limits<int64_t>::max();
    std::uniform_int_distribution<int64_t> dis(i64min, i64max);
    // to string and write to buffer
    char *p = buffer;
    while (p < buffer + bytes) {
        auto num = dis(gen);
        p = fmt::format_to(p, "{} ", num);
    }
    *p = '\0';
    return buffer;
}

void test_scn_string_view(const char* buffer, size_t len) {
    RAII_Timer timer("scn::scan for string_view");
    scn::string_view sv(buffer, len);
    int64_t num;
    size_t cnt = 0;
    auto result = scn::make_result(sv);
    // Scan numbers one by one until the range is exhausted.
    while ((result = scn::scan(result.range(), "{} ", num))) {
        cnt++;
    }
    fmt::print("bytes: {}, cnt: {}\n", len, cnt);
}

int main() {
    size_t bytes = 64 * 1024 * 1024;  // 64 MiB of input
    const char* buffer = generate_random_numbers(bytes);
    // Scan prefixes of increasing length, growing by 1 KiB per iteration.
    for (size_t i = 0; i < 64 * 1024; i++) {
        test_scn_string_view(buffer, i * 1024);
        fflush(stdout);
    }
    delete[] buffer;
    return 0;
}

Thanks for the benchmark! At least with the current master (88db004), I'm unable to replicate your results:

[Plot: scan time vs. input size up to 256 MiB, showing a linear trend]

The code is otherwise the same, except that I ran it up to 256 MiB of input, in 4 MiB increments. The slight bumps are probably from CPU scheduling (I didn't bother closing everything I had running, and didn't change frequency-scaling settings). The trend is still very clearly linear. It's built with gcc 12, using -std=c++20 -flto -O3 -DNDEBUG.

Are you able to verify if you're still having the same results?

Thanks for your response. I retested with gcc 11.4.0 on the latest master (eeac40e), using your compilation options, and got the same results as before.

Here’s my test process:

First, I cloned the latest repo and compiled scnlib with:

cmake -S . -B build -DSCN_TESTS=no -DSCN_BENCHMARKS=no
cmake --build build

Then, I compiled my test program using the compilation options you provided:

g++ -flto -std=c++20 -O3 -DNDEBUG -I../scnlib/include -L../scnlib/build -o scn_test scn_test.cpp -lscn

After running the program for a while, I observed the same results as before. I was unable to run data at the same scale as you did: in my test, processing 100 KB of data took more than 2 s, and 4 MiB of data did not yield a result within a few minutes.

To facilitate reproduction, I also tried a shorter code snippet:

#include "scn/scn.h"
int main(int argc, char** argv) {
    size_t len = std::stoul(argv[1]);
    char* buffer = new char[len];
    for(size_t i = 0; i < len; i ++) {
        buffer[i] = (i % 5 == 4 ? ' ' : '1');
    }
    auto result = scn::make_result(scn::string_view(buffer, len));
    int num;
    while (result = scn::scan(result.range(), "{} ", num)) {}
    return 0;
}

After compiling with the same options, the running time of this program is as follows:

$ time ./scn_mini_test 10000
./scn_mini_test 10000  0.03s user 0.00s system 99% cpu 0.032 total
$ time ./scn_mini_test 50000
./scn_mini_test 50000  0.73s user 0.00s system 99% cpu 0.726 total
$ time ./scn_mini_test 100000
./scn_mini_test 100000  2.83s user 0.00s system 99% cpu 2.825 total
$ time ./scn_mini_test 200000
./scn_mini_test 200000  11.28s user 0.00s system 99% cpu 11.278 total

It took a full 11 seconds to process a mere 200 KB of data.
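
These timings are consistent with quadratic growth; doing the arithmetic on the numbers above, each doubling of the input roughly quadruples the runtime:

T(50000)  / T(10000)  = 0.73  / 0.03 ≈ 23  ≈ 5^2
T(100000) / T(50000)  = 2.83  / 0.73 ≈ 3.9 ≈ 2^2
T(200000) / T(100000) = 11.28 / 2.83 ≈ 4.0 ≈ 2^2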

I hope this information contributes to our understanding of the situation. If there’s any other information that could be helpful or if there are other variables I should consider, please let me know. Thank you once again for your assistance.

I found that if I manually specify -DCMAKE_BUILD_TYPE=Release or -DCMAKE_BUILD_TYPE=RelWithDebInfo when compiling scnlib, everything works fine.
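
For reference, the configure step that resolves it for me is the same as above, plus the build-type flag:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DSCN_TESTS=no -DSCN_BENCHMARKS=no
cmake --build build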

I'm not sure why CMake's default build type is not Release, and the installation steps in the documentation do not point out that CMAKE_BUILD_TYPE needs to be specified manually. Is this expected behavior, or could there be a problem somewhere in the CMakeLists?

It seems that CMake does not specify a default build type. I found that fmtlib manually sets a default value in its CMakeLists. Should scnlib do something similar? It's a bit confusing when problems like this arise from it.

Yeah, CMake doesn't specify a default build type, which can be surprising. I guess taking a page from fmt's book wouldn't hurt here: if we're the master project, defaulting to Release might be useful and would prevent cases like this.
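
For illustration, something along these lines near the top of the CMakeLists (a rough sketch adapted from fmt's approach; the exact condition and message wording are placeholders, not final code):

# Sketch: default to Release when scnlib is the top-level project and the
# user hasn't chosen a build type (only applies to single-config generators).
if (CMAKE_SOURCE_DIR STREQUAL PROJECT_SOURCE_DIR AND
    NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
    message(STATUS "No build type specified, defaulting to Release")
    set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
    set_property(CACHE CMAKE_BUILD_TYPE PROPERTY
                 STRINGS Debug Release MinSizeRel RelWithDebInfo)
endif ()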

Thank you for your valuable research, nevertheless! Closing.