Quadratic Complexity Issue with scnlib's scan Function for Long Texts
ldeng-ustc opened this issue · 5 comments
I recently encountered a potential performance issue while using the scnlib library. When processing long strings with the scn::scan function, I noticed that the processing time seems to be proportional to the square of the length of the input range.
I wrote a test code that generates some random numbers and processes them with the scn::scan function. Then, I recorded the time required to process strings of different lengths and plotted the results. The plot shows that as the length of the string increases, the processing time exhibits a quadratic growth trend.
This issue is particularly evident when processing long texts: even if only one number is scanned, efficiency is low. I believe this might be an area for improvement.
Some other issues, such as #45 and #68, have also mentioned similar performance problems, but they did not explicitly point out that the cause is quadratic time complexity.
Test Result:
My test code:
#include <random>
#include <limits>
#include <chrono>
#include <cstdint>
#include <cstdio>   // fflush
#include <cstring>  // memset
#include "fmt/format.h"
#include "scn/scn.h"

using namespace std;

class RAII_Timer {
public:
    RAII_Timer(const char* name)
        : name(name), start(std::chrono::high_resolution_clock::now()) {}
    ~RAII_Timer() {
        auto end = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double, std::milli> elapsed = end - start;
        fmt::print("{}: {}ms\n", name, elapsed.count());
    }
private:
    const char* name;
    std::chrono::high_resolution_clock::time_point start;
};

char* generate_random_numbers(size_t bytes) {
    RAII_Timer timer("generate bytes");
    char* buffer = new char[bytes + 100];
    memset(buffer, -1, bytes + 100);
    std::mt19937 gen(0);
    const int64_t i64min = std::numeric_limits<int64_t>::min();
    const int64_t i64max = std::numeric_limits<int64_t>::max();
    std::uniform_int_distribution<int64_t> dis(i64min, i64max);
    // format the numbers and write them to the buffer
    char* p = buffer;
    while (p < buffer + bytes) {
        auto num = dis(gen);
        p = fmt::format_to(p, "{} ", num);
    }
    *p = '\0';
    return buffer;
}

void test_scn_string_view(const char* buffer, size_t len) {
    RAII_Timer timer("scn::scan for string_view");
    scn::string_view sv(buffer, len);
    int64_t num;
    size_t cnt = 0;
    auto result = scn::make_result(sv);
    while (result = scn::scan(result.range(), "{} ", num)) {
        cnt++;
    }
    fmt::print("bytes: {}, cnt: {}\n", len, cnt);
}

int main() {
    size_t bytes = 64 * 1024 * 1024;
    const char* buffer = generate_random_numbers(bytes);
    for (size_t i = 0; i < 64 * 1024; i++) {
        test_scn_string_view(buffer, i * 1024);
        fflush(stdout);
    }
    return 0;
}
Thanks for the benchmark! At least with the current master (88db004), I'm unable to replicate your results:
The code is otherwise the same, except I ran it until 256 MiB of input, with 4 MiB increments. The slight bumps are probably from CPU scheduling (I didn't bother with closing everything I had running, and didn't change frequency scaling settings). The trend is still very clearly linear. It's built with gcc 12, using -std=c++20 -flto -O3 -DNDEBUG.
Are you able to verify if you're still having the same results?
Thanks for your response. I retested with gcc 11.4.0 on the latest master (eeac40e), using your compilation options, and got the same results.
Here’s my test process:
First, I cloned the latest repo and compiled scnlib with:
cmake -S . -B build -DSCN_TESTS=no -DSCN_BENCHMARKS=no
cmake --build build
Then, I compiled my test program using your provided compilation options:
g++ -flto -std=c++20 -O3 -DNDEBUG -I../scnlib/include -L../scnlib/build -o scn_test scn_test.cpp -lscn
After running the program for a while, I observed the same results as before. I was unable to run at the same data scale as you did: in my test, processing 100 KB of data took more than 2 s, and 4 MiB of data couldn't yield a result within a few minutes.
To facilitate reproduction, I also tried a shorter code snippet:
#include <string>  // std::stoul
#include "scn/scn.h"

int main(int argc, char** argv) {
    size_t len = std::stoul(argv[1]);
    char* buffer = new char[len];
    for (size_t i = 0; i < len; i++) {
        buffer[i] = (i % 5 == 4 ? ' ' : '1');
    }
    auto result = scn::make_result(scn::string_view(buffer, len));
    int num;
    while (result = scn::scan(result.range(), "{} ", num)) {}
    return 0;
}
After compiling with the same options, the running time of this program is as follows:
$ time ./scn_mini_test 10000
./scn_mini_test 10000 0.03s user 0.00s system 99% cpu 0.032 total
$ time ./scn_mini_test 50000
./scn_mini_test 50000 0.73s user 0.00s system 99% cpu 0.726 total
$ time ./scn_mini_test 100000
./scn_mini_test 100000 2.83s user 0.00s system 99% cpu 2.825 total
$ time ./scn_mini_test 200000
./scn_mini_test 200000 11.28s user 0.00s system 99% cpu 11.278 total
It took a significant 11 seconds to process a mere 200 KB of data. Note that doubling the input from 100 KB to 200 KB roughly quadruples the runtime (2.83 s → 11.28 s), which is consistent with quadratic growth.
I hope this information contributes to our understanding of the situation. If there’s any other information that could be helpful or if there are other variables I should consider, please let me know. Thank you once again for your assistance.
I found that if I manually specify -DCMAKE_BUILD_TYPE=Release or -DCMAKE_BUILD_TYPE=RelWithDebInfo when compiling scnlib, everything works fine.
I'm not sure why the default build type for CMake is not Release, and the installation steps in the documentation also do not point out that CMAKE_BUILD_TYPE needs to be specified manually. Is this the expected behavior? Or could there be a problem somewhere in the CMakeLists?
It seems that CMake does not set a default build type. I found that fmtlib manually sets a default value in its CMakeLists. Should scnlib do something similar? It's a bit confusing when problems arise from this.
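For reference, the common pattern (used by fmt and others; variable names here are a sketch, not scnlib's actual CMakeLists) is to pick a default only when the library is the top-level project, the user hasn't chosen a build type, and the generator is single-config:

```cmake
# Sketch of an fmt-style default build type (assumed pattern, not scnlib code):
# only applies when this is the top-level project, no build type was given,
# and a single-config generator is in use.
if (CMAKE_CURRENT_SOURCE_DIR STREQUAL CMAKE_SOURCE_DIR
    AND NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
  set(CMAKE_BUILD_TYPE Release CACHE STRING
      "Build type (Debug, Release, RelWithDebInfo, MinSizeRel)" FORCE)
endif()
```

The top-level check matters so that the default doesn't override the build type chosen by a parent project that consumes the library via add_subdirectory.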
Yeah, CMake doesn't specify a default build type, which can be surprising. I guess taking a page from fmt's book wouldn't hurt here: if we're the master project, defaulting to Release might be useful, and prevent cases like this.
Thank you for your valuable research, nevertheless! Closing.