The question of algorithm improvement
deedy5 opened this issue · 1 comment
deedy5 commented
After fixing some bottlenecks (#183), I selected from the performance test results table those files in the dataset on which the program showed a runtime > 0.1 s.
performance_comparison_master.xlsx
From these files I made a separate dataset
char-dataset_>0.1s.zip
and ran tests on it.
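For reference, a minimal sketch of how such a subset could be assembled from the spreadsheet, assuming pandas (with openpyxl for .xlsx) is available; the column names "file" and "time (s)" are guesses and should be adjusted to the actual headers of performance_comparison_master.xlsx:

import shutil
from pathlib import Path

import pandas as pd  # reading the .xlsx also requires openpyxl

# Column names below ("file", "time (s)") are assumptions; adjust them to the
# actual headers in performance_comparison_master.xlsx.
results = pd.read_excel("performance_comparison_master.xlsx")
slow_files = results.loc[results["time (s)"] > 0.1, "file"]

target = Path("./char-dataset_>0.1s")
for src in slow_files:
    src = Path(src)
    # Preserve the <encoding>/<file> layout of the original char-dataset.
    dest = target / src.parent.name / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)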
Test file:
test_0.1s.py
from glob import glob
from os.path import isdir

from charset_normalizer import detect


def performance_compare(size_coeff):
    # Run detect() over every file of the reduced dataset; size_coeff repeats
    # each payload to simulate larger inputs.
    if not isdir("./char-dataset_>0.1s"):
        print("This script requires char-dataset_>0.1s to be present in the package root directory")
        exit(1)
    for tbt_path in sorted(glob("./char-dataset_>0.1s/**/*.*")):
        with open(tbt_path, "rb") as fp:
            content = fp.read() * size_coeff
            detect(content)


if __name__ == "__main__":
    performance_compare(1)
1. pprofile
pprofile --format callgrind --out cachegrind.out.0.1s.test test_0.1s.py
2. vprof heatmap
vprof -c h test_0.1s.py
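The callgrind file produced in step 1 can then be inspected with a viewer such as kcachegrind (assuming it is installed):
kcachegrind cachegrind.out.0.1s.test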
deedy5 commented
Sorry, the previous vprof test is not relevant; apparently that result was caused by a lack of memory.
I reduced the size of the dataset and kept one file per encoding (a possible way to do this is sketched below).
char-dataset_>0.1s.zip
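For reference, a minimal sketch of how such a reduced set could be built, assuming the dataset keeps the <encoding>/<files> layout implied by the glob pattern in test_0.1s.py; the target directory name is hypothetical:

import shutil
from pathlib import Path

source = Path("./char-dataset_>0.1s")
target = Path("./char-dataset_one-per-encoding")  # hypothetical name

for encoding_dir in sorted(p for p in source.iterdir() if p.is_dir()):
    files = sorted(f for f in encoding_dir.iterdir() if f.is_file())
    if not files:
        continue
    # Keep only the first file of each encoding sub-directory.
    dest_dir = target / encoding_dir.name
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(files[0], dest_dir / files[0].name)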
- vprof
vprof -c h test_0.1s.py
vprof (5_3_2022 1_13_21 PM).zip
There are no particularly pronounced bottlenecks.
Closing the issue.