alex-hhh/emacs-sql-indent

Performance issues when indenting large SQL files

alex-hhh opened this issue · 10 comments

Performance issues when indenting large SQL files

HI @PierreTechoueyres , can you please update this issue with the details about the performance problem you mentioned in #24. You can use the elp package to obtain the profiling information as follows:

  • build and load sql-indent.el
  • instrument the package for profiling using M-x elp-instrument-package RET sqlind- RET
  • open the SQL file that creates the problem
  • run the following commands M-x mark-whole-buffer RET M-x indent-region RET
  • display the profiling results using M-x elp-results RET
  • copy the results into this issue (use ``` formatting please :-) )

I can than have a look to see which functions are using most of the time and see if we can make them faster.

Here are a few more tips about the elp package:

  • If you want to run another profile run, you can use elp-reset-all to forget previous profiling information, otherwise, the new times will be added to the old one
  • if you update sql-indent.el, you will need to recompile it and instrument it again. You probably only want to profile functions that are compiled, as the interpreted ones will be slower
  • for more details about the profiler, see the comments in the elp.el file
  • there is another profiler in recent versions of Emacs, but I have not used it yet and have no experience with it.

Here are the results for an 15945 lines package body (yes I know It's silly but I'm not the only one working on it and this code as evolved since so many years):

sqlind-beginning-of-block              151768      14247.478968  0.0938766997
sqlind-syntax-of-line                  16093       13642.838701  0.8477498726
sqlind-indent-line                     15516       13597.269087  0.8763385593
sqlind-beginning-of-statement          859179      12691.246749  0.0147713651
sqlind-begining-of-directive           859179      12324.157988  0.0143441098
sqlind-maybe-then-statement            1571490     10282.733953  0.0065433021
sqlind-maybe-begin-statement           862900      2377.7371397  0.0027555187
sqlind-beginning-of-statement-1        858534      278.78560624  0.0003247228
sqlind-maybe-loop-statement            886321      262.96452787  0.0002966922
sqlind-maybe-if-statement              1571844     66.440841100  4.226...e-05
sqlind-refine-end-syntax               2201        53.762032220  0.0244261845
sqlind-backward-syntactic-ws           531748      50.605912203  9.516...e-05
sqlind-refine-syntax                   16093       48.642975915  0.0030226170
sqlind-maybe-end-statement             2239226     37.331857515  1.667...e-05
sqlind-forward-syntactic-ws            1165612     22.049938802  1.891...e-05
sqlind-match-string                    2560110     16.610525812  6.488...e-06
sqlind-maybe-case-statement            1571490     6.7130119810  4.271...e-06
sqlind-maybe-create-statement          725420      5.8270935719  8.032...e-06
sqlind-maybe-exception-statement       889305      5.6407245250  6.342...e-06
sqlind-maybe-declare-statement         725420      5.1700415990  7.126...e-06
sqlind-maybe-defun-statement           724278      5.1276522169  7.079...e-06
sqlind-maybe-else-statement            889230      4.9588118390  5.576...e-06
sqlind-calculate-indentation           18402       1.2544363110  6.816...e-05
sqlind-syntax-symbol                   38497       0.3728449329  9.685...e-06
sqlind-indent-comment-start            5076        0.1974244270  3.889...e-05
sqlind-report-sytax-error              368         0.0724379439  0.0001968422
sqlind-syntax-in-select                851         0.0398730140  4.685...e-05
sqlind-syntax                          18664       0.0326221960  1.747...e-06
sqlind-anchor-point                    19377       0.0323783480  1.670...e-06
sqlind-search-backward                 544         0.0220059349  4.045...e-05
sqlind-indent-comment-continuation     406         0.0110237840  2.715...e-05
sqlind-use-previous-line-indentation   548         0.0097063889  1.771...e-05
sqlind-syntax-in-update                233         0.0073990810  3.175...e-05
sqlind-lineup-to-clause-end            585         0.0072548269  1.240...e-05
sqlind-right-justify-clause            417         0.0060931900  1.461...e-05
sqlind-outer-context                   2886        0.0052335580  1.813...e-06
sqlind-right-justify-logical-operator  393         0.0049095580  1.249...e-05
sqlind-same-level-statement            754         0.0046353559  6.147...e-06
sqlind-syntax-in-insert                44          0.003976809   9.038...e-05
sqlind-indent-select-column            114         0.0022326980  1.958...e-05
sqlind-syntax-in-delete                34          0.0009703510  2.853...e-05
sqlind-indent-select-table             15          0.0002573569  1.715...e-05
sqlind-setup                           1           2.332e-05     2.332e-05

But as I got some errors, I'll need to write some other test cases.

Indenting region...24%
bad closing for begin block (151432 152717) [4 times]
Indenting region...25%
bad closing for begin block (151432 152717)
end statement closes nothing (154541 154541)
bad closing for begin block (151432 152717) [5 times]
end statement closes nothing (154683 154683)
bad closing for begin block (151432 152717) [4 times]
Indenting region...26%
bad closing for begin block (151432 152717) [6 times]
Indenting region...27%
bad closing for begin block (151432 152717) [6 times]
Indenting region...28%
bad closing for begin block (151432 152717) [4 times]
Indenting region...29%
bad closing for begin block (151432 152717) [8 times]
Indenting region...30%
bad closing for begin block (151432 152717) [4 times]
Indenting region...31%
bad closing for begin block (151432 152717) [2 times]
Indenting region...33%
bad closing for begin block (196256 203589) [6 times]
end statement closes nothing (203743 203743)
bad closing for begin block (196256 203589) [2 times]
Indenting region...34%
bad closing for begin block (196256 203589) [2 times]
Indenting region...35%
bad closing for begin block (196256 203589) [8 times]
Indenting region...36%
bad closing for begin block (196256 203589) [4 times]
Indenting region...37%
bad closing for begin block (196256 203589) [2 times]
Indenting region...38%
bad closing for begin block (230706 233735) [6 times]
end statement closes nothing (233887 233887)
bad closing for begin block (230706 233735) [4 times]
Indenting region...40%
bad closing for begin block (230706 233735) [2 times]
Indenting region...41%
bad closing for begin block (230706 233735) [4 times]
Indenting region...42%
bad closing for begin block (230706 233735) [7 times]
Indenting region...43%
bad closing for begin block (230706 233735) [7 times]
Indenting region...44%
bad closing for begin block (230706 233735) [12 times]
Indenting region...45%
bad closing for begin block (230706 233735) [10 times]
Indenting region...46%
bad closing for begin block (230706 233735) [2 times]
Indenting region...47%
bad closing for begin block (230706 233735) [5 times]
Indenting region...48%
bad closing for begin block (230706 233735) [10 times]
Indenting region...49%
bad closing for begin block (230706 233735) [5 times]
Indenting region...50%
bad closing for begin block (230706 233735) [4 times]
Indenting region...51%
bad closing for begin block (230706 233735) [6 times]
Indenting region...52%
bad closing for begin block (230706 233735) [11 times]
Indenting region...53%
bad closing for begin block (230706 233735) [13 times]
Indenting region...54%
bad closing for begin block (230706 233735) [12 times]
Indenting region...55%
bad closing for begin block (230706 233735) [12 times]
Indenting region...56%
bad closing for begin block (230706 233735) [6 times]
Indenting region...57%
bad closing for begin block (230706 233735) [5 times]
Indenting region...58%
bad closing for begin block (230706 233735) [5 times]
Indenting region...59%
bad closing for begin block (230706 233735) [2 times]
Indenting region...60%
bad closing for begin block (230706 233735) [4 times]
Indenting region...61%
bad closing for begin block (230706 233735) [6 times]
Indenting region...62%
bad closing for begin block (230706 233735) [8 times]
Indenting region...70%
bad closing for begin block (230706 233735) [2 times]
Indenting region...77%
bad closing for begin block (230706 233735) [4 times]
Indenting region...78%
bad closing for begin block (230706 233735) [8 times]
Indenting region...79%
bad closing for begin block (230706 233735) [10 times]
Indenting region...80%
bad closing for begin block (230706 233735) [8 times]
Indenting region...81%
bad closing for begin block (230706 233735) [5 times]
Indenting region...82%
bad closing for begin block (230706 233735) [9 times]
Indenting region...83%
bad closing for begin block (230706 233735) [13 times]
Indenting region...84%
bad closing for begin block (230706 233735) [9 times]
Indenting region...86%
bad closing for begin block (230706 233735) [4 times]
Indenting region...87%
bad closing for begin block (230706 233735) [5 times]
Indenting region...88%
bad closing for begin block (230706 233735) [3 times]
Indenting region...89%
bad closing for begin block (230706 233735) [4 times]
Indenting region...90%
bad closing for begin block (230706 233735) [2 times]
Indenting region...91%
bad closing for begin block (230706 233735) [6 times]
Indenting region...93%
bad closing for begin block (230706 233735) [2 times]
Indenting region...94%
bad closing for begin block (230706 233735) [2 times]
Indenting region...95%
bad closing for begin block (230706 233735) [6 times]
Indenting region...96%
bad closing for begin block (230706 233735) [5 times]
Indenting region...97%
bad closing for begin block (230706 233735) [4 times]
Indenting region...98%
bad closing for begin block (230706 233735) [6 times]
Indenting region...99%
bad closing for begin block (230706 233735) [2 times]
end statement closes nothing (612071 612071)
bad closing for begin block (230706 233735) [8 times]
Indenting region...done

Hi @PierreTechoueyres, if it helps, you can find where the error is signaled by moving to the beginning of the file and moving to position 230706 (anchor) or 233735 (line to be indented), like so:

M-<
C-u 230706 C-f
M-<
C-u 233735 C-f

I will have a look to see what I can do about the performance problems. And don't worry, I've also seen large source files and I know they exist and not much can be done about that :-)

I updated some routines to make them faster. Can you please try running the profiler again?

Here are the results:

sqlind-beginning-of-block              151768      14354.970475  0.0945849617
sqlind-syntax-of-line                  16093       13737.876009  0.8536553786
sqlind-indent-line                     15516       13691.141829  0.8823886200
sqlind-beginning-of-statement          859179      12807.994192  0.0149072477
sqlind-begining-of-directive           858534      12436.946651  0.0144862598
sqlind-maybe-then-statement            1571490     10376.211280  0.0066027854
sqlind-maybe-begin-statement           862900      2386.2679911  0.0027654050
sqlind-beginning-of-statement-1        858534      291.52188991  0.0003395577
sqlind-maybe-loop-statement            886321      268.70379428  0.0003031675
sqlind-maybe-if-statement              1571844     71.019375108  4.518...e-05
sqlind-backward-syntactic-ws           531748      57.325275695  0.0001078053
sqlind-refine-end-syntax               2201        54.138459377  0.0245972100
sqlind-refine-syntax                   16093       49.136461062  0.0030532816
sqlind-forward-syntactic-ws            1165612     22.197750862  1.904...e-05
sqlind-maybe-end-statement             2239226     18.114385701  8.089...e-06
sqlind-maybe-else-statement            889230      7.5874313420  8.532...e-06
sqlind-maybe-case-statement            1571490     5.5715959490  3.545...e-06
sqlind-maybe-exception-statement       889305      5.5571842700  6.248...e-06
sqlind-maybe-create-statement          725420      5.2875502819  7.288...e-06
sqlind-maybe-defun-statement           724278      3.9871496979  5.504...e-06
sqlind-maybe-declare-statement         725420      3.5308236910  4.867...e-06
sqlind-calculate-indentation           18402       0.9669871989  5.254...e-05
sqlind-indent-comment-start            5076        0.2018063350  3.975...e-05
sqlind-report-sytax-error              368         0.0792575660  0.0002153738
sqlind-syntax-symbol                   39508       0.0673861049  1.705...e-06
sqlind-syntax-in-select                851         0.038526693   4.527...e-05
sqlind-syntax                          18664       0.0311338210  1.668...e-06
sqlind-anchor-point                    18829       0.0309169450  1.641...e-06
sqlind-search-backward                 544         0.0221370409  4.069...e-05
sqlind-indent-comment-continuation     406         0.011289636   2.780...e-05
sqlind-left-justify-logical-operator   585         0.007753582   1.325...e-05
sqlind-adjust-operator                 585         0.0073651659  1.259...e-05
sqlind-syntax-in-update                233         0.0070187530  3.012...e-05
sqlind-outer-context                   3897        0.0069121570  1.773...e-06
sqlind-lineup-to-clause-end            585         0.0067629000  1.156...e-05
sqlind-same-level-statement            754         0.0045993979  6.099...e-06
sqlind-lone-semicolon                  585         0.0043748789  7.478...e-06
sqlind-syntax-in-insert                44          0.0038100469  8.659...e-05
sqlind-indent-select-column            114         0.0022773450  1.997...e-05
sqlind-syntax-in-delete                34          0.0009244660  2.719...e-05
sqlind-syntax-keyword                  297         0.000493714   1.662...e-06
sqlind-minor-mode                      5           0.0003156380  6.312...e-05
sqlind-indent-select-table             15          0.0002350000  1.566...e-05
sqlind-setup-style-left                5           2.096e-05     4.192e-06

Hi @PierreTechoueyres , I pushed some changes to the performance-improvements branch, can you please test them?

They change the position of the anchor point for toplevel statements, so I'm only going to merge the changes if they provide some useful performance improvements.

Here are the results:

sqlind-syntax-of-line                  16093       1417.7996444  0.0881003942
sqlind-indent-line                     15516       1408.4987746  0.0907771832
sqlind-beginning-of-block              151768      1343.1896060  0.0088502820
sqlind-beginning-of-statement          859179      342.16032733  0.0003982410
sqlind-maybe-then-statement            1571490     318.42384262  0.0002026254
sqlind-beginning-of-statement-1        858534      270.30601676  0.0003148460
sqlind-beginning-of-directive          16093       205.38743425  0.0127625324
sqlind-maybe-begin-statement           862900      155.50229463  0.0001802089
sqlind-maybe-if-statement              1571844     70.969488346  4.515...e-05
sqlind-backward-syntactic-ws           531748      55.793341464  0.0001049244
sqlind-forward-syntactic-ws            1165612     21.173820814  1.816...e-05
sqlind-maybe-end-statement             2239226     18.022753036  8.048...e-06
sqlind-maybe-loop-statement            886321      13.411095119  1.513...e-05
sqlind-refine-end-syntax               2201        12.685209162  0.0057633844
sqlind-refine-syntax                   16093       11.708394253  0.0007275457
sqlind-maybe-case-statement            1571490     6.4108904959  4.079...e-06
sqlind-maybe-create-statement          725420      4.9618393499  6.839...e-06
sqlind-maybe-else-statement            889230      4.7965882960  5.394...e-06
sqlind-maybe-declare-statement         725420      4.3493562700  5.995...e-06
sqlind-maybe-defun-statement           724278      3.9191758449  5.411...e-06
sqlind-maybe-exception-statement       889305      3.2719761880  3.679...e-06
sqlind-calculate-indentation           18402       1.3126966189  7.133...e-05
sqlind-indent-comment-start            5076        0.5280375140  0.0001040263
sqlind-syntax-symbol                   39508       0.3941195849  9.975...e-06
sqlind-report-sytax-error              368         0.0851467340  0.0002313769
sqlind-syntax-in-select                851         0.0385483460  4.529...e-05
sqlind-anchor-point                    18829       0.0326812159  1.735...e-06
sqlind-syntax                          18664       0.0320503640  1.717...e-06
sqlind-search-backward                 544         0.0223065529  4.100...e-05
sqlind-indent-comment-continuation     406         0.0115934999  2.855...e-05
sqlind-left-justify-logical-operator   585         0.0089255200  1.525...e-05
sqlind-adjust-operator                 585         0.0078593250  1.343...e-05
sqlind-lineup-to-clause-end            585         0.0078512399  1.342...e-05
sqlind-outer-context                   3897        0.0068905400  1.768...e-06
sqlind-syntax-in-update                233         0.0068243479  2.928...e-05
sqlind-lone-semicolon                  585         0.0048225199  8.243...e-06
sqlind-same-level-statement            754         0.0047767680  6.335...e-06
sqlind-syntax-in-insert                44          0.0037759280  8.581...e-05
sqlind-indent-select-column            114         0.002426371   2.128...e-05
sqlind-syntax-in-delete                34          0.000874955   2.573...e-05
sqlind-syntax-keyword                  297         0.0005014360  1.688...e-06
sqlind-indent-select-table             15          0.000268118   1.787...e-05
sqlind-minor-mode                      1           7.4159e-05    7.4159e-05
sqlind-setup-style-left                1           5.6e-06       5.6e-06

So it looks like sqlind-indent-line, which is the top level function invoked each time you hit TAB, went down from 0.87 seconds to 0.09 seconds. This should make it usable for interactive use. Does it feel faster now to edit this large package?

# First run (baseline)
sqlind-indent-line                     15516       13597.269087  0.8763385593

# Last run (pull request #31)
sqlind-indent-line                     15516       1408.4987746  0.0907771832

Yes this look good. I've tried it today and it's really more responsive. Not faster than light but far from unusable.
Now I'll have work to provide you uses cases of "bad closing blocks".
Many thanks.

It is good that it is more responsive. I will close this issue for now.

For reference, if future performance improvements need to be made, the focus
would be on reducing the number of calls: for each sqlind-indent-line there
are 10 sqlind-beginning-of-block calls and more than 100
sqlind-maybe-end-statement (and other sqlind-maybe-* calls). This could
be achieved by caching previous information as markers in the buffer, similar
to what syntax-ppss does to speed up parse-partial-sexp. This, however,
would be a significant body of work...