General performance improvements

Question

General performance improvements

Closed this issue 3 years ago · 8 comments

I'm aware that it's an edge case, but I regularly work with files that are 100,000 + lines long with rager and minus can be fairly slow when working with them. This issue is simply requesting significant general performance improvements when paging large files.

I've ran rager through a profiler a few times when paging these massive files to determine where the bottlenecks are, and now I know where they are and have a few ideas for what can be done to eliminate them:

I have a fork with general performance improvements (it affects static paging with no line numbers most significantly, but I think it affects all paging at least somewhat), and I can file a PR with my changes if you would like. If you'd like me to file this PR, I'd still have more changes that I think should be made, since the changes in this PR are more like a band-aid than a fix.
Performance could be significantly improved by rewriting how the process of flattening + adding line numbers + search result highlighting works. Instead of doing this to all of the text every time you scroll at all (as happens in annotate_line_numbers), you could store, within the Pager struct, a vector of all the lines of text, already flattened with the lines numbers and search results highlighted. Then, whenever the terminal is resized, the search term changes, or line numbers are toggled, this vector could be regenerated. This would allow, then, for basically no lag when someone is only scrolling (as opposed to the major lag that can happen right now).

I'd be happy to put some work into helping to implement either one of these options, if you'd like. Before putting any more work into this or filing a PR, though, I'd appreciate feedback on your thoughts and what direction should be taken for this issue (if any).

Thank you so much!

Answer 1 · 2021-11-15T07:43:13.000Z

I am really interested in the changes. Thanks for filling the issue. Looking forward for the PR

Answer 2 · 2021-11-15T20:14:16.000Z

I realized that option 2 would not actually be very difficult to implement, so I did so and opened #54. Please let me know what you think :)

Answer 3 · 2021-11-16T12:13:56.000Z

I merged the #54. If you have any other ideas on improvements please file a PR. I would definitely be interested in seeing that.

Answer 4 · 2021-11-17T18:46:04.000Z

I was working on some more ways to improve performance, and I'm somewhat confused right now as to some things within the Pager::push_str function. It seems that it is designed to only show a line of text to the user once it has been flushed with a \n. This feels counterintuitive to me, though, as I would expect text to be shown as soon as I push it. Additionally, the documentation doesn't mention that the text must be flushed with a newline, and the example that it shows doesn't include a newline either (meaning that this example would actually result in no text being shown in the pager).

It there a reason for this behavior? I think it should be changed to not require a newline flush, and would be more than happy to push that in another PR soon if you would also like this change. I also don't think this would be considered a breaking change, since it would be changing behavior to match the documentation (more of a bug fix than a breaking change).

(also, if you would like me to open a separate issue to discuss this, I would be more than happy to do so :))

Answer 5 · 2021-11-18T01:25:29.000Z

EDIT: As for the reason, it is because it can help push chunks of text that are part of the same line consecutively. If I flush at every push, this functionality cannot be implemented. So we wait for a \n to come before flushing.

It is the best way I could find to give me this functionality. If you have any other way of getting the same functionality, please do file a PR for that.

Answer 6 · 2021-11-18T06:55:30.000Z

Couldn't you just leave this functionality up to whoever uses this library to implement? If they only want to show new text in specific chunks, they could only call push_str with the chunk that they then want to show. However, if you keep it as is, it prevents them from ever adding on to the most recent line in the pager and showing that immediately. I feel like keeping this as is only restricts functionality, and doesn't actually add any.

(If you would like to still only flush text at the newline, I feel like the best way would be to, when format_text is called, just drop the last item in the formatted_lines vector if lines does not end with a newline.)

Answer 7 · 2021-11-18T17:46:23.000Z

I will definitely try your idea

Answer 8 · 2021-11-26T21:47:55.000Z

#55 includes the last performance improvements I wanted to file, so I'll close this issue since my concerns have now been addressed.