ProjectOrganizer: possible parsing optimization
dolik-rce opened this issue ยท 17 comments
I suspect that project organizer plugin parses some files twice, when opening a project with "Index all project files" set o Yes or Auto.
The reason for this is that the plugin parses all files but then Geany itself parses all opened files (or maybe the other way around?). So if you have a project that has many of its files already opened, it might make the opening of project needlessly slower. I believe that the plugin might safely skip files that are opened in current session (assuming it can get this information)
@techee: If you have no time to look into this, I might try to submit a PR myself. I'd just like to confirm first that this is not intentional and that there is no obvious blocker that would make this impossible.
I don't think this happens (there may of course be some bug). When collecting the files to be parsed, the plugin checks whether the file is open here:
and only passes those to Geany for parsing later in the function by tm_workspace_add_source_files()
. I can only imagine this could happen if the "project-open" signal of Geany was fired before the files from the project were all open - I haven't checked that, you can try if it's the case.
The plugin then has to check whether Geany opens/closes some of the files that were previously added because it could lead to duplicate symbols in the tag manager for the single file - see the handlers of "document-open" and "document-close" signals.
Oh and by the way, I'm now playing with LSP support for Geany, see https://github.com/techee/geany-lsp (the API is not yet part of Geany itself so now it's an all-in-one Geany+plugin repo). Since you wanted an improved kotlin support in Geany, you might want to try https://github.com/fwcd/kotlin-language-server together together with the plugin. I haven't tried this particular server so if you run into issues, please tell me.
Well, I did a simple test:
- I have added
printf
call to a kotlin parser, that prints first few bytes of the file. - I have recompiled geany.
- Then I created a simple project containing single kotlin file.
- Run Geany and opened the simple project.
- The log shows that the parser really runs twice.
I'll try to figure out why it happens.
I've seen your PR about LSP just today and I have looked at the kotlin server you link. I'm just little worried that it is written by single guy who doesn't even use Kotlin, so he has no real reason to maintain it ๐ But I might try to look into it, just to test the LSP integration.
The log shows that the parser really runs twice.
Then I'd suggest having a look if files are already open when the "project-open" signal runs - if not, it would explain this problem (but I'm not sure what to do about it).
Yes, it is exactly as you guessed. The signal is emitted before the files are opened:
gboolean project_load_file_with_session(const gchar *locale_file_name)
{
if (project_load_file(locale_file_name)) // project-open signal is emitted in this function
{
configuration_open_files(app->project->priv->session_files); // files are opened here
app->project->priv->session_files = NULL;
document_new_file_if_non_open();
ui_focus_current_document();
return TRUE;
}
return FALSE;
}
I have found a (little ugly) way to workaround this, see #1290...
I haven't checked your patch and what exactly happens in Geany yet but wouldn't it be possible to just add g_idle_add()
(or better plugin_idle_add()
) to the "project-open" handler and do what is done now in the idle callback? Geany could then open the files before the idle callback is called.
That is an interesting idea. Especially if the work could be divided into small chunks, so it would not block the UI for too long. I'll try to have a look into that.
I did just a quick test and it works. It really prevents the double parsing.
The downside is that for a big project mounted on slow sshfs it blocks the UI for a long time, which is not really user friendly. Previously it blocked while still showing the open project dialog, so the user saw it's still loading. Now it opens the projects and then stalls, which would probably confuse people. It should be either split into smaller parts or some visual indication of progress should be given.
I've drafted the code for parsing each file separately in idle handler (#1291), to avoid blocking the UI for too long.
It would actually be great if something like this could be used in Geany itself as well - only the current document needs to be fully parsed when opened, other files might be parsed "in background" to improve start-up / project open times. Too bad that cooperative multitasking/threading is so complicated in C.
BTW, it is funny, that very similar thing (that asynchronous parsing would speed things up), was mentioned in the LSP pull request just today ๐
I've drafted the code for parsing each file separately in idle handler (#1291), to avoid blocking the UI for too long.
I've just tried that but it parses all files at once right? And adding all files to TM is actually a feature. In the past files could only be added one by one but there was a huge performance problems of that - after each added file the tag list was sorted and this sorting happened after adding each file on a growing list of tags. For big projects the time spent on sorting was much bigger than the parsing itself. Now, when you add all the files at once, Geany just performs one sort at the end and e.g. for all linux kernel sources it takes something like 3s out of total 30s spent on parsing.
As you mentioned, it looks quite weird now with the cursor and editor "ready" but impossible to type anything. I think I prefer the previous behavior.
It would actually be great if something like this could be used in Geany itself as well - only the current document needs to be fully parsed when opened, other files might be parsed "in background" to improve start-up / project open times. Too bad that cooperative multitasking/threading is so complicated in C.
I think ctags isn't re-entrant so it would be hard to do. The big advantage of the single-threaded approach is the simplicity of the implementation and simplicity to reproduce bugs.
BTW, it is funny, that very similar thing (that asynchronous parsing would speed things up), was geany/geany#3571 (comment) in the LSP pull request just today ๐
It's true parallel execution in a separate process though. I was actually thinking about one thing - it would be possible to create a ctags-backed LSP server living in a separate process doing something similar to what tag manager does in Geany. But I'm not sure if I'm sufficiently motivated to do something like that myself :-).
Is the original performance problem with double-parsing too big? For fast parsers like C or basically all the hand-written parsers unless you have some really huge file open, parsing say 100 files is done in a fraction of second so kind of irrelevant.
I think this problem is visible with the rather slow peg kotlin parser, right?
I've just tried that but it parses all files at once right?
I believe the behavior is still the same. I just moved the code around a bit, so only difference should be in timing. It parses each file separately, storing the result in data->source_files
and tm_workspace_add_source_files(data->source_files)
is called only once, after all files are processed.
I think ctags isn't re-entrant so it would be hard to do. The big advantage of the single-threaded approach is the simplicity of the implementation and simplicity to reproduce bugs.
It doesn't have to be reentrant, as long as you always parse whole file at once and do it in one thread only. Actually, it could even work in single-threaded architecture. The processing tasks could be queued and processed by in idle handler. Every programmer stops once in a while to think, so the parsing could be done easily in those pauses.
I was actually thinking about one thing - it would be possible to create a ctags-backed LSP server living in a separate process doing something similar to what tag manager does in Geany. But I'm not sure if I'm sufficiently motivated to do something like that myself :-).
Yes, this exact idea crossed my mind today too ๐ It might solve some of the issues of that PR, by removing the ctags code from geany into LSP and making (I assume) most of the TM unnecessary. So all that would remain in Geany would be LSP support.
Is the original performance problem with double-parsing too big?
I have actually optimized the parser a lot. Ctags binary can now parse all files in my biggest Kotlin project in under 2s, which would be quick enough for most common uses. The problem is that I'm used to work on remote sources over sshfs, which adds more overhead. So I was looking into some cheap ways to optimize it along the way and removing double parsing in ProjectOrganizer looked like low hanging fruits... I guess I was bit too optimistic this time.
Anyway, I'm quite okay if you say neither of those two PRs is worth the trouble. It was a nice exercise in C and I learned bit more about geany internals and plugins, so I've gained something if the code is not merged in the end.
Sorry, a little busy with the LSP plugin :-).
I believe the behavior is still the same. I just moved the code around a bit, so only difference should be in timing. It parses each file separately, storing the result in data->source_files and tm_workspace_add_source_files(data->source_files) is called only once, after all files are processed.
I think you misunderstand how it works - tm_workspace_add_source_files()
is the function that performs the parsing, tm_source_file_new()
just creates a kind of handle representing the file. So if you wanted to keep the UI kind of responsive while the project is parsed, you'd have to add the files one by one using tm_workspace_add_source_file()
but this would lead to the performance problem I mentioned.
And, in addition, each this addition would have to happen in a separate idle call (with priority LOW) so you'd give GTK chance to process other events if you wanted to have the UI partially responsive. Now my preference in this respect is to rather have the UI stuck for a while and fully working afterwards than semi-working for a longer time.
The problem is that I'm used to work on remote sources over sshfs, which adds more overhead.
Ah, OK, that's the problem then.
Anyway, I'm quite okay if you say neither of those two PRs is worth the trouble. It was a nice exercise in C and I learned bit more about geany internals and plugins, so I've gained something if the code is not merged in the end.
It's just a matter of the UI but to me the "stuck while nothing is loaded" looks better than "stuck while everything seems to be loaded and ready". I agree it's a bit stupid to parse the files twice but I'd prefer the UI to behave the way it works now.
Sorry, a little busy with the LSP plugin :-).
Yeah, "somebody" keeps raising issues on it (sorry) :-)
I like the suggestion to move the ctags into a separate process like LSP, but that won't happen "soon".
Sorry, a little busy with the LSP plugin :-).
Yeah, "somebody" keeps raising issues on it (sorry) :-)
That's totally great.
By "busy" I actually meant "doing this new exciting thing and neglecting the old boring tag manager" :-)
I like the suggestion to move the ctags into a separate process like LSP, but that won't happen "soon".
I like the suggestion too, I like less that it would probably be me who'd have to implement it ;-). But maybe I get bored one day with nothing better to do.
Sorry, a little busy with the LSP plugin :-).
No problem, that is a good cause ;-)
I think you misunderstand how it works - tm_workspace_add_source_files() is the function that performs the parsing, tm_source_file_new() just creates a kind of handle representing the file.
Oh, that explains why the speed-up was much less then expected. Anyway, I agree that the idle approach is not really viable, because it might confuse users.
What about the other approach then? Just reading the file list from the project #1290. It just skips the files that will be opened by Geany later. It will block, but hopefully for little bit shorter time than with the double-parsing.