out-of-cheese-error/gooseberry

Displaying annotations takes too long

ngirard opened this issue · 13 comments

Using a home-made binary built from trunk, on Ubuntu 20.04.
gooseberry search takes more than 7 seconds to display its output for 693 annotations. That's very long. Is there any reason why?
I'm assuming no network operations are involved here, is that so?

I'm assuming no network operations are involved here, is that so?

I'm afraid gooseberry uses network operations for everything - annotations are always queried from Hypothesis, and the local database just stores their IDs and tags for filtering. I had a version in the past that also stored local copies and only updated the ones with an update date later than the last check, but at that point the bincode serializer didn't support tagged enums, meaning the only way to store annotations was as raw JSON, which seemed like it would be pretty memory-hungry. Maybe it's time to revisit this now - I've been using it on ~400 annotations so far, but I've always used search with filters, so the delay was not so noticeable.
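To make that limitation concrete, here's a minimal sketch (assuming bincode 1.x and a made-up, simplified enum standing in for the real annotation types): serializing an internally tagged enum may go through, but the round trip typically fails because bincode is not a self-describing format.

```rust
// Sketch: why a non-self-describing format like bincode struggles with
// serde's internally tagged enums. Hypothetical simplified type, not the
// actual gooseberry/hypothesis structs.
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
#[serde(tag = "type")] // internally tagged: the variant name is stored as a "type" field
enum Selector {
    TextQuote { exact: String },
    TextPosition { start: u64, end: u64 },
}

fn main() {
    let s = Selector::TextQuote { exact: "some highlighted text".into() };

    // Deserializing a tagged enum needs deserialize_any, which bincode
    // does not provide, so the round trip errors out.
    match bincode::serialize(&s).and_then(|bytes| bincode::deserialize::<Selector>(&bytes)) {
        Ok(v) => println!("round-tripped: {:?}", v),
        Err(e) => println!("bincode can't round-trip tagged enums: {}", e),
    }
}
```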

Edit: I assume you're building the binary with --release?

Another point here is that storing annotations locally means the entire Hypothesis filtering functionality has to be re-implemented and kept up to date with whatever new filters are added to the official API.
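To give an idea of the scope, that re-implementation would mean maintaining something along these lines for every filter the official API exposes (a purely hypothetical sketch; none of these names come from the actual gooseberry code):

```rust
// Hypothetical sketch of what local filtering over stored annotations could
// look like. Struct and field names are illustrative, not gooseberry's types.
#[derive(Debug, Clone)]
struct StoredAnnotation {
    id: String,
    uri: String,
    tags: Vec<String>,
    updated: String, // ISO 8601 timestamp, so lexicographic comparison works
}

#[derive(Debug, Default)]
struct SearchFilters {
    uri: Option<String>,
    tag: Option<String>,
    from: Option<String>, // e.g. what the --from query operator would map to
}

fn filter<'a>(annotations: &'a [StoredAnnotation], f: &SearchFilters) -> Vec<&'a StoredAnnotation> {
    annotations
        .iter()
        .filter(|a| f.uri.as_ref().map_or(true, |u| &a.uri == u))
        .filter(|a| f.tag.as_ref().map_or(true, |t| a.tags.contains(t)))
        .filter(|a| f.from.as_ref().map_or(true, |d| a.updated.as_str() >= d.as_str()))
        .collect()
}

fn main() {
    let annotations = vec![StoredAnnotation {
        id: "abc123".into(),
        uri: "https://example.com".into(),
        tags: vec!["rust".into()],
        updated: "2021-05-01T12:00:00Z".into(),
    }];
    let filters = SearchFilters { from: Some("2021-01-01T00:00:00Z".into()), ..Default::default() };
    println!("{} match(es)", filter(&annotations, &filters).len());
}
```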

I assume you're building the binary with --release?

No, I was too lazy to do so; I'll report back with the precompiled binary that is baking as I write this.
I guess since we're I/O-bound, there shouldn't be much difference, but hey.

Okay, let's see! The entire sync aspect right now is pretty much only there in case I decide to add back the local annotations thing; otherwise I can get rid of it.

Maybe it's time to revisit this now, I've been using it on ~400 annotations so far but I've always used search with filters, so the delay was not so noticeable.

Well.

On one hand, I reported my user experience based on the expectations I had back then.
I think that if the docs explained that gooseberry uses network operations for everything, and that the preferred workflow is to limit the search with query operators such as --from, then my expectations (and, I assume, others') would have been different, and I'd find a 2-second delay perfectly acceptable.

On the other hand, I'm convinced that storing the annotations locally is the way to go. Heck, maybe the reason Hypothes.is hasn't gotten very popular is that both a freaking browser and a freaking web app are imposed as intermediaries between you and your data?

Since you seem to be familiar with Obsidian: one of the biggest reasons it has gotten so popular is that it lets you manipulate your data locally, whereas its competitors (e.g. Roam) don't.

Also, perhaps you'll think I'm too opinionated, but I'm also convinced that the best way to store the annotations is a SQLite database. There are many reasons I could elaborate on, but in a nutshell: it's a de facto standard; it removes all barriers to connecting your data to any other tool or need; and data access could be made very fast for any other application using views and triggers.
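To make the suggestion concrete, here is a rough sketch of what such a store could look like with rusqlite (purely illustrative; the table layout, column names, and file name are my assumptions, not anything gooseberry defines):

```rust
// Hypothetical SQLite-backed annotation store using rusqlite.
// Schema and names are illustrative only.
use rusqlite::{params, Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open("annotations.db")?;

    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS annotations (
             id      TEXT PRIMARY KEY,
             uri     TEXT NOT NULL,
             text    TEXT,
             updated TEXT NOT NULL
         );
         CREATE TABLE IF NOT EXISTS tags (
             annotation_id TEXT NOT NULL REFERENCES annotations(id),
             tag           TEXT NOT NULL
         );
         CREATE INDEX IF NOT EXISTS idx_tags_tag ON tags(tag);",
    )?;

    conn.execute(
        "INSERT OR REPLACE INTO annotations (id, uri, text, updated)
         VALUES (?1, ?2, ?3, ?4)",
        params!["abc123", "https://example.com", "some highlight", "2021-05-01T12:00:00Z"],
    )?;

    // Filtering then becomes a plain SQL query that any other tool can reuse.
    let mut stmt = conn.prepare(
        "SELECT id, uri FROM annotations WHERE updated >= ?1 ORDER BY updated",
    )?;
    let rows = stmt.query_map(params!["2021-01-01T00:00:00Z"], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, String>(1)?))
    })?;
    for row in rows {
        let (id, uri) = row?;
        println!("{} {}", id, uri);
    }
    Ok(())
}
```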

What do you think ?

Reporting back: using the precompiled binary, gooseberry search takes 5 seconds now, as opposed to 7+ with the debug build.
That's not negligible!

Since you seem to be familiar with Obsidian: one of the biggest reasons it has gotten so popular is that it lets you manipulate your data locally, whereas its competitors (e.g. Roam) don't.

This is definitely a pretty strong point and one of the main reasons I wanted to do it that way in the first place, but then I realized how much work it'd be to recreate the whole filtering system and decided to get a quick working tool out first :p

I agree that 5 seconds is not negligible, and I guess it'll get worse as the number of annotations increases, which is frustrating, especially if you don't expect the old annotations to change much. I'll make a new issue for this and indeed look into SQLite - it will take some refactoring, but this has to be done anyway since I'm planning to make an Obsidian plugin version that uses gooseberry as a library.

I'll make a new issue for this and indeed look into SQLite - it will take some refactoring, but this has to be done anyway since I'm planning to make an Obsidian plugin version that uses gooseberry as a library.

That's so nice to hear!
I'm willing to help with the SQLite stuff; I'm just unlikely to be able to devote much time to this in the next 2 months. But in any case, let's keep in touch, and don't hesitate to solicit me for anything!

storing annotations locally means the entire Hypothesis filtering functionality has to be re-implemented and kept up to date with whatever new filters are added to the official API.

I'm pretty sure it won't be a problem, given the slow pace at which the Hypothes.is project is advancing...

I'm willing to help with the SQLite stuff; I'm just unlikely to be able to devote much time to this in the next 2 months. But in any case, let's keep in touch, and don't hesitate to solicit me for anything!

Great, will do, thanks!

@ngirard it's been a while, but I have a PR (#99) with the local database functionality (using CBOR, which actually needed pretty minimal refactoring of the codebase).

I'd like to test it out for a couple of common workflows before merging, to make sure everything stays in sync - would you be up for seeing how much difference it makes to your search times?

Hey @Ninjani, good to hear from you! Congrats on your PhD and your postdoc position! Hope you're doing well at your new place.

I'm actually about to start a new assignment, so it's the perfect time for me to rethink my habits & workflows. I have to admit that my old habits kicked in last year and I ended up putting Gooseberry on the back burner, especially since my colleagues didn't care to annotate their information sources; hence the lack of feedback from me. I apologize for that.

I wish to adopt Gooseberry... again, so I'll take a stab at this new PR today. May I ask why you chose CBOR serialization instead of SQLite? In any case, it's your project and I think you are sovereign in your choices.

Also, I'm afraid I'll be reporting a few issues over the weekend — nothing big, fortunately.

Cheers !

Thanks @ngirard !

I went with CBOR because it has serde support via ciborium, producing a binary format which I could then store directly in the existing sled database - so quite minimal refactoring of the codebase.
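For anyone following along, the approach amounts to something like the following minimal sketch (with a made-up Annotation struct standing in for the real Hypothesis types stored by the PR):

```rust
// Sketch: serialize a value to CBOR with ciborium and store the bytes in sled.
// The Annotation struct here is an illustrative stand-in for the real types.
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct Annotation {
    id: String,
    uri: String,
    tags: Vec<String>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = sled::open("gooseberry_cache")?;

    let ann = Annotation {
        id: "abc123".into(),
        uri: "https://example.com".into(),
        tags: vec!["rust".into(), "notes".into()],
    };

    // CBOR is self-describing, so tagged enums (and anything else serde can
    // handle) round-trip fine, unlike bincode.
    let mut bytes = Vec::new();
    ciborium::ser::into_writer(&ann, &mut bytes)?;
    db.insert(ann.id.as_bytes(), bytes)?;

    if let Some(stored) = db.get(b"abc123")? {
        let restored: Annotation = ciborium::de::from_reader(&stored[..])?;
        println!("{:?}", restored);
    }
    Ok(())
}
```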

Good to hear that you'll test out the PR and I welcome the issues as well.

Cheers!