FFDA/SourCherry

Hierarchical Storage

metal450 opened this issue · 17 comments

Hi again -

CherryTree now supports a long-awaited storage mechanism of "multiple files in a hierarchical structure": https://www.giuspen.net/2023/06/cherrytree-0-99-56-issued

Would be great if SourCherry could support it as well - it's infinitely more efficient for syncing between devices, for very large databases (due to large attachments that no longer have to be embedded in the db), & versioned backups :)

FFDA commented

Hello,
I will look into this, but can't make any promises. I need to check how the new database structure works. One thing is for sure: I will still have to import the SQL file into app-specific storage to open it. I just don't know what I will have to do with the rest of the files.

I think it's relatively simple - it's basically just xml db split into multiple files, rather than as just one single file.

FFDA commented

It might be simple, but multiple files might make the app slower. And as always, the reason is SAF (Android's Storage Access Framework). Even in their own documentation Google writes that it is slow and should not be used to manage large numbers of files.
As an example, there is an app called "Rename & Organize". Its purpose is to rename and move photos/videos. A month or so ago it started using the old type of storage access, so now checking whether the filenames of fewer than 400 photos/videos need renaming takes literally a second. Before, it used SAF and the same action took about a minute, maybe even more. SourCherry uses SAF to access files outside its app-specific storage, so the same limitations apply to it too.
As I said, I will see when I actually look into the new database structure.

multiple files might make the app slower.

That'd probably only affect it during search tho, right? (assuming that search is currently done by just iterating through all of the nodes sequentially, vs using a prebuilt search index, which would solve the issue entirely). Because other than search, it seems like navigating to / opening just a single node is such a small operation there wouldn't be much of a difference either way. O(1) vs O(n).
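To make the "iterate sequentially vs. prebuilt search index" distinction concrete, here is a minimal Python sketch. The node contents and function names are hypothetical and are not taken from SourCherry or CherryTree. Note that the indexed lookup in this sketch only matches whole words, while the sequential scan matches any substring.

```python
import re
from collections import defaultdict


def build_index(nodes):
    """Map each lowercase word to the set of node ids whose text contains it."""
    index = defaultdict(set)
    for node_id, text in nodes.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(node_id)
    return index


def search_sequential(nodes, term):
    """O(n) scan: every search reads the text of every node."""
    term = term.lower()
    return {node_id for node_id, text in nodes.items() if term in text.lower()}


def search_indexed(index, term):
    """One dictionary lookup per search; the cost was paid when building the index."""
    return set(index.get(term.lower(), set()))


# Hypothetical node contents, for illustration only.
nodes = {
    1: "Shopping list: milk and eggs",
    2: "Meeting notes",
    3: "Eggs benedict recipe",
}
index = build_index(nodes)
print(search_sequential(nodes, "eggs"))
print(search_indexed(index, "eggs"))
```

The index has to be rebuilt or updated whenever node text changes, which is the same staleness trade-off that any cache of a multi-file database would face.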

And as always - the reason is SAF.

My impression was that xml could actually be read in-place, it was only sql databases that had to be copied to app-specific storage & read with SAF? Since this is essentially just a bunch of xml files, couldn't it be done like the "Rename & Organize" example, & not use SAF at all? :)

I would love to be able to use this too. It's the only reason I haven't been able to switch to the new and improved storage type in Cherry Tree itself 😃

FFDA commented

So, I worked on Multifile database support using SAF. So far I have done only the navigation part and it does not show any of the content; however, the results aren't promising. You can test it by downloading and installing this apk.
All navigation is done on the main thread (as with any other DB type), so the lag is quite obvious, especially when opening the Node Filter function or bookmarks (even if you have just a few of them). However, opening a node with a bunch of subnodes is laggy too.
As I guessed in my previous post, it's most likely a SAF problem. It's hard to tell without testing the Multifile database with the "legacy"(?) type of storage access. But even someone at Google thought it was worth mentioning that SAF is slow when writing the documentation for it. And when you take into consideration that they do not always mention in the documentation what actually works and what doesn't, it has to be the culprit for the laggy behavior.

opening just a single node is such a small operation

To load the DrawerMenu with multiple nodes, the app has to query SAF for all the files in the folder it wants to open. Then it has to iterate over them, open each subfolder individually using SAF, and create a "ScNode" object. To do so, the app has to iterate over all the files of that node (folder) until it finds the "node.xml" file, then open it (using SAF) and parse it to get the needed data (name, id, type, etc.). After all of that is done, it still has to read the "subnodes.lst" file (using SAF) and sort the nodes into the order in which the node ids appear in that file. Only then will the app start loading node content. That is done in a background thread so it will not lock the UI; however, the app has to search for the node the user wants to open from the start, using only its unique ID (folder name). And if the node has a couple of images, it will have to read those using SAF too.
Just a few other examples:

  • in XML and SQL databases there is a way to get all the nodes of the document with one line/query, without the need to iterate over anything;
  • with XML, all the nodes actually come in the order in which they have to be displayed, with no need to sort them in any other way, while for an SQL database it's as simple as joining another table and setting the ordering in the SQL query.
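The per-folder work described above can be sketched roughly as follows. This is a Python illustration on an ordinary filesystem, not SourCherry's code and not SAF. The file names "node.xml" and "subnodes.lst" come from the comment above, but the XML layout (a `node` element with a `name` attribute) and the comma-separated format of "subnodes.lst" are assumptions made for the example.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def load_children(folder):
    """Return [(node_id, name), ...] for the subnodes of `folder`, in display order."""
    folder = Path(folder)
    found = {}
    for entry in folder.iterdir():                 # one directory listing
        node_file = entry / "node.xml"
        if not entry.is_dir() or not node_file.is_file():
            continue
        # One file open and one XML parse per child node.
        node = next(ET.parse(node_file).getroot().iter("node"), None)
        if node is not None:
            found[entry.name] = node.get("name")   # folder name is the node id
    order_file = folder / "subnodes.lst"           # one more read for the ordering
    if order_file.is_file():
        order = [p.strip() for p in order_file.read_text().split(",") if p.strip()]
    else:
        order = sorted(found)
    return [(node_id, found[node_id]) for node_id in order if node_id in found]


def make_demo(root):
    """Create a tiny multi-file layout (assumed format) for the demonstration."""
    for node_id, name in (("1", "Recipes"), ("2", "Work"), ("3", "Travel")):
        node_dir = Path(root) / node_id
        node_dir.mkdir()
        (node_dir / "node.xml").write_text(
            f'<cherrytree><node name="{name}" unique_id="{node_id}"/></cherrytree>')
    (Path(root) / "subnodes.lst").write_text("3,1,2")


with tempfile.TemporaryDirectory() as tmp:
    make_demo(tmp)
    print(load_children(tmp))
```

Every line marked with a comment is a separate storage access. With direct file access these are cheap, but through SAF each one is a call to a document provider, which is where the cost described above would accumulate.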

couldn't it be done like the "Rename & Organize" example, & not use SAF at all?

I think it could be done, however will it be allowed on PlayStore? You can read what type of apps are permitted to use that permission here.

Right now I don't even know if I should continue with SAF or try to do it using the normal storage access system. Worst case scenario, I will publish the version with Multifile database support only on GitHub and F-Droid.

All navigation is done on the main thread (as with any other DB type) so lag is quite obvious. Especially when opening Node Filter function or bookmarks (even if you have just a few of them). However, opening node with a bunch of subnodes is laggy too. As I guessed in my previous post - it's most likely SAF problem.

I see. Most likely the solution to use SAF would need some sort of caching then - i.e. rather than touch the physical files synchronously on each operation, pre-load just the tree data and/or node metadata into some other storage structure, and then only go out to files for the content once a node is accessed. The cached tree could be updated on-launch - similar to how now it checks for an updated "mirrored database" on-launch and copies it to internal storage. Seems like that would solve it?

Based on your details of what it's doing, it seems like there's lots of steps that could be optimized like this, i.e. pre-reading the metadata (name,id,type,etc), pre-building a map of the order of node ids, etc. Or perhaps even easier, could probably even just load the entire db to memory at launch, minus embedded files (since text is small, attachments are the vast majority of memory usage). Then it'd only need to go out to SAF when an attachment or image is needed, otherwise it's all from memory after the initial load.

Basically it would just iterate the tree & read the structure once on launch, then everything else could use standard sql in-memory - except for attachments?
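As a rough illustration of "read the structure once, then use standard SQL in memory", here is a small Python sketch using an in-memory SQLite database. The table and column names are hypothetical and chosen only for the example. The input tuples stand in for whatever the one-time scan of the multi-file folder would produce.

```python
import sqlite3


def build_memory_db(nodes):
    """nodes: iterable of (node_id, parent_id, sequence, name), read once at launch."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node (node_id INTEGER PRIMARY KEY, name TEXT)")
    db.execute(
        "CREATE TABLE children (node_id INTEGER, father_id INTEGER, sequence INTEGER)")
    for node_id, parent_id, sequence, name in nodes:
        db.execute("INSERT INTO node VALUES (?, ?)", (node_id, name))
        db.execute("INSERT INTO children VALUES (?, ?, ?)",
                   (node_id, parent_id, sequence))
    return db


def subnodes(db, parent_id):
    """Children of one node, already in display order, from a single query."""
    rows = db.execute(
        "SELECT node.node_id, node.name FROM node "
        "JOIN children ON children.node_id = node.node_id "
        "WHERE children.father_id = ? ORDER BY children.sequence",
        (parent_id,))
    return rows.fetchall()


# Hypothetical tree: two top-level nodes (parent 0) and one nested node.
db = build_memory_db([
    (1, 0, 2, "Work"),
    (2, 0, 1, "Recipes"),
    (3, 1, 1, "Meetings"),
])
print(subnodes(db, 0))
print(subnodes(db, 1))
```

This only moves the cost: the initial scan that fills the tables still has to touch every node folder once, so the launch-time delay remains unless the result is also persisted between launches.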

FFDA commented

Apk with full support for Multifile database using SAF can be downloaded from here. It should have feature parity with any other type of database.

In my experience the performance is subpar and I will not release it on PlayStore in this state. Just as an example, my personal database has 437 nodes. It takes ~20 seconds to load all of them for the node filter function. It also takes a lot of time to load the last node on startup. Android even suggests killing the app because it takes so much time. And even simple navigation has a noticeable delay. Oh, a delay can be felt when loading node content too, it just does not lock the UI.

Basically it would just iterate the tree & read the structure once on launch

I'll use my DB as an example. That would be 20 sec of load time every time I open the app, while other types of DB open more or less instantaneously. It could be solved by saving the node structure in, let's say, an XML file (something like a normal XML database) and loading it from storage at startup from the second time the Multifile database is opened. However, it would not show any changes that were made after the initial startup. Updating it would take the same amount of time as the first startup and, most importantly, it would put me on the path of recreating CherryTree's XML database.

I have a couple of ideas that could make some difference, which I have to test out. But I'm not optimistic about the Multifile database with SAF.

Apk with full support for Multifile database using SAF can be downloaded from here

Amazing, thanks!!! Even if it's slow initially, being able to use this type of db in some form is at least a great step :)

Hmm, I can't seem to get it working on my end though - here's what I see:

  • On the main screen, I tap multi-file, then select the folder
  • I tap Open
  • It only takes about 5sec for the interface to change & the tree successfully shows up. For maybe a second or two, it seems like I can actually access and navigate the tree. However, a moment later I get the "SourCherry isn't responding - Close app / Wait" dialog. I waited >7min, it never went away.

Android even suggests killing the app because it takes so much time.

Oh, well that should definitely be possible to overcome - it sounds like maybe the long processes are being done on the main/UI thread? As long as loading is done on a background thread (so no slow processes block the UI thread), it shouldn't detect it as a lockup & thus should be able to load "infinitely long" without the OS thinking it crashed. And the UI could show an indicator to assure the user that progress is being made (i.e. "Reading multi-file db, 25 nodes processed..."). But Android should only detect it as not responding if it blocks the main thread that's used for the message dispatcher.
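The background-loading idea can be sketched in Python as below. This is only an analogy for the pattern (worker thread plus progress messages), not Android code. The `time.sleep` call stands in for a slow storage read, and all names are hypothetical. In a real UI toolkit the messages would be delivered through the event loop instead of a blocking `get()`.

```python
import queue
import threading
import time


def load_nodes(node_ids, progress):
    """Slow loader meant to run off the main thread; reports progress via a queue."""
    loaded = []
    for count, node_id in enumerate(node_ids, start=1):
        time.sleep(0.01)                  # stands in for one slow storage read
        loaded.append(node_id)
        progress.put(("progress", count))
    progress.put(("done", loaded))


def run_with_progress(node_ids):
    """Start the worker and collect the status lines a UI could display."""
    progress = queue.Queue()
    worker = threading.Thread(
        target=load_nodes, args=(node_ids, progress), daemon=True)
    worker.start()
    messages = []
    while True:
        # The "UI" side only handles short messages; it never does the slow reads.
        kind, payload = progress.get()
        if kind == "done":
            worker.join()
            return payload, messages
        messages.append(f"Reading multi-file db, {payload} nodes processed...")


result, messages = run_with_progress([101, 102, 103])
print(messages[-1])
print(result)
```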

And even simple navigation has a noticeable delay. Oh, a delay can be felt when loading node content too, it just does not lock the UI.

Yeah - like in my previous comment, I think it would need some type of caching. My idea was pretty similar to your suggestion here:

It could be solved by saving some file with node structure in let's say XML file (something like normal XML database) and loading it from the storage at the startup the second time the app with Multifile database is loaded.

To go into a bit more detail, I think this should resolve every issue:

  • The first time multi-file is opened, the whole thing obviously has to be read. As long as it's done on a background thread w/ a progress indicator, even if it takes "minutes", the app won't appear to have frozen.
  • Once loaded, it can re-save a standard single-file db, which the app can now use like a cache. All app functionality now uses this db, so it should be exactly as quick & responsive as with any other single-file db.
  • On subsequent launches, the app first just loads this single-file "cache" db. But in the background, it will begin to scan the multi-file for changes (with e.g. a little spinner indicator to show that it's working). Since we actually know the expected total number of nodes, it could even show a percentage estimate (e.g. the single-file cache has 100 nodes, so as we scan in the background we can say "Checking for updates - 56% done..."). This rescan would be much faster than the initial load as it doesn't really need to open every file - just check each file's modification date, to see if it's later than the last time it was loaded (so it might require one additional data structure, to store a map of file->expected modification date).
  • If it detects nothing changed, we're good to go.
  • If it detects something changed, the "easy" approach is to just show a prompt asking: "Multi-file data has changed, begin reload?" And it does the same entire initial load, from scratch. The "elegant" approach is to read in only the changes, and merge that into the single-file db that's being used as a cache.
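The "map of file -> expected modification date" from the list above could look something like this minimal Python sketch. It runs against an ordinary filesystem, and the function names are hypothetical. Whether reading modification times through SAF would actually be much cheaper than opening the files is an assumption that would need to be measured.

```python
import os
import tempfile
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to its modification time in ns."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): p.stat().st_mtime_ns
            for p in root.rglob("*") if p.is_file()}


def diff(old, new):
    """Compare two snapshots; returns (added, removed, modified) as sorted lists."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, modified


with tempfile.TemporaryDirectory() as tmp:
    node = Path(tmp) / "node.xml"
    node.write_text("<node/>")
    before = snapshot(tmp)                        # stored alongside the cache
    os.utime(node, ns=(0, 0))                     # simulate an external edit
    (Path(tmp) / "subnodes.lst").write_text("1")  # simulate a newly added file
    print(diff(before, snapshot(tmp)))
```

If all three lists are empty, the cache can be used as-is. Otherwise either the "easy" full reload or the "elegant" merge of only the listed paths applies.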

Of course, the above only deals with reading, not writing, but I think for v1 it's fine to just say "multi-file dbs will be read-only." If/when they perform well, then perhaps you could consider read/write, but just as the first version of SourCherry was also read-only, I think just having access to multifile data in a relatively performant way is a great first step :)

Does that make sense?

FFDA commented

I can't seem to get it working on my end though

I can only guess that your DB is a lot larger (I mean in node count) than mine and the phone takes a lot longer to process it. Maybe something else happens at the OS level. Android is quite aggressive in killing apps to save battery life. If you want to test it out, I suggest downloading one of the test databases, converting it to a Multifile one and trying it on your phone. It works on my personal phone and on the virtual device I use for testing. If it fails too, I can look into that.

it sounds like maybe the long processes are being done on the main/UI thread

Only for Multifile database in my experience. Other databases have no issues with that.

Yeah - like in my previous comment, I think it would need some type of caching. My idea was pretty similar to your suggestion

I know, and I admit I wasn't clear enough, but I don't think it's a good idea. That's what I meant by putting me on the path of recreating CherryTree's XML database. There is an XML database for that. All the updating in the background would add complexity to the code and degrade the user experience. Depending on how often and how much the DB changes, users might see a lot of "phantom" nodes and would still have to wait quite a bit of time to see new nodes, even if the data is actually there and the only bottleneck is SAF.

If you want to test it out I can suggest to download one of the test database, convert it to Multifile one and try it out on your phone.

That sample db actually works perfectly for me (in multi-file)! No perceptible delay either in opening or navigating, it's pretty much just as snappy as the single-file ctb. My phone is a relatively new Zenfone 9 w/ 8gb ram, for what it's worth. Android 13.

I can only guess that you DB is a lot larger (I mean in node count)

My "Tree Info" stats (viewable from CherryTree on PC) are: Rich Text Nodes = 1531, Images = 74, Embedded Files = 495, Tables = 15+0. All other fields on the Tree Info dialog are 0. If I use Windows to get file properties of the multi-file folder, it contains 2,071 files & 1,531 subfolders.

Considering that the sample db works nearly perfectly, but my db freezes it "indefinitely," seems like something else strange might be going on here...?

So I tried my own ctb again, here's a more accurate description of how it seems to behave:

  • After I select the folder & tap "open", it opens instantly
  • If I use the hamburger menu at the top left to pop-out my list of nodes, it appears there, & I can scroll up & down & see them all. 100% snappy, good performance.
  • If I tap on a "content-only" node (aka contains no other nodes), it takes 15 seconds for the content of that single node to appear. However, the app doesn't appear frozen and doesn't show any system error popup.
  • What actually causes it to lockup (seemingly indefinitely) is when I tap any node that contains subnodes. Then I get the "SourCherry isn't responding." And actually, I realized that if I just tap "wait", the app is actually still working.

So perhaps it actually is working as you describe - I just have to keep repeatedly hitting "wait" to dismiss that dialog, and waiting 15+ seconds to load the content of a single node.

There must be a sensible way for this to work tho...

FFDA commented

I played around with using only SAF a bit more. However, I couldn't find a better/faster way. In the end I had to create an extra XML file in app-specific storage with all the data needed for navigation. The app does it at the first launch of a Multifile database and it does take some time, but it just can't be helped. On subsequent launches that file will be ready, so it will be faster.
Right now the app does not update the cached file at any point, so any node added (or deleted) outside the app itself won't be seen in the app. I'm thinking about doing some kind of background sync later on, but I haven't decided yet.
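A navigation-only cache of this kind might be written and read back roughly as in the Python sketch below. The XML layout, attribute names and function names are assumptions made for illustration. The actual cache file format used by SourCherry is not described in this thread.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_cache(tree, cache_path):
    """tree: list of nested dicts {"id": str, "name": str, "children": [...]}."""
    def to_element(node):
        element = ET.Element("node", {"unique_id": node["id"], "name": node["name"]})
        for child in node["children"]:
            element.append(to_element(child))
        return element

    root = ET.Element("cache")
    for node in tree:
        root.append(to_element(node))
    ET.ElementTree(root).write(cache_path, encoding="utf-8", xml_declaration=True)


def read_cache(cache_path):
    """Rebuild the nested navigation structure from one file read."""
    def to_dict(element):
        return {"id": element.get("unique_id"),
                "name": element.get("name"),
                "children": [to_dict(c) for c in element.findall("node")]}

    return [to_dict(e) for e in ET.parse(cache_path).getroot().findall("node")]


# Hypothetical navigation data, as the slow first scan might produce it.
tree = [
    {"id": "1", "name": "Work", "children": [
        {"id": "3", "name": "Meetings", "children": []}]},
    {"id": "2", "name": "Recipes", "children": []},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "navigation_cache.xml")
    write_cache(tree, path)        # done once, after the first full scan
    print(read_cache(path))        # later launches need only this single read
```

Because document order is preserved in the XML, the nodes come back already sorted for display, so no separate ordering file has to be consulted when reading the cache.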
apk for testing.

Excellent!! :) Just 2 comments:

  1. About updating the cache on external changes, I think just a "Scan For External Changes" item in the menu would be fine - basically a way to manually trigger it to regenerate the cache if we need to pull external changes. Seems like a fine quick & easy solution for now.

  2. Rather than caching navigation only, I think it would be better for the xml to contain the full node text (minus attachments), for the purpose of search. Currently navigating around is flawless, but search takes many minutes, probably because that's still going out to SAF & iterating each node.

Thanks again, this is great!

FFDA commented

I think just a "Scan For External Changes" item in the menu would be fine

I have the same idea. Maybe with a setting in the preferences to turn on an automatic update in the background on start. The only problem I can think of with the automatic update is that if the user creates a new node during the scan, somewhere the app has already indexed, it will not show up in the updated tree and the user won't be able to access it until another update.

but search takes many minutes

But that's what the user can expect from search. It simply can take long, and the user can cancel it by leaving the activity. The navigation part is what should be as snappy as possible. Furthermore, adding node content to cached nodes would completely mirror the XML type database with a couple of differences, and it would require refactoring MultiReader yet again.

Right now I want to make a working version of the app with Multifile database support, fix a couple of bugs I introduced in v1.4.0 and release a new version. After that the app might go into "maintenance" mode, where only bugs and deprecated methods get fixed and TargetSdk gets updated so that the app does not get removed from the store.

Only problem I can think of with the automatic update is that if user creates a new node

That's easy - if that's a concern, just prevent new node creation during a scan. Can popup a message: "New nodes cannot be created while a scan is in progress," with options like: (Cancel scan & create node) (Continue waiting).

adding node content to cached nodes would completely mirror XML type database with couple differences

It would completely mirror it, minus attachments, yeah - I don't see any issue with that tho? If the user has a db without any attachments, it would be small; and if their db is large, then 95% of the space is usually attachments, so again mirroring it doesn't really have much storage cost. So no real downside to just having the "cache" basically be "xml database without attachments," right?

But it's what user can expect from search. It simply can take long and user can cancel it by leaving the activity.

To clarify the difference, searching for a given term in a single-file db takes a second or 2; searching for the same term in the same db as multi-file takes >5min - pretty much rendering search not usable. I'd say this isn't critical / a blocker, as now at least multi-file is usable in some form :) However, it does mean not really having the ability to globally search node content.

FFDA commented

v1.5.0 supports Multifile databases. If there are any bugs create a new issue.