Optimize episode loading

Question

Closed this issue 3 months ago · 4 comments

Currently, podbit stores every loaded episode in a flat array. This means we have to search through every single episode in the cache to find those belonging to a certain podcast. This is pretty slow, especially when you have a lot of episodes for one podcast and only a few for another, because finding those few still requires scanning everything.

To make matters worse, to determine if an episode belongs to a podcast, we have to run a regex match on its URL every time. Each time we redraw the library, we may have to perform 100+ regex matches just to obtain information that we already had. Regex matches are pretty fast, but not free.

I want to make similar improvements to the queue system, mapping episodes to their associated podcast. If possible, I would also like to cross-reference the queue and the cache by having each entry hold a pointer to its counterpart in the other.
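A minimal sketch of the queue/cache cross-reference, assuming hypothetical, cut-down `Episode` and `QueueItem` types (podbit's real `data.Episode` has more fields, and the `Link`/`Unlink` helpers are illustrative names, not podbit's API). Each side holds a pointer to the other, so moving from a queue entry to its cached episode, or back, needs no search.

```go
package main

// Episode is a hypothetical stand-in for podbit's data.Episode.
type Episode struct {
	URL     string
	Title   string
	Podcast string
	// Queued points at this episode's queue entry, or is nil if it is not queued.
	Queued *QueueItem
}

// QueueItem is a hypothetical queue entry that points back at its cached episode.
type QueueItem struct {
	URL     string
	Episode *Episode
}

// Link makes the queue entry and the cache entry reference each other,
// so either side can reach the other without a lookup.
func Link(q *QueueItem, e *Episode) {
	q.Episode = e
	e.Queued = q
}

// Unlink breaks the cross-reference, e.g. when an item is dequeued.
func Unlink(q *QueueItem) {
	if q.Episode != nil {
		q.Episode.Queued = nil
		q.Episode = nil
	}
}
```

One design point: if episodes live in a `[]Episode` (values), a pointer taken to an element keeps referring to the old backing array after `append` reallocates, so later updates would silently diverge. Storing `[]*Episode` avoids that.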

Improvement plan

Replace the flat array with a hashtable that maps a podcast name (string) to an array of episodes ([]data.Episode)
Change the episode loading logic to perform matching only once per episode and store the result in the hashtable
Rewrite the podcast retrieval logic to simply look up the hashtable
Clean up episode finding logic
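The plan above can be sketched roughly as follows. This assumes a hypothetical `Episode` type and a hypothetical `PodcastMatcher` function standing in for podbit's regex-based URL matching; the point is only that the matcher runs once per episode at load time, after which retrieving a podcast's episodes is a single map lookup.

```go
package main

// Episode is a hypothetical stand-in for podbit's data.Episode.
type Episode struct {
	URL   string
	Title string
}

// PodcastMatcher resolves an episode URL to a podcast name. In podbit this
// is where the regex match would happen; here it is a hypothetical parameter.
type PodcastMatcher func(url string) string

// Cache maps a podcast name to its episodes, replacing the flat array.
type Cache struct {
	byPodcast map[string][]Episode
}

// NewCache builds the map, running the matcher exactly once per episode.
// Episodes keep their original relative order within each podcast.
func NewCache(episodes []Episode, match PodcastMatcher) *Cache {
	c := &Cache{byPodcast: make(map[string][]Episode)}
	for _, ep := range episodes {
		name := match(ep.URL)
		c.byPodcast[name] = append(c.byPodcast[name], ep)
	}
	return c
}

// Episodes returns one podcast's episodes with a single map lookup instead
// of scanning (and regex-matching) every cached episode. Unknown podcasts
// yield a nil slice.
func (c *Cache) Episodes(podcast string) []Episode {
	return c.byPodcast[podcast]
}
```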

Answer 1 · 2024-08-08T20:13:03.000Z

I've done some more work on this and might ship it with v4.0

Answer 2 · 2024-08-09T08:15:16.000Z

New plan:

Podbit will perform podcast matching once at startup and will not use regex matching again except for newly added episodes (i.e. episodes queued after startup)
These will be stored in a map alongside a flat array, which still allows some nice searching for episodes based on title, etc.
Perhaps we can also add a map of URL to episode etc. so that lookups in the library no longer take linear time

Basically, add some caches alongside the full episodes array.
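A rough sketch of "caches alongside the full array", assuming a hypothetical `Library` type (not podbit's actual structure). The flat slice is kept for free-form title searches, while a podcast map and a URL map give constant-time (average case) lookups. The podcast name passed to `Add` is assumed to come from the startup matching pass, or from a regex match for episodes queued after startup.

```go
package main

import "strings"

// Episode is a hypothetical stand-in for podbit's data.Episode.
type Episode struct {
	URL     string
	Title   string
	Podcast string
}

// Library keeps the full flat array plus two caches over the same pointers.
type Library struct {
	All       []*Episode
	byPodcast map[string][]*Episode
	byURL     map[string]*Episode
}

func NewLibrary() *Library {
	return &Library{
		byPodcast: make(map[string][]*Episode),
		byURL:     make(map[string]*Episode),
	}
}

// Add indexes an episode in all three structures. Re-adding a known URL
// returns the existing entry so the indexes never disagree.
func (l *Library) Add(url, title, podcast string) *Episode {
	if ep, ok := l.byURL[url]; ok {
		return ep
	}
	ep := &Episode{URL: url, Title: title, Podcast: podcast}
	l.All = append(l.All, ep)
	l.byPodcast[podcast] = append(l.byPodcast[podcast], ep)
	l.byURL[url] = ep
	return ep
}

// ByURL replaces a linear scan with a map lookup.
func (l *Library) ByURL(url string) (*Episode, bool) {
	ep, ok := l.byURL[url]
	return ep, ok
}

// ByPodcast returns every episode of one podcast.
func (l *Library) ByPodcast(name string) []*Episode {
	return l.byPodcast[name]
}

// SearchTitle still scans the flat array (case-insensitive substring match),
// which is acceptable for occasional user-driven searches.
func (l *Library) SearchTitle(query string) []*Episode {
	q := strings.ToLower(query)
	var out []*Episode
	for _, ep := range l.All {
		if strings.Contains(strings.ToLower(ep.Title), q) {
			out = append(out, ep)
		}
	}
	return out
}
```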

Answer 3 · 2024-08-09T22:24:17.000Z

Implemented just the first bullet point and the library is already visually more responsive with large queue files.

Answer 4 · 2024-08-10T11:18:52.000Z

Merged, but some basic cleanup still needs to be done. It turns out that large performance gains came simply from pre-compiling the regex at startup!
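To illustrate why pre-compiling helps: in Go's `regexp` package, the convenience function `regexp.MatchString(pattern, s)` compiles `pattern` on every call, whereas a `*regexp.Regexp` from `regexp.Compile` can be reused. The sketch below assumes hypothetical per-podcast patterns (`rawPattern`, `compilePatterns`, `podcastFor` are illustrative names; the issue does not show podbit's actual expressions). Compilation happens once at startup, and the redraw path only executes matches.

```go
package main

import (
	"fmt"
	"regexp"
)

// rawPattern is a hypothetical config entry: a podcast name and a URL regex.
type rawPattern struct {
	Name string
	Expr string
}

// podcastPattern holds an already-compiled expression.
type podcastPattern struct {
	name string
	re   *regexp.Regexp
}

// compilePatterns runs once at startup; regexp.Compile is the expensive step.
// A slice (not a map) is used so match priority follows config order.
func compilePatterns(raw []rawPattern) ([]podcastPattern, error) {
	pats := make([]podcastPattern, 0, len(raw))
	for _, r := range raw {
		re, err := regexp.Compile(r.Expr)
		if err != nil {
			return nil, fmt.Errorf("podcast %q: %w", r.Name, err)
		}
		pats = append(pats, podcastPattern{name: r.Name, re: re})
	}
	return pats, nil
}

// podcastFor only executes matches against the precompiled expressions;
// it never recompiles a pattern, unlike regexp.MatchString(pattern, s).
func podcastFor(pats []podcastPattern, url string) (string, bool) {
	for _, p := range pats {
		if p.re.MatchString(url) {
			return p.name, true
		}
	}
	return "", false
}
```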