Optimize episode loading
Currently, podbit stores every loaded episode in a flat array. This means we have to search through every single episode in the cache to find those belonging to a certain podcast. This is pretty slow, especially when you have a lot of episodes for one podcast and only a few for another.
To make matters worse, to determine if an episode belongs to a podcast, we have to run a regex match on the URL every time. Each time we redraw the library, we may have to perform 100+ regex matches just to obtain information that we already had. Regex matches are pretty fast, but not free.
I want to make similar improvements to the queue system, mapping episodes to their associated podcast. If possible, I would also like to cross-reference the queue and the cache by giving each entry a pointer to its counterpart in the other.
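A rough sketch of what the queue/cache cross-reference could look like. The type and field names here (`QueueItem`, `CachedEpisode`, `Link`) are made up for illustration and are not podbit's actual types:

```go
package main

// CachedEpisode is a hypothetical cache entry. Queued is nil when the
// episode has no corresponding queue entry.
type CachedEpisode struct {
	URL    string
	Title  string
	Queued *QueueItem
}

// QueueItem is a hypothetical queue entry. Cached is nil when the episode
// has not been loaded into the cache.
type QueueItem struct {
	URL    string
	State  string
	Cached *CachedEpisode
}

// Link points the two entries at each other so either side can reach its
// counterpart without a search.
func Link(q *QueueItem, c *CachedEpisode) {
	q.Cached = c
	c.Queued = q
}
```

With something like this, going from a queue entry to its cached data (or back) is a single pointer dereference rather than a scan. The catch is that both pointers have to be kept in sync whenever an entry is removed from either side.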
Improvement plan
- Replace flat array with a hashtable which maps a podcast name (string) to an array of episodes ([]data.Episode)
- Change episode loading logic to perform matching only once per episode and store the result in the hashtable
- Rewrite podcast retrieval logic to simply search the hashtable
- Clean up episode finding logic
I've done some more work on this and might ship it with v4.0.
New plan:
- Podbit will perform podcast matching once at startup and will not use the regex matching again except for newly added episodes (i.e. stuff queued after startup)
- The matched episodes will be stored in a map alongside the flat array, which still lets us do some nice searching for episodes based on title etc.
- Perhaps we can also add a map of URL to episode etc. so that lookups in the library no longer take linear time
Basically, add some caches alongside the full episodes array.
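Something along these lines, as a sketch only. The `Cache` type, its methods, and the `Podcast` field on `Episode` (holding the name resolved once at load time) are all assumptions for illustration, not the actual podbit code:

```go
package main

import "strings"

// Episode is a simplified stand-in for data.Episode. Podcast holds the
// podcast name resolved once when the episode was loaded (assumed field).
type Episode struct {
	URL     string
	Title   string
	Podcast string
}

// Cache keeps the full flat array plus two lookup maps over the same entries.
type Cache struct {
	episodes  []*Episode            // full list, kept for title searches
	byPodcast map[string][]*Episode // podcast name -> its episodes
	byURL     map[string]*Episode   // episode URL -> episode
}

func NewCache() *Cache {
	return &Cache{
		byPodcast: make(map[string][]*Episode),
		byURL:     make(map[string]*Episode),
	}
}

// Add registers an episode in all three structures, skipping duplicate URLs.
func (c *Cache) Add(ep *Episode) {
	if _, exists := c.byURL[ep.URL]; exists {
		return
	}
	c.episodes = append(c.episodes, ep)
	c.byPodcast[ep.Podcast] = append(c.byPodcast[ep.Podcast], ep)
	c.byURL[ep.URL] = ep
}

// ByURL is a single map lookup instead of a linear scan.
func (c *Cache) ByURL(url string) (*Episode, bool) {
	ep, ok := c.byURL[url]
	return ep, ok
}

// ByPodcast returns the episodes of one podcast without touching the rest.
func (c *Cache) ByPodcast(name string) []*Episode {
	return c.byPodcast[name]
}

// SearchTitle still walks the flat array, matching case-insensitively.
func (c *Cache) SearchTitle(query string) []*Episode {
	query = strings.ToLower(query)
	var found []*Episode
	for _, ep := range c.episodes {
		if strings.Contains(strings.ToLower(ep.Title), query) {
			found = append(found, ep)
		}
	}
	return found
}
```

All three structures share the same `*Episode` pointers, so the extra memory is only the map overhead. The tradeoff is that every insert (and any future removal) has to update all of them together.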
Implemented just the first bullet point and the library is already visually more responsive with large queue files.
Merged, but some basic cleanup is still to be done. Turns out, large gains in performance were obtained by simply pre-compiling the regex at startup!
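For anyone wondering what pre-compiling looks like in Go, here is a minimal sketch. The pattern and the `PodcastKey` helper are invented for the example and are not the regex podbit actually uses:

```go
package main

import "regexp"

// podcastPattern is compiled once when the program starts. The pattern
// itself is a hypothetical example that captures the host part of a URL.
var podcastPattern = regexp.MustCompile(`^https?://([^/]+)/`)

// PodcastKey reuses the compiled pattern on every call instead of calling
// regexp.Compile (or regexp.MatchString, which compiles internally) each time.
func PodcastKey(url string) (string, bool) {
	m := podcastPattern.FindStringSubmatch(url)
	if m == nil {
		return "", false
	}
	return m[1], true
}
```

`regexp.MustCompile` panics on an invalid pattern, which is acceptable for a fixed pattern known at build time. A compiled `*regexp.Regexp` is safe for concurrent use, so a single package-level value can be shared.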