sentriz/gonic

query optimisations for massive libraries

Opened this issue · 28 comments

gonic version: v0.16.2
from source: 80a9aeb..88e58c0 master-> origin/master

running behind ngnix proxy

When browsing artists/album or when starting a new stream the response timeouts and have to click again, this second time the server answers instantly. There are no issues with the streams that are already playing.

This issue looks similar to #411 but I'm not using docker/jukebox.

The only thing I see in the logs are:

Mar 04 12:18:08 server gonic[3420077]: 2024/03/04 12:18:08 error writing subsonic response: write tcp 10.11.12.13:4747->10.11.12.13:35128: i/o timeout
Mar 04 12:18:08 server gonic[3420077]: 2024/03/04 12:18:08 response 200 GET /getMusicDirectory?c=Jamstash&callback=angular.callbacks._1c&f=jsonp&id=al-99067&p=enc:123456789&u=user&v=1.6.0

My library is somewhat extreme: 3M tracks, 300K folders.

Thanks for your help and for this great project.

wow 3M tracks! can you try run with bba1a80 and see if you have any slow db queries? if a query takes longer than 5 seconds the http handler with timeout

Thanks for your quick response!

It looks I'm hitting that 5s timeout:

Mar 05 19:10:09 server gonic[1522298]: 2024/03/05 19:10:09 response  0  OPTIONS /getMusicDirectory.view?c=Jamstash&callback=angular.callbacks._3z&f=jsonp&id=al-98576&p=enc:12345678901234567&u=user&v=1.6.0
Mar 05 19:10:09 server gonic[1522298]: gorm sql/gonic/db/db.go:1162.088643msSELECT * FROM "users"  WHERE (name=?) ORDER BY "users"."id" ASC LIMIT 1[user] 1
Mar 05 19:10:09 server gonic[1522298]: gorm sql/gonic/server/ctrlsubsonic/handlers_by_folder.go:761.232106msSELECT * FROM "albums"  WHERE ("albums"."id" = 98576) ORDER BY "albums"."id" ASC LIMIT 1[] 1
Mar 05 19:10:09 server gonic[1522298]: gorm sql/gonic/server/ctrlsubsonic/handlers_by_folder.go:76789.401µsSELECT * FROM "album_stars"  WHERE ("album_id" IN (?)) AND (user_id=?) ORDER BY "album_stars"."user_id" ASC[98576 2] 0
Mar 05 19:10:09 server gonic[1522298]: gorm sql/gonic/server/ctrlsubsonic/handlers_by_folder.go:76925.156µsSELECT * FROM "album_ratings"  WHERE ("album_id" IN (?)) AND (user_id=?) ORDER BY "album_ratings"."user_id" ASC[98576 2] 0
Mar 05 19:10:14 server gonic[1522298]: gorm sql/gonic/server/ctrlsubsonic/handlers_by_folder.go:844.659379491sSELECT * FROM "albums"  WHERE (parent_id=?) ORDER BY albums.right_path COLLATE NOCASE[98576] 6229
Mar 05 19:10:14 server gonic[1522298]: gorm sql/gonic/podcast/podcast.go:4493.387870662sSELECT * FROM "podcast_episodes"  WHERE (status=?) ORDER BY updated_at DESC[downloading] 0
Mar 05 19:10:14 server gonic[1522298]: gorm sql/gonic/infocache/artistinfocache/artistinfocache.go:1092.421367091sSELECT "artists".* FROM "artists" LEFT JOIN artist_infos ON artist_infos.id=artists.id WHERE (artist_infos.id IS NULL OR artist_infos.updated_at<?)[2024-02-04 19:10:11.632248246 +0100 CET m=-2591903.976856681] 0
Mar 05 19:10:14 server gonic[1522298]: gorm sql/gonic/server/ctrlsubsonic/handlers_by_folder.go:8436.93328msSELECT * FROM "album_stars"  WHERE ("album_id" IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,..<snip>... AND (user_id=?)[102807 103933 105783 106052 106544 98999...
Mar 05 19:10:14 server gonic[1522298]: gorm sql/gonic/server/ctrlsubsonic/handlers_by_folder.go:846.882454msSELECT * FROM "album_ratings"  WHERE ("album_id" IN (?,?,?,?,?,?,?,?,?,?,?,?,?..<snip>... AND (user_id=?)[100243 100270 101787 102513 100559 102037 103125 256878 102864 103074 103654 98968 99165 100271 102334 104915 104969 105638 105840 98921 100269 103921...
Mar 05 19:10:14 server gonic[1522298]:  106279 100658 100931 101633 1
Mar 05 19:10:20 server gonic[1522298]: gorm sql/gonic/server/ctrlsubsonic/handlers_by_folder.go:986.643003428sSELECT * FROM "tracks"  WHERE (album_id=?) ORDER BY "filename"[98576] 0
Mar 05 19:10:20 server gonic[1522298]: gorm sql/gonic/podcast/podcast.go:4495.090854123sSELECT * FROM "podcast_episodes"  WHERE (status=?) ORDER BY updated_at DESC[downloading] 0
Mar 05 19:10:20 server gonic[1522298]: gorm sql/gonic/infocache/artistinfocache/artistinfocache.go:1091.152361334sSELECT "artists".* FROM "artists" LEFT JOIN artist_infos ON artist_infos.id=artists.id WHERE (artist_infos.id IS NULL OR artist_infos.updated_at<?)[2024-02-04 19:10:19.63362109 +0100 CET m=-2591895.975483814] 0
Mar 05 19:10:20 server gonic[1522298]: gorm sql/gonic/server/ctrlsubsonic/handlers_by_folder.go:9863.126197msSELECT * FROM "artists" INNER JOIN "album_artists" ON "album_artists"."artist_id" = "artists"."id" WHERE (1 <> 1)[] 0
Mar 05 19:10:20 server gonic[1522298]: gorm sql/gonic/podcast/podcast.go:44962.965328msSELECT * FROM "podcast_episodes"  WHERE (status=?) ORDER BY updated_at DESC[downloading] 0
Mar 05 19:10:20 server gonic[1522298]: gorm sql/gonic/server/ctrlsubsonic/handlers_by_folder.go:98178.219µsSELECT * FROM "artists" INNER JOIN "track_artists" ON "track_artists"."artist_id" = "artists"."id" WHERE (1 <> 1)[] 0
Mar 05 19:10:20 server gonic[1522298]: gorm sql/gonic/server/ctrlsubsonic/handlers_raw.go:261520.187µsSELECT * FROM "transcode_preferences"  WHERE (user_id=?) AND (client COLLATE NOCASE IN (?,?)) ORDER BY client DESC LIMIT 1[2 * Jamstash] 1
Mar 05 19:10:20 server gonic[1522298]: 2024/03/05 19:10:20 error writing subsonic response: write tcp 10.11.12.13:4747->10.11.12.13:45596: i/o timeout
Mar 05 19:10:20 server gonic[1522298]: 2024/03/05 19:10:20 response  200  GET /getMusicDirectory?c=Jamstash&callback=angular.callbacks._3z&f=jsonp&id=al-98576&p=enc:12345678901234567&u=user&v=1.6.0
Mar 05 19:10:25 server gonic[1522298]: gorm sql/gonic/podcast/podcast.go:449476.721µsSELECT * FROM "podcast_episodes"  WHERE (status=?) ORDER BY updated_at DESC[downloading] 0
Mar 05 19:10:27 server gonic[1522298]: gorm sql/gonic/infocache/artistinfocache/artistinfocache.go:10935.083355msSELECT "artists".* FROM "artists" LEFT JOIN artist_infos ON artist_infos.id=artists.id WHERE (artist_infos.id IS NULL OR artist_infos.updated_at<?)[2024-02-04 19:10:27.631942681 +0100 CET m=-2591887.977162288] 0

I did a quick test increasing timeout vars in cmd/gonic/gonic.go:

        var (
                readTimeout  = 10 * time.Second
                writeTimeout = 10 * time.Second
                idleTimeout  = 10 * time.Second
        )

.. and so far looks good. Not sure if it's the correct way to fix it.

ah looks like the artist info query is taking 10.94 seconds

maybe i can optimise that query. would you mind emailing me a copy of your database? you can remove personal info from it like

$ cp gonic.db backup.db
$ sqlite3 backup.db "delete from users"

then email me the backup

Could it be a cache miss and then a delay from last.fm? I guess probably not if it happens to manifest for the guy with 3M tracks but I thought I'd throw it out there.

@brian-doherty it seems to be the query used to find the next artist info that needs to be refreshed

Mar 05 19:10:27 server gonic[1522298]: gorm sql/gonic/infocache/artistinfocache/artistinfocache.go:10935.083355msSELECT "artists".* FROM "artists" LEFT JOIN artist_infos ON artist_infos.id=artists.id WHERE (artist_infos.id IS NULL OR artist_infos.updated_at<?)[2024-02-04 19:10:27.631942681 +0100 CET m=-2591887.977162288] 0

you can see the query is taking 10sec (10935.083355ms), in that time the database is locked so other requests will timeout waiting for the db to be unlocked

also seems like we're doing a scan
image

does anyone know how we could remove the scan on artists?
https://www.db-fiddle.com/f/miGsKWi3oZBBdjkejeAdgM/1

the query runs every N seconds to find artists with no or stale info so needs to be pretty snappy

Scan or not, when you run this query you're potentially asking for a very large number of rows very frequently, even though you're keeping the info cached for 30 days. Would it work better if you asked it to only return the n oldest, and handled those, and left the rest for the next refresh cycle?

that query should just return one row (there a LIMIT 1 on the end) and in the application it's a First()

My bad. Maybe you need an index on updated_at? Then it can go straight to the record with the lowest value instead of searching for one that's below.

that's what i was thinking too, there is an index there on line 7, still no luck

i've also asked on the ##sqlite IRC channel on libera, they're usually quite helpful in there

What if you order by updated_at, instead of id? Then it doesn't have to search at all?

ooh it seems to be different. at least the scan is a bit lower. dunno if that's better https://www.db-fiddle.com/f/miGsKWi3oZBBdjkejeAdgM/3

i can test it though if @cascooscuro sends the DB

DB file shared. Thanks for your help!

haha i just tested it, it's actually quite quick

Mar 05 19:10:27 server gonic[1522298]: gorm sql/gonic/infocache/artistinfocache/artistinfocache.go:10935.083355msSELECT "artists".* FROM "artists" LEFT JOIN artist_infos ON artist_infos.id=artists.id WHERE (artist_infos.id IS NULL OR artist_infos.updated_at<?)[2024-02-04 19:10:27.631942681 +0100 CET m=-2591887.977162288] 0

the log output is a bit silly, this is actually artistinfocache.go:10935.083355ms is actually 35ms on line 109 😁

identified two more slow queries on @cascooscuro's database that are near the 5s timeout

4.12s   SELECT albums.*, count(tracks.id) child_count, sum(tracks.length) duration FROM "albums" LEFT JOIN tracks ON tracks.album_id=albums.id JOIN album_artists ON album_artists.album_id=albums.id  GROUP BY albums.id ORDER BY created_at DESC LIMIT 50 OFFSET 0
4.13s   SELECT artists.*, count(albums.id) album_count FROM "artists" JOIN artist_stars ON artist_stars.artist_id=artists.id JOIN artist_appearances ON artist_appearances.artist_id=artists.id JOIN albums ON albums.id=artist_appearances.album_id WHERE (artist_stars.user_id=?) GROUP BY artists.id ORDER BY artist_stars.star_date DESC

Now those are queries I'd expect to be slow. :D Start caching the sums and counts in the DB?

could help but seems like the order by is the biggest offender, by far

image

this is after adding an index on created_at

https://www.db-fiddle.com/f/vQKEw9wCovnzqCoZTfnjod/1

From my limited experience with Sqlite, it can be quite dumb when doing joins. With indices the query should be fast, I suspect it's not using them - it should help to rewrite it using subqueries (i.e. first select the album ids explicitly, then do the joins).

@lomereiter investigations and/or pull requests very welcome! if you like I can provide a large database for testing

I did a test on my minuscule DB, turns out that both indexes (on albums.created_at and tracks.album_id) can help if a subquery is used to help the query planner.

I wonder what the difference is like for the large database, not sure if it's worth sharing the file, you could just run this one query locally and see if it helps.

$ time sqlite3 gonic.db 'SELECT albums.*, count(tracks.id) child_count, sum(tracks.length) duration 
        FROM albums 
        LEFT JOIN tracks ON tracks.album_id=albums.id 
        JOIN album_artists ON album_artists.album_id=albums.id 
        GROUP BY albums.id 
        ORDER BY created_at DESC 
        LIMIT 50 OFFSET 0;' >before.txt

real    0m0.022s
user    0m0.022s
sys     0m0.000s
$ time sqlite3 gonic.db 'SELECT falbums.*, count(tracks.id) child_count, sum(tracks.length) duration 
        FROM (SELECT albums.* 
                FROM albums 
                JOIN album_artists ON album_artists.album_id=albums.id 
                ORDER BY created_at DESC 
                LIMIT 50 OFFSET 0) falbums 
        LEFT JOIN tracks ON tracks.album_id=falbums.id 
        GROUP BY falbums.id 
        ORDER BY created_at DESC;' >after.txt

real    0m0.004s
user    0m0.000s
sys     0m0.004s
$ diff after.txt before.txt

Compare the query plans:

QUERY PLAN
|--SCAN albums
|--SEARCH tracks USING INDEX idx_test (album_id=?) LEFT-JOIN
|--SEARCH album_artists USING COVERING INDEX idx_album_id_artist_id (album_id=?)
`--USE TEMP B-TREE FOR ORDER BY

vs

QUERY PLAN
|--CO-ROUTINE falbums
|  |--SCAN albums USING INDEX idx_test2
|  `--SEARCH album_artists USING COVERING INDEX idx_album_id_artist_id (album_id=?)
|--SCAN falbums
|--SEARCH tracks USING INDEX idx_test (album_id=?) LEFT-JOIN
|--USE TEMP B-TREE FOR GROUP BY
`--USE TEMP B-TREE FOR ORDER BY

The first time I run it:

time sqlite3 gonic.db 'SELECT albums.*, count(tracks.id) child_count, sum(tracks.length) duration
        FROM albums
        LEFT JOIN tracks ON tracks.album_id=albums.id
        JOIN album_artists ON album_artists.album_id=albums.id
        GROUP BY albums.id
        ORDER BY created_at DESC
        LIMIT 50 OFFSET 0;' >before.txt

real    0m6.299s
user    0m2.552s
sys     0m0.960s

and then:

time sqlite3 gonic.db 'SELECT albums.*, count(tracks.id) child_count, sum(tracks.length) duration
        FROM albums
        LEFT JOIN tracks ON tracks.album_id=albums.id
        JOIN album_artists ON album_artists.album_id=albums.id
        GROUP BY albums.id
        ORDER BY created_at DESC
        LIMIT 50 OFFSET 0;' >before.txt

real    0m1.992s
user    0m1.769s
sys     0m0.198s

and the "after" query:

time sqlite3 gonic.db 'SELECT falbums.*, count(tracks.id) child_count, sum(tracks.length) duration
        FROM (SELECT albums.*
                FROM albums
                JOIN album_artists ON album_artists.album_id=albums.id
                ORDER BY created_at DESC
                LIMIT 50 OFFSET 0) falbums
        LEFT JOIN tracks ON tracks.album_id=falbums.id
        GROUP BY falbums.id
        ORDER BY created_at DESC;' >after.txt

real    0m1.824s
user    0m1.469s
sys     0m0.333s

The results are consistent in what I see with gonic. The first time I click it may timeout, but after that it "warms" and the response is much quicker.

@cascooscuro interesting, I suppose what you call "warm-up" is reading from disk. If you don't mind please share the DB (send an email to swift.hat0662@fastmail.com)

the query plans:

sqlite3 gonic.db 'EXPLAIN QUERY PLAN SELECT albums.*, count(tracks.id) child_count, sum(tracks.length) duration
        FROM albums
        LEFT JOIN tracks ON tracks.album_id=albums.id
        JOIN album_artists ON album_artists.album_id=albums.id
        GROUP BY albums.id
        ORDER BY created_at DESC
        LIMIT 50 OFFSET 0;'
QUERY PLAN
|--SCAN albums
|--SEARCH album_artists USING COVERING INDEX idx_album_id_artist_id (album_id=?)
|--BLOOM FILTER ON tracks (album_id=?)
|--SEARCH tracks USING AUTOMATIC COVERING INDEX (album_id=?) LEFT-JOIN
`--USE TEMP B-TREE FOR ORDER BY
 sqlite3 gonic.db 'EXPLAIN QUERY PLAN SELECT falbums.*, count(tracks.id) child_count, sum(tracks.length) duration
        FROM (SELECT albums.*
                FROM albums
                JOIN album_artists ON album_artists.album_id=albums.id
                ORDER BY created_at DESC
                LIMIT 50 OFFSET 0) falbums
        LEFT JOIN tracks ON tracks.album_id=falbums.id
        GROUP BY falbums.id
        ORDER BY created_at DESC;'
QUERY PLAN
|--CO-ROUTINE falbums
|  |--SCAN album_artists
|  |--SEARCH albums USING INTEGER PRIMARY KEY (rowid=?)
|  `--USE TEMP B-TREE FOR ORDER BY
|--SCAN falbums
|--BLOOM FILTER ON tracks (album_id=?)
|--SEARCH tracks USING AUTOMATIC COVERING INDEX (album_id=?) LEFT-JOIN
|--USE TEMP B-TREE FOR GROUP BY
`--USE TEMP B-TREE FOR ORDER BY

Ah, I see, you should have added the indices first, as in the fiddle:

CREATE INDEX idx_test on tracks (album_id);
CREATE INDEX idx_test_2 on albums (created_at);

Anyway, I think a single index on tracks is good enough, as albums can be sorted by many other fields - see the PR.