savonet/liquidsoap

`on_track` increases memory usage over time if files have cover

vitoyucepi opened this issue · 12 comments

Describe the bug
For testing purposes, I'm running liquidsoap in memory-constrained containers.

It's not clear to me if memory is leaking, but eventually all containers were OOM-killed, even when I set a larger memory limit.
I tried calling the garbage collector manually, but it doesn't seem to make any difference.
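
For reference, "calling the garbage collector manually" here means forcing a compaction from the script on a timer, roughly like this sketch (it uses the same runtime.gc.compact helper as main.liq below):

    # Force an OCaml GC compaction every minute and log when it runs
    # (runtime.gc.compact is the same helper used in main.liq below).
    def force_gc() =
      log.important("Running GC compaction")
      runtime.gc.compact()
    end
    thread.run(every=60., fast=false, force_gc)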

Journal log
Nov 07 08:12:54 arch kernel: liquidsoap invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=200
Nov 07 08:12:54 arch kernel: CPU: 5 PID: 46290 Comm: liquidsoap Tainted: P           OE      6.5.9-arch2-1 #1 f3e31240753f1687edd57c7c0d0f930d7e8c4a55
Nov 07 08:12:54 arch kernel: Hardware name:
Nov 07 08:12:54 arch kernel: Call Trace:
Nov 07 08:12:54 arch kernel:  <TASK>
Nov 07 08:12:54 arch kernel:  dump_stack_lvl+0x47/0x60
Nov 07 08:12:54 arch kernel:  dump_header+0x4a/0x240
Nov 07 08:12:54 arch kernel:  oom_kill_process+0xf9/0x190
Nov 07 08:12:54 arch kernel:  out_of_memory+0x246/0x590
Nov 07 08:12:54 arch kernel:  mem_cgroup_out_of_memory+0x134/0x150
Nov 07 08:12:54 arch kernel:  try_charge_memcg+0x6ed/0x7d0
Nov 07 08:12:54 arch kernel:  charge_memcg+0x39/0xf0
Nov 07 08:12:54 arch kernel:  __mem_cgroup_charge+0x2c/0x80
Nov 07 08:12:54 arch kernel:  do_anonymous_page+0x89/0x3a0
Nov 07 08:12:54 arch kernel:  __handle_mm_fault+0xacc/0xbb0
Nov 07 08:12:54 arch kernel:  ? smp_call_function_single_async+0x22/0x50
Nov 07 08:12:54 arch kernel:  handle_mm_fault+0x155/0x350
Nov 07 08:12:54 arch kernel:  do_user_addr_fault+0x15f/0x640
Nov 07 08:12:54 arch kernel:  exc_page_fault+0x7f/0x180
Nov 07 08:12:54 arch kernel:  asm_exc_page_fault+0x26/0x30
Nov 07 08:12:54 arch kernel: RIP: 0033:0x7fcfd795d62c
Nov 07 08:12:54 arch kernel: Code: 7c 10 8a 0e 74 0a 0f b7 74 16 fe 66 89 74 17 fe 88 0f c3 8b 4c 16 fc 8b 36 89 4c 17 fc 89 37 c3 c5 fa 6f 06 c5 fa 6f 4c 16 f0 <c5> fa 7f 07 c5 fa 7f 4c 17 f0 c3 66 0f 1f 84 00>
Nov 07 08:12:54 arch kernel: RSP: 002b:00007fcfad1fb1a8 EFLAGS: 00010202
Nov 07 08:12:54 arch kernel: RAX: 00007fcfa0a9e000 RBX: 00007fcfafb9e018 RCX: 00007fcfa0a6a220
Nov 07 08:12:54 arch kernel: RDX: 0000000000000018 RSI: 00007fcfafb9e000 RDI: 00007fcfa0a9e000
Nov 07 08:12:54 arch kernel: RBP: 0000000000000018 R08: 0000000000000000 R09: 00007fcfbcf69000
Nov 07 08:12:54 arch kernel: R10: 00000000001c5be0 R11: 0000000000000001 R12: 00007fcfb0cb9000
Nov 07 08:12:54 arch kernel: R13: 00007fcfaf001000 R14: 0000000000000002 R15: 00007fcfbcbd9a18
Nov 07 08:12:54 arch kernel:  </TASK>
Nov 07 08:12:54 arch kernel: memory: usage 307200kB, limit 307200kB, failcnt 285
Nov 07 08:12:54 arch kernel: swap: usage 0kB, limit 307200kB, failcnt 0
Nov 07 08:12:54 arch kernel: Memory cgroup stats for /user.slice/user-1000.slice/user@1000.service/user.slice/docker-eaec52440d1b63e469f909bd15d7f7d2b489830e0ced84f2a8766b2a42122ab6.scope:
Nov 07 08:12:54 arch kernel: anon 309870592
Nov 07 08:12:54 arch kernel: file 1851392
Nov 07 08:12:54 arch kernel: kernel 2850816
Nov 07 08:12:54 arch kernel: kernel_stack 180224
Nov 07 08:12:54 arch kernel: pagetables 1331200
Nov 07 08:12:54 arch kernel: sec_pagetables 0
Nov 07 08:12:54 arch kernel: percpu 2912
Nov 07 08:12:54 arch kernel: sock 0
Nov 07 08:12:54 arch kernel: vmalloc 12288
Nov 07 08:12:54 arch kernel: shmem 1851392
Nov 07 08:12:54 arch kernel: zswap 0
Nov 07 08:12:54 arch kernel: zswapped 0
Nov 07 08:12:54 arch kernel: file_mapped 0
Nov 07 08:12:54 arch kernel: file_dirty 0
Nov 07 08:12:54 arch kernel: file_writeback 0
Nov 07 08:12:54 arch kernel: swapcached 0
Nov 07 08:12:54 arch kernel: anon_thp 266338304
Nov 07 08:12:54 arch kernel: file_thp 0
Nov 07 08:12:54 arch kernel: shmem_thp 0
Nov 07 08:12:54 arch kernel: inactive_anon 24793088
Nov 07 08:12:54 arch kernel: active_anon 286883840
Nov 07 08:12:54 arch kernel: inactive_file 0
Nov 07 08:12:54 arch kernel: active_file 0
Nov 07 08:12:54 arch kernel: unevictable 0
Nov 07 08:12:54 arch kernel: slab_reclaimable 434336
Nov 07 08:12:54 arch kernel: slab_unreclaimable 839144
Nov 07 08:12:54 arch kernel: slab 1273480
Nov 07 08:12:54 arch kernel: workingset_refault_anon 0
Nov 07 08:12:54 arch kernel: workingset_refault_file 0
Nov 07 08:12:54 arch kernel: workingset_activate_anon 0
Nov 07 08:12:54 arch kernel: workingset_activate_file 0
Nov 07 08:12:54 arch kernel: workingset_restore_anon 0
Nov 07 08:12:54 arch kernel: workingset_restore_file 0
Nov 07 08:12:54 arch kernel: workingset_nodereclaim 0
Nov 07 08:12:54 arch kernel: pgscan 0
Nov 07 08:12:54 arch kernel: pgsteal 0
Nov 07 08:12:54 arch kernel: pgscan_kswapd 0
Nov 07 08:12:54 arch kernel: pgscan_direct 0
Nov 07 08:12:54 arch kernel: pgscan_khugepaged 0
Nov 07 08:12:54 arch kernel: pgsteal_kswapd 0
Nov 07 08:12:54 arch kernel: pgsteal_direct 0
Nov 07 08:12:54 arch kernel: pgsteal_khugepaged 0
Nov 07 08:12:54 arch kernel: pgfault 282000785
Nov 07 08:12:54 arch kernel: pgmajfault 0
Nov 07 08:12:54 arch kernel: pgrefill 0
Nov 07 08:12:54 arch kernel: pgactivate 0
Nov 07 08:12:54 arch kernel: pgdeactivate 0
Nov 07 08:12:54 arch kernel: pglazyfree 0
Nov 07 08:12:54 arch kernel: pglazyfreed 0
Nov 07 08:12:54 arch kernel: zswpin 0
Nov 07 08:12:54 arch kernel: zswpout 0
Nov 07 08:12:54 arch kernel: thp_fault_alloc 19602
Nov 07 08:12:54 arch kernel: thp_collapse_alloc 33
Nov 07 08:12:54 arch kernel: Tasks state (memory values in pages):
Nov 07 08:12:54 arch kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Nov 07 08:12:54 arch kernel: [  46251] 109999 46251      618      288    40960        0           200 tini
Nov 07 08:12:54 arch kernel: [  46282] 109999 46282   253883    96246  1306624        0           200 liquidsoap
Nov 07 08:12:54 arch kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=docker-eaec52440d1b63e469f909bd15d7f7d2b489830e0ced84f2a8766b2a42122ab6.scope,mems_allowed=0,oom_memcg=/user.slice/user-1>
Nov 07 08:12:54 arch kernel: Memory cgroup out of memory: Killed process 46282 (liquidsoap) total-vm:1015532kB, anon-rss:302296kB, file-rss:82688kB, shmem-rss:0kB, UID:109999 pgtables:1276kB oom_score_adj:200
Nov 07 08:12:54 arch systemd[623]: docker-eaec52440d1b63e469f909bd15d7f7d2b489830e0ced84f2a8766b2a42122ab6.scope: A process of this unit has been killed by the OOM killer.
Nov 07 08:12:54 arch systemd[623]: docker-eaec52440d1b63e469f909bd15d7f7d2b489830e0ced84f2a8766b2a42122ab6.scope: Consumed 50min 31.232s CPU time.
Nov 07 08:12:54 arch systemd[623]: user.slice: A process of this unit has been killed by the OOM killer.
Nov 07 08:12:54 arch systemd[1]: user@1000.service: A process of this unit has been killed by the OOM killer.

To Reproduce

  1. Generate tracks with covers.
    mkdir music;
    for i in $(seq 1 9); do
       convert -size 600x600 plasma: cover.png;
       ffmpeg \
           -f lavfi -i "sine=frequency=${i}00:duration=1" \
           -i "cover.png" \
           -map 0:0 -map 1:0 \
           -c:v copy \
           -c:a libmp3lame -b:a 128k \
           -id3v2_version 4 \
           -metadata "title=$i" \
           -metadata:s:v title="Album cover" \
           -metadata:s:v comment="Cover (front)" \
           "music/$i.mp3";
       rm cover.png;
    done
  2. compose.yaml
    version: "3.8"
    services:
      liquidsoap:
        image: savonet/liquidsoap:v2.2.2
        command:
          - "/tmp/test/main.liq"
        deploy:
          resources:
            limits:
              memory: 300M
        logging:
          driver: "json-file"
          options:
            max-size: "1m"
            max-file: "3"
        volumes:
          - ./:/tmp/test
          - type: tmpfs
            target: /tmp
            tmpfs:
              size: 100M
  3. main.liq
    s = mksafe(playlist("/tmp/test/music", mode="normal"))
    s.on_track(ignore)
    s = clock(sync="none", s)
    output.dummy(s)
    
    thread.run(runtime.gc.compact, every=60., fast=false)
  4. Run `docker compose up -d`.
  5. Observe container stats using `docker stats`.

Expected behavior
Process memory consumption stabilizes at a bounded level instead of growing until the container is OOM-killed.

Version details

  • OS: Debian in a Docker container
  • Version: 2.2.2, 2.2.1

Install method
savonet/liquidsoap:v2.2.2

Common issues
#3502
#3402 (comment)

gAlleb commented

Hi @vitoyucepi! Probably a dumb question, but what if we change `on_track` to `on_metadata`? Would it behave differently?
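
E.g., in step 3's main.liq the change would just be swapping the handler registration (untested sketch):

    # Untested sketch: the reproduction script with on_metadata instead of on_track.
    s = mksafe(playlist("/tmp/test/music", mode="normal"))
    s.on_metadata(ignore)
    s = clock(sync="none", s)
    output.dummy(s)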

@gAlleb,
I haven't tried it. For reference, here is the stock `video.cover` helper from the standard library, which registers its cover extraction with `on_track`:

# Generate a video source containing cover-art for current track of input audio
# source.
# @category Source / Video processing
# @param s Audio source whose metadata contain cover-art.
def video.cover(s) =
  video = source.dynamic()

  def read_cover(m) =
    filename = m["filename"]
    cover =
      if file.exists(filename) then file.cover(filename) else "".{mime=""} end

    if null.defined(cover) then
      cover = null.get(cover)
      ext = if cover.mime == "image/png" then ".png" else ".jpg" end
      f = file.temp("cover", ext)
      log.debug("Found cover for #{filename}.")
      file.write(data=cover, f)
      video.set(request.once(request.create(temporary=true, f)))
    else
      log.debug("No cover for #{filename}.")
      video.set(source.fail())
    end
  end

  s.on_track(read_cover)
  (video : source(video=canvas))
end
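
A hypothetical minimal use of that helper (not my actual script) would be:

    # Hypothetical minimal usage of video.cover; fallible=true because both the
    # playlist and the generated cover video source can fail.
    radio = playlist("/tmp/test/music", mode="normal")
    cover = video.cover(radio)
    output.dummy(fallible=true, radio)
    output.dummy(fallible=true, cover)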

toots commented

Thanks for reporting this one. I'll have a look asap.

gAlleb commented

I've run the cover manager/extractor function (#3402 (reply in thread)) with s.on_metadata for a few days. The result is the same as with on_track:
the liquidsoap process memory slowly increases, roughly by the size of the cover art (20-150 kB) with each new track; after a couple of days it had grown by roughly 500 MB.
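
For reference, this is roughly how one could log each track's cover size from the script and compare it with the growth reported by docker stats (illustration only, not exactly what I ran; metadata.cover and null.case are used the same way as in the cover manager function further below):

    # Illustration only: log the size of each track's embedded cover.
    # `s` is the playing source (e.g. the playlist from the reproduction script).
    def log_cover_size(m) =
      filename = m["filename"]
      cover = metadata.cover(m)
      size = null.case(cover, {0}, fun (c) -> string.length(c))
      log.important("Cover for #{filename}: #{size} bytes")
    end
    s.on_metadata(log_cover_size)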

Hi @toots! Have you looked into this issue? Is there any chance of solving it?
I know this is not a priority; I'm not rushing you in any way, just curious.

I'm able to reproduce. So far this looks like an issue with the GC. I generated a very large 6000x6000-pixel cover art and ran the script with aggressive GC parameters and regular full_major.

I'm able to trace the issue down to the memory allocated for the cover string (and probably also found the cause of #3502).

I'm also able to confirm that all the memory from the requests is eventually collected.

However, the bad news is that memory does not seem to leak but, rather, to plateau at a very high value:

[Screenshot 2023-12-12 at 1:09:05 AM: memory usage graph plateauing]

For scale, the cover art is about 168 MB.

I'm gonna see what we can do to optimize the GC's behavior on this.
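
For anyone wanting to experiment in the meantime, the experiment above looks roughly like this on the script side (sketch only; runtime.gc.full_major is assumed to be available in your build, while runtime.gc.compact is the helper already used in the reproduction script):

    # Sketch: force full major collections frequently, plus periodic compaction.
    # runtime.gc.full_major is assumed to exist; runtime.gc.compact is confirmed
    # by the reproduction script's main.liq.
    thread.run(every=10., fast=false, {runtime.gc.full_major()})
    thread.run(every=120., fast=false, runtime.gc.compact)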

I see. Thanks for your work!

We've identified the underlying source as the list of the last 10 metadata kept by outputs, at least in my reproduction test. We're working on a short-term and long-term mitigation plan 🙂

Thank you @toots! Switched to 2.2.3. Gonna test.

Hi @toots!

So I upgraded to 2.2.3 and have been running it for more than 3 days with Vito's cover manager function switched back on:

def cover_files_manager(~id=null(), ~default_cover_file="", ~covers_keep=2)
    id = string.id.default(id, default="cover")
    # log = logger(id)
    mime_types = [
        ("image/gif", "gif"),
        ("image/jpeg", "jpeg"),
        ("image/jpg", "jpeg"),
        ("image/png", "png"),
        ("image/webp", "webp")
    ]

    cover_files = ref([])
    current_cover_file = ref(default_cover_file)

    def remove_file(filename)
        log.info("Remove #{string.quote(filename)}")
        file.remove(filename)
    end

    def rotate_cover_files(cover_file)
        current_cover_file := cover_file
        cover_files := cover_file::cover_files()
        if list.length(cover_files()) > covers_keep then
            remove_file(list.last(cover_files()))
            cover_files := list.prefix(covers_keep, cover_files())
        end
    end

    def save_cover_to_file(cover)
        mime = cover.mime
        extension = mime_types[mime]
        log.debug("mime=#{mime}, extension=#{extension}")
        cover_file =
            if extension != "" then
                cover_file = file.temp("#{id}_", ".#{extension}")
                file.write(cover_file, data=cover)
                cover_file
            else
                null()
            end
        (cover_file, mime)
    end

    def extract_cover_from_metadata(_metadata)
        filename = _metadata["filename"]
        log.info("Extracting cover from #{string.quote(filename)}.")
        cover = metadata.cover(_metadata)
        let (cover_file, mime) = null.case(cover, {(null(), "")}, save_cover_to_file)

        if cover == null() then
            log.important("File #{string.quote(filename)} has no cover.")
            current_cover_file := default_cover_file
        elsif cover_file == null() then
            log.important("File #{string.quote(filename)} has unknown mime type \
                           #{string.quote(mime)}.")
            current_cover_file := default_cover_file
        else
            cover_file = null.get(cover_file)
            log.important("Cover for #{string.quote(filename)} saved to \
                           #{string.quote(cover_file)}.")
            rotate_cover_files(cover_file)
        end
    end

    def get_cover()
        log.debug("current_cover_file=#{current_cover_file()}")
        current_cover_file()
    end

    get_cover.{set = extract_cover_from_metadata}
end

cover_files = cover_files_manager(default_cover_file=station_dir ^ "/img/cover_default.png")
radio.on_track(cover_files.set)

background = video.add_image(x=50, y=280, width=220, height=220, file=cover_files, background)

After 3 days of running, the liquidsoap process had taken an extra 600 MB of RAM and kept increasing bit by bit, so I switched the cover manager function off.

So the problem persists.

Thanks for your patience and the report, @gAlleb. The full solution, which involves reading cover metadata as mmap'd bigstrings so they don't impact the runtime's memory allocations, is still being worked on. I'll report back here!