meeb/tubesync

Members Only Videos Continuously Failing

Closed this issue · 4 comments

Over the weekend, one of the YouTube channels I follow with TubeSync added ~460 members-only videos, and it's brought my TubeSync installation to a standstill.

Each of the videos is erroring in the sync.tasks.download_media_metadata task and then rescheduling with a delay. Due to the number of videos, this took hours to work through before any of the other channels I follow had a chance to be processed. I was surprised that the reschedule delay got reset the next day when the channel was re-processed, and I was back to everything being jammed up.

I've updated the priority on the background_task records for these failing jobs so that the other channels can be processed for now. Short term, what's the best way to stop these videos being continuously processed? Can I mark them as skipped in the sync_media table and delete the background_task records, or is there a better way?

Longer term, there doesn't seem to be any indication in the YouTube API video data that these videos require membership, but the following error is being thrown:

[tubesync/ERROR] ERROR: [youtube] t4Ukm9YDADQ: This video is available to this channel's members on level: LTT Member Plus (or any higher level). Join this channel to get access to members-only content and other exclusive perks.
Rescheduling Downloading metadata for "b83d27b2-b7f6-42ba-b615-5ca14ac21549"
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/background_task/tasks.py", line 43, in bg_runner
    func(*args, **kwargs)
  File "/app/sync/tasks.py", line 291, in download_media_metadata
    metadata = media.index_metadata()
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/app/sync/models.py", line 1494, in index_metadata
    return indexer(self.url)
           ^^^^^^^^^^^^^^^^^
  File "/app/sync/youtube.py", line 89, in get_media_info
    raise YouTubeError(f'Failed to extract_info for "{url}": No metadata was '
sync.youtube.YouTubeError: Failed to extract_info for "https://www.youtube.com/watch?v=t4Ukm9YDADQ": No metadata was returned by youtube-dl, check for error messages in the logs above. This task will be retried later with an exponential backoff.
Rescheduling task Downloading metadata for "b83d27b2-b7f6-42ba-b615-5ca14ac21549" for 0:04:21 later at 2024-12-15 19:29:19.699731+00:00

I don't know if there's enough information in the raised error to trap this reliably and skip continually retrying to fetch the metadata, bearing in mind that someone who is a channel member and has set up their cookies correctly will want to be able to download these videos. I'm guessing for them the videos will just download without any error.
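For what it's worth, the message itself looks distinctive enough that a simple string check might be workable. This is only an illustration on my part, not existing TubeSync behaviour, and I don't know how stable the wording is across yt-dlp versions; a member with working cookies should never hit it because extraction succeeds for them:

# Rough illustration only, not existing TubeSync code: detect the
# members-only case from the yt-dlp error text shown in the log above.
MEMBERS_ONLY_HINTS = (
    "members-only content",
    "available to this channel's members",
    "join this channel to get access",
)

def looks_members_only(error_message: str) -> bool:
    # True if the failure message looks like a membership restriction
    # rather than a transient error worth retrying.
    msg = error_message.lower()
    return any(hint in msg for hint in MEMBERS_ONLY_HINTS)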

Here are some video IDs for investigation/testing; let me know if you need any more information.

t4Ukm9YDADQ, T-2eGKOdYz4, suL9N97w_Dw, sQ1447P7G24, sQWuwWAyg08, SDP8ZwHo5-s

meeb commented

Thanks for the issue. I don't think these errors should block anything; they'll just be marked to retry over a day or so before permanently failing the downloads, and they shouldn't block other media items from downloading. If you have hundreds of failures, though, the retries could definitely delay some tasks, which is probably what you're experiencing. If you attempt to download hundreds of failing media items there's probably not a huge amount that can be done programmatically, however.

Yes, you can stop these items from being retried by marking them as skipped.
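For a few hundred items the Django shell is probably quicker than clicking through the UI. Something roughly like the following should do it, although this is only a sketch and the field names (Media.key, Media.skip, and the background_task Task columns) are assumptions based on this thread, so double-check them against your install first:

# Sketch only -- run from the app directory with: python3 manage.py shell
# Assumes Media.key holds the YouTube video ID and Media.skip exists,
# as the rest of this thread suggests; verify before running.
from background_task.models import Task
from sync.models import Media

members_only_ids = ['t4Ukm9YDADQ', 'T-2eGKOdYz4', 'suL9N97w_Dw',
                    'sQ1447P7G24', 'sQWuwWAyg08', 'SDP8ZwHo5-s']

media_items = Media.objects.filter(key__in=members_only_ids)
media_items.update(skip=True)  # update() also avoids firing post_save signals

# Drop any queued metadata tasks that reference these media UUIDs
for media in media_items:
    Task.objects.filter(
        task_name='sync.tasks.download_media_metadata',
        task_params__contains=str(media.pk),
    ).delete()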

It's difficult to handle every download error eventuality, which is why there's quite a lot of retrying in the first place (for example, a media item might be listed in a playlist as private or members-only, then made public a few hours later). Hard-failing all downloads with this error might be quite frustrating for some people.

Generally the advised usage is just to ignore these errors until the retries fail naturally after a while; I appreciate this may be irritating if hundreds of media items suddenly start failing. Really, the core of the issue is that you're trying to download 460 inaccessible media items. Having said that, if you just let them fail or skip them, everything should go back to working normally.

If you have a YouTube account which can access these videos and you import your cookies, they should download as normal.

Thanks, yeah, I'm sure it's just the number of videos which is blocking the other channels from being processed.

I'm new to Django and its background tasks, so please correct me if any of this is wrong. MAX_ATTEMPTS in TubeSync is set to 15, and based on the retry time algorithm from the documentation the retries stretch out well past 12 hours in total before a task is marked as failed and stops retrying. As a new download_media_metadata task is scheduled in media_post_save with remove_existing_tasks set to True, wouldn't that effectively reset the previous number of attempts to download the metadata? So if I'm checking a channel every 12 hours, the task will never reach the failed state before it gets deleted and a replacement task created, and the media will never be set to skipped? (I'm struggling to find where in the code the sync_media record is set to skipped when the task fails.)
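To sanity-check that, here's the back-of-the-envelope maths I used, assuming the default django-background-tasks backoff of (attempts ** 4) + 5 seconds, which does appear to match the 0:04:21 reschedule in my log (4**4 + 5 = 261 seconds):

# Assumes the default django-background-tasks backoff of (attempts ** 4) + 5
# seconds; the 0:04:21 reschedule in the log above matches attempt 4.
from datetime import timedelta

MAX_ATTEMPTS = 15  # TubeSync's setting, as I understand it

total = timedelta()
for attempt in range(1, MAX_ATTEMPTS):  # a delay follows every failed attempt except the last
    delay = timedelta(seconds=(attempt ** 4) + 5)
    total += delay
    print(f'after attempt {attempt:2d}: retry in {delay}')

print(f'total window before the task is marked failed: {total}')

That puts the full retry window at well over a day, so a channel that gets re-indexed every 12 hours will always delete and re-create the task before it can reach 15 attempts.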

If a task is never going to complete, say because I'm not a channel member and the video is members-only, wouldn't it be better to mark the video as skipped straight away? Retrying is a fine strategy for errors which may sort themselves out, but if we know a task is never going to complete, it's a waste of resources.

On a side issue, when I try to build the container with make container it fails with sha256sum: 'standard input': no properly formatted checksum lines found. Have you come across this? Am I missing something in my build chain, or is there an updated Dockerfile available?

meeb commented

Oh, you might be correct there, looking at it. Originally there was a generic skip flag which handled this correctly, but there have been some PRs and other changes that rework how skips work (splitting it into skip and manual_skip, adding filtering, etc.) and it looks like that may now cause the loop you describe. Thanks for the analysis, I'll poke into it shortly.

As for the container build breaking, this is caused by the ffmpeg releases. Every time the yt-dlp team builds a new ffmpeg release the old ones are removed (the download links 404), which breaks the container build. The "fix" is to update the ffmpeg versions and hashes at the top of the Dockerfile, which I generally do whenever I notice it's not working any more.

After a little more reading through the code, I found that the download_media_metadata task is added in the media post_save function whenever instance.metadata is empty, without checking the instance.skip flag. As the metadata is never downloaded for these members-only videos, the download_media_metadata task will always be re-added, even if the previous job failed after 15 attempts and set the skip flag.

I'm testing a version with a not instance.skip check added at the moment and will let you know how it goes after it's been running for a while.
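For reference, this is roughly the shape of what I'm testing, boiled down to just the relevant condition (a simplified sketch, not the actual handler, which does a lot more):

# Simplified sketch of the guard, not the real TubeSync post_save handler.
# The point is the extra "not instance.skip" condition.
from django.db.models.signals import post_save
from django.dispatch import receiver

from sync.models import Media
from sync.tasks import download_media_metadata

@receiver(post_save, sender=Media)
def media_post_save(sender, instance, created, **kwargs):
    # Only queue a metadata download when there is no metadata yet AND the
    # item has not already been marked as skipped (e.g. after exhausting
    # its retries or being skipped by hand).
    if not instance.metadata and not instance.skip:
        download_media_metadata(
            str(instance.pk),
            remove_existing_tasks=True,
        )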

Updating the FFMPEG variables in the Dockerfile fixed my issue building the container image, thanks.