cglatot/pasta

Only apply subtitles on season if it matches "Everything"

phexgmbh opened this issue · 12 comments

As of right now, I'm trying to apply my forced subtitles to all of my shows. However, when I choose "English - forced" and an episode has no forced subtitle but does have a full one, the tool picks the full subtitle, which is not what I intend.

Would it be possible to add a button that only applies the subtitle on a season/show if it matches "Everything" instead of "Name"?

Cheers!

The way I have the track / subtitle matching set up is that it will look at the name, title, language, and language code.

It will try to make a perfect match if it can - so this should already be doing what you need, and if it can't it will go back and see which track matches the most criteria. Matching based on name alone is actually pretty low down in the priority.
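To make the fallback behaviour concrete, here is a rough sketch of "pick the track that matches the most criteria". It is not PASTA's actual code, which is not shown in this thread; the field names `name`, `title`, `language`, and `language_code` and both function names are assumptions, and it counts criteria equally instead of applying any priority order.

```python
CRITERIA = ("name", "title", "language", "language_code")  # assumed field names


def match_score(reference: dict, candidate: dict) -> int:
    """Count how many of the four criteria agree between two tracks."""
    return sum(
        1 for key in CRITERIA
        if reference.get(key) is not None and reference.get(key) == candidate.get(key)
    )


def best_match(reference: dict, candidates: list):
    """Return the candidate agreeing on the most criteria, or None if nothing matches."""
    best, best_score = None, 0
    for candidate in candidates:
        score = match_score(reference, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


# The track the user picked on one episode
reference = {"name": "English - forced", "title": "Forced",
             "language": "English", "language_code": "eng"}

# Another episode that has no forced track at all
episode_tracks = [
    {"name": "English", "title": "Full", "language": "English", "language_code": "eng"},
    {"name": "German", "title": "Full", "language": "German", "language_code": "ger"},
]

chosen = best_match(reference, episode_tracks)
print(chosen["name"])
```

With these sample tracks the full English subtitle still agrees on language and language code, so a "most criteria" fallback selects it even though it is not forced.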

I could add a button to have exact matching only, but if there isn't an exact match across all 4 criteria, then it will not change the tracks, and honestly I don't see the value in doing that.

@cglatot Thanks for your reply!

Let's take Game of Thrones, for instance. There's a lot of "alien language" where forced subtitles are required to understand the dialogue. However, like most people, I wouldn't want a full English subtitle on my movie, only a forced English subtitle.
So instead of choosing forced subtitles only on the episodes that contain alien language, it activates the full subtitle on those that don't have a forced subtitle. I then have to turn off the full subtitle manually, which is exactly what I'm trying to avoid with your tool (turning off the full subtitle manually instead of turning on the forced subtitle manually).

I hope you understand what I'm trying to say :c

Ahhhhhhhhhhhhhhhhhh I see what you mean now. That is actually a very valid point.

I will need to give some thought to how best to implement this, as it is a bit of a niche case. It's also not as straightforward as only matching on "everything", because the "forced" keyword can exist in either the Name or the Title, and the tracks might be named differently across episodes or seasons.

@cglatot gotcha! If it's possible, that'd be great. If not, that's alright, too :) still a great tool.
Maybe a general option to just search for forced subtitles and activate them (rather than choosing one option and trying to duplicate it on other episodes/movies)?
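A minimal sketch of that suggestion, assuming each track is a dict with hypothetical `language_code` and `forced` fields: look for a forced track in the wanted language and, if none exists, return nothing so the episode is left without subtitles instead of falling back to the full track.

```python
from typing import Optional


def pick_forced_subtitle(tracks: list, language_code: str) -> Optional[dict]:
    """Return the first forced subtitle track in the given language, else None."""
    for track in tracks:
        if track.get("language_code") == language_code and track.get("forced"):
            return track
    return None  # no forced track: leave subtitles off rather than fall back


with_forced = [
    {"name": "English", "language_code": "eng", "forced": False},
    {"name": "English - forced", "language_code": "eng", "forced": True},
]
without_forced = [
    {"name": "English", "language_code": "eng", "forced": False},
]

print(pick_forced_subtitle(with_forced, "eng"))
print(pick_forced_subtitle(without_forced, "eng"))
```

The second call returns `None`, which is the case this issue is about: an episode without a forced track should keep its subtitles switched off.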

This seems to be an issue with a lot of uploads, especially anime. MKV supports the "forced" flag, but many uploaders set it incorrectly or not at all, so you end up with subtitle tracks that have the keyword "forced" in the name but lack the flag. I made a Python script that updates MKV files with the correct flags.

It needs MKVToolNix installed and added to the PATH, otherwise the script can't find "mkvpropedit.exe". It can be run like so:

py script-name.py path/to/files/to/update

#!/usr/bin/env python3
import concurrent.futures as futures
import json
import os
import subprocess
import sys

languages = [
    ('ger', 'German', 'Deutsch'),
    ('eng', 'English', 'Englisch'),
    ('jpn', 'Japanese', 'Japanisch'),
    ('fre', 'French', 'Französisch', 'Franzoesisch'),
    ('rus', 'Russian', 'Russisch'),
]


def get_all_subfolders(rootdir):
    subfolders = []
    for it in os.scandir(rootdir):
        if it.is_dir():
            subfolders.append(it.path)
            subfolders.extend(get_all_subfolders(it))
    return subfolders


def get_all_files(folder):
    mkv_files = []
    for file in os.listdir(folder):
        if file.endswith(".mkv"):
            mkv_files.append(os.path.join(folder, file))
    return mkv_files


def subtitle_name_is_forced(text):
    forced_terms = ["forced", "sign", "song", "s&s"]
    if not text:
        return False
    return any([(term in text.lower()) for term in forced_terms])


def subtitle_name_is_language(text, language_key_words: tuple):
    # Case-insensitive substring match against every keyword in the tuple
    terms = [word.lower() for word in language_key_words]
    if not text:
        return False
    return any(term in text.lower() for term in terms)


def update_mkv_file(mkv_file, props: dict, property_to_be_set: str):
    # "track:=<uid>" selects the track by its UID rather than by its position
    subprocess.run(
        ["mkvpropedit", mkv_file, "--edit", "track:=" + str(props["uid"]), "--set",
         property_to_be_set], stdout=subprocess.DEVNULL)


def repair_forced_subtitles(mkv_file, track_name: str, props: dict) -> str:
    report = ""
    if subtitle_name_is_forced(track_name):
        if not props.get("forced_track"):
            report += f"\n- Track \"{track_name}\" set to forced"
            update_mkv_file(mkv_file, props, "flag-forced=true")
        else:
            report += f"\n- Track \"{track_name}\" was already set to forced"
    else:
        report += f"\n- Track \"{track_name}\" is not forced"
    return report


def repair_language_subtitles(mkv_file, track_name: str, props: dict, language: tuple[str]) -> str:
    report = ""
    if subtitle_name_is_language(props.get("track_name"), language):
        if not props.get("language") or not props.get("language") == language[0]:
            report += f"\n- Track \"{track_name}\" set to {language[1]}"
            update_mkv_file(mkv_file, props, f"language={language[0]}")
        else:
            report += f"\n- Track \"{track_name}\" was already set to {language[1]}"
    return report


def analyse_and_correct_mkv(mkv_file: str):
    metadata = json.loads(subprocess.check_output(
        ["mkvmerge", "-J", mkv_file]).decode())
    orig_report = f"File: \"{os.path.basename(mkv_file)}\" had the following changes applied:"
    report = orig_report
    for track in metadata["tracks"]:
        props = track["properties"]
        track_name = props.get("track_name") if props.get(
            "track_name") else "Unnamed"
        if track["type"] == "subtitles":
            report += repair_forced_subtitles(mkv_file, track_name, props)
            for lang_tuple in languages:
                report += repair_language_subtitles(mkv_file, track_name, props, lang_tuple)
    if report == orig_report:
        return f"File: \"{os.path.basename(mkv_file)}\" was not modified"
    else:
        return report


def main(argv):
    folder_list = get_all_subfolders(argv[1])
    folder_list.append(argv[1])
    mkv_files = [
        file for sublist in folder_list for file in get_all_files(sublist)]
    with futures.ThreadPoolExecutor() as ex:
        # ex.map yields each call's return value (a report string), not Future objects
        for report in ex.map(analyse_and_correct_mkv, mkv_files):
            print(report)


if __name__ == "__main__":
    main(sys.argv)

Maybe this is useful to someone. Be aware that this script only works if the subtitle track names contain at least one of the following keywords: ["forced", "sign", "song", "s&s"].

Why isn't it possible to set "subtitle track 2" instead of "subtitle tracks with name XY"? That would be so much easier.

Because not all videos have the subs in the same order. I've seen releases that add director's commentary subs, which throws off the order of the subs. Just one example off the top of my head...
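A small illustration of the ordering problem, using made-up track lists for two episodes of the same show (the names and helper functions are hypothetical): selecting by position picks different subtitles once an extra track is inserted, while selecting by name does not.

```python
# Made-up subtitle track names for two episodes of the same show
episode_1 = ["English", "English - forced", "German"]
episode_2 = ["English", "Director's commentary", "English - forced", "German"]


def by_position(track_names, position):
    """Pick a track by 1-based position, or None if the position does not exist."""
    if 1 <= position <= len(track_names):
        return track_names[position - 1]
    return None


def by_name(track_names, wanted):
    """Pick the first track whose name equals the wanted name, or None."""
    for name in track_names:
        if name == wanted:
            return name
    return None


print(by_position(episode_1, 2), "|", by_position(episode_2, 2))
print(by_name(episode_1, "English - forced"), "|", by_name(episode_2, "English - forced"))
```

In this example "subtitle track 2" is the forced track in the first episode but the commentary track in the second.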

Would be nice if you could add this as an optional function. I don't want to remux thousands of anime episodes.

One way of handling this would be to treat subtitles as forced if they either have the "forced"-flag set to true OR have a substring match for user-provided keywords. The keyword match could then be treated as a quasi forced flag.
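That idea could look roughly like the sketch below. The track fields `forced` and `name` and the function name are assumptions for illustration; the keyword list is the one from the script posted earlier in this thread and would be user-provided in practice.

```python
def is_effectively_forced(track: dict, keywords) -> bool:
    """Treat a track as forced if its flag is set OR its name contains a keyword."""
    if track.get("forced"):
        return True
    name = (track.get("name") or "").lower()
    return any(keyword.lower() in name for keyword in keywords)


user_keywords = ["forced", "sign", "song", "s&s"]

flagged = {"name": None, "forced": True}
named_only = {"name": "Signs & Songs", "forced": False}
full_subs = {"name": "English", "forced": False}

for track in (flagged, named_only, full_subs):
    print(track["name"], is_effectively_forced(track, user_keywords))
```

Because this is a plain substring match, short keywords can also hit unrelated names, so the keyword list would need to be chosen with care.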

But I just noticed that this issue is actually about another bug. Here the issue revolves around the logic picking the closest match in a show that might not have forced subtitles for every episode, so it picks the full sub for episodes that have no forced sub.

@Ninelpienel
Your issue is more about the way it decides what it picks. In your screenshots, I'd guess, it's the episode numbers in the names throwing the logic off.

The script I posted above wouldn't remux the files. It would just rewrite the tag information which doesn't cause the entire file to be rewritten. Also, I made it so it runs multi-threaded which should make it run rather quickly.

The issue that you had should be resolved when I resolve this issue (#53). But if you want the option of choosing a specific track number (knowing that it may not always be the same track), then please raise a separate feature request and I can take a look at that in time.

There is a small bug: the script interprets the anime name "Tengen Toppa Gurren Lagann" as English and sets the language to "eng". Maybe you can tweak the script a little bit.

Yea, I gave it the whole tuple ('eng', 'English', 'Englisch'), which falsely matches the "eng" in "Tengen". It should have taken all the elements from the tuple except the first one. Like this:

Before (the whole tuple, including the language code, is used as keywords):

def repair_language_subtitles(mkv_file, track_name: str, props: dict, language: tuple[str]) -> str:
    report = ""
    if subtitle_name_is_language(props.get("track_name"), language):
        if not props.get("language") or not props.get("language") == language[0]:
            report += f"\n- Track \"{track_name}\" set to {language[1]}"
            update_mkv_file(mkv_file, props, f"language={language[0]}")
        else:
            report += f"\n- Track \"{track_name}\" was already set to {language[1]}"
    return report
# keywords checked, e.g.: ('ger', 'German', 'Deutsch')

After (language[1:] skips the language code):

def repair_language_subtitles(mkv_file, track_name: str, props: dict, language: tuple[str]) -> str:
    report = ""
    if subtitle_name_is_language(props.get("track_name"), language[1:]):
        if not props.get("language") or not props.get("language") == language[0]:
            report += f"\n- Track \"{track_name}\" set to {language[1]}"
            update_mkv_file(mkv_file, props, f"language={language[0]}")
        else:
            report += f"\n- Track \"{track_name}\" was already set to {language[1]}"
    return report
# keywords checked, e.g.: ('German', 'Deutsch')
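For comparison, a different way to avoid this kind of false positive is to match keywords only as whole words. This is a hedged sketch of an alternative, not the fix posted above, and `name_mentions_language` is a hypothetical helper name. It uses `\b` word boundaries from Python's `re` module, so "eng" no longer matches inside "Tengen" but still matches a standalone "ENG".

```python
import re


def name_mentions_language(text, keywords) -> bool:
    """True if any keyword occurs in text as a whole word (case-insensitive)."""
    if not text:
        return False
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", text, flags=re.IGNORECASE)
        for word in keywords
    )


english = ("eng", "English", "Englisch")

# Plain substring matching is what caused the false positive
print("eng" in "Tengen Toppa Gurren Lagann".lower())

# Whole-word matching
print(name_mentions_language("Tengen Toppa Gurren Lagann", english))
print(name_mentions_language("Signs [ENG]", english))
```

With whole-word matching the language code could stay in the keyword tuple, at the cost of missing names where the keyword is glued to other letters.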