marouane53/transcribe

TypeError: Config.__init__() got an unexpected keyword argument 'urls_or_paths'


Description:
I encountered a TypeError while trying to run the transcription script. The error message indicated that the Config class's constructor does not accept the argument urls_or_paths.

Solution:
The issue arises from a change in the structure of the tafrigh library's Config class. The constructor no longer takes the settings directly as keyword arguments; it now requires one instance each of Config.Input, Config.Whisper, Config.Wit, and Config.Output.
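The failure can be reproduced without tafrigh installed. The following is a minimal sketch built on stand-in dataclasses: `Config`, `Config.Input`, and `Config.Wit` here are simplified, hypothetical stand-ins and not tafrigh's actual classes. It only illustrates that a constructor expecting grouped sections rejects a flat keyword such as `urls_or_paths`.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Hypothetical stand-in for a sectioned config (not tafrigh's real class)."""

    @dataclass
    class Input:
        urls_or_paths: list[str] = field(default_factory=list)
        verbose: bool = False

    @dataclass
    class Wit:
        wit_client_access_tokens: list[str] = field(default_factory=list)
        max_cutting_duration: int = 15

    # The constructor accepts only these two section objects.
    input: Input
    wit: Wit


def try_flat_call() -> str:
    """Call the constructor the old, flat way and return the error message."""
    try:
        Config(urls_or_paths=["audio.wav"])  # old style: flat keyword
    except TypeError as exc:
        return str(exc)
    return "no error"


# New style: each group of settings is wrapped in its own section object.
config = Config(
    input=Config.Input(urls_or_paths=["audio.wav"]),
    wit=Config.Wit(wit_client_access_tokens=["TOKEN"]),
)

print(try_flat_call())
print(config.input.urls_or_paths)
```

The flat call fails with a `TypeError` naming `urls_or_paths`, while the sectioned call succeeds and exposes the value as `config.input.urls_or_paths`.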

To fix this, modify the transcribe_file function as follows:

```python
# Imports this function relies on (assumed to match the script's existing imports).
# is_wav_file and LANGUAGE_API_KEYS are defined elsewhere in the script.
from collections import deque

from tafrigh import Config, TranscriptType, farrigh


def transcribe_file(file_path, language_sign):
    if not is_wav_file(file_path):
        print(f"Skipping file {file_path} as it is not in WAV format.")
        return

    wit_api_key = LANGUAGE_API_KEYS.get(language_sign.upper())
    if not wit_api_key:
        print(f"API key not found for language: {language_sign}")
        return

    input_config = Config.Input(
        urls_or_paths=[str(file_path)],
        skip_if_output_exist=False,
        playlist_items="",
        download_retries=3,  # Adjust this value as needed
        verbose=False,
    )

    whisper_config = Config.Whisper(
        model_name_or_path="",  # Specify the model if necessary
        task="",  # Specify the task type if necessary
        language="",  # Specify the language if necessary
        use_faster_whisper=False,
        beam_size=0,
        ct2_compute_type="",
    )

    wit_config = Config.Wit(
        wit_client_access_tokens=[wit_api_key],
        max_cutting_duration=5,
    )

    output_config = Config.Output(
        min_words_per_segment=1,
        save_files_before_compact=False,
        save_yt_dlp_responses=False,
        output_sample=0,
        output_formats=[TranscriptType.TXT, TranscriptType.SRT],
        output_dir=str(file_path.parent),
    )

    config = Config(input=input_config, whisper=whisper_config, wit=wit_config, output=output_config)

    print(f"Transcribing file: {file_path}")
    # Consume the iterator returned by farrigh; maxlen=0 discards the items.
    deque(farrigh(config), maxlen=0)
    print("Transcription completed. Check the output directory for the generated files.")
```

After applying these changes, the script should work correctly. Thank you!

The tafrigh library's configuration structure has changed. Previously, all settings were passed directly to the Config class; now they are grouped into separate sections: Config.Input, Config.Whisper, Config.Wit, and Config.Output. You have to adapt the old implementation to this new structure while keeping your original settings intact.

Modify your code and replace the Config construction with this:

config = Config(
    input=Config.Input(
        urls_or_paths=[str(file_path)],
        skip_if_output_exist=False,
        download_retries=3,  # Default value, can be adjusted
        yt_dlp_options="{}",  # Passed as a string; "{}" means no extra yt-dlp options
        verbose=False,
    ),
    whisper=Config.Whisper(
        model_name_or_path="",
        task="",
        language="",
        use_faster_whisper=False,
        beam_size=0,
        ct2_compute_type="",
    ),
    wit=Config.Wit(
        wit_client_access_tokens=[wit_api_key],  # Preserve the API key usage
        max_cutting_duration=15,  # Keep the old value from your implementation
    ),
    output=Config.Output(
        min_words_per_segment=1,  # Preserve the original setting
        save_files_before_compact=False,
        save_yt_dlp_responses=False,
        output_sample=0,
        output_formats=["txt", "srt"],  # Use string values as per new structure
        output_dir=str(file_path.parent),
    ),
)
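
The two snippets in this thread pass slightly different fields to `Config.Input` (`playlist_items` in one, `yt_dlp_options` in the other), so which set is accepted likely depends on the installed tafrigh version. Below is a minimal sketch of checking a set of keyword arguments against a constructor's signature with the standard `inspect` module before calling it. `check_kwargs` and `DemoInput` are hypothetical names used for illustration; `DemoInput` only mirrors the fields of the second snippet and stands in for the real `Config.Input`.

```python
import inspect
from dataclasses import dataclass


def check_kwargs(cls, kwargs):
    """Return (unexpected, missing) keyword names for calling cls(**kwargs)."""
    params = inspect.signature(cls).parameters
    # A **kwargs parameter means no keyword can be "unexpected".
    accepts_any = any(p.kind is p.VAR_KEYWORD for p in params.values())
    unexpected = [] if accepts_any else sorted(set(kwargs) - set(params))
    # Required parameters are those without a default value.
    missing = sorted(
        name
        for name, p in params.items()
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
        and name not in kwargs
    )
    return unexpected, missing


@dataclass
class DemoInput:
    """Hypothetical stand-in mirroring the fields of the second snippet."""

    urls_or_paths: list
    skip_if_output_exist: bool
    download_retries: int
    yt_dlp_options: str
    verbose: bool


# Keyword arguments as used in the first snippet of this thread.
unexpected, missing = check_kwargs(
    DemoInput,
    {
        "urls_or_paths": ["audio.wav"],
        "skip_if_output_exist": False,
        "playlist_items": "",
        "download_retries": 3,
        "verbose": False,
    },
)
print("unexpected:", unexpected)
print("missing:", missing)
```

With tafrigh installed, the same helper could presumably be pointed at `Config.Input` itself to see which of the two field sets the installed version expects.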