Add arguments `time_off` and `duration` to transcriber
me-kell opened this issue · 2 comments
me-kell commented
Currently the transcriber processes the whole input file, from beginning to end.
It would be very useful to be able to pass a start time offset and/or a duration to the transcriber.
Here is a proposal for how to do it:
Add (ffmpeg's) arguments `time_off` and `duration` in `python/vosk/transcriber/cli.py` after line 46.
parser.add_argument("--time_off", "-ss", default=None, type=int, help="start time offset")
parser.add_argument("--duration", "-d", default=None, type=int, help="duration")
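A minimal, stand-alone sketch of how these two options would parse, assuming nothing about the rest of `cli.py`. The `build_parser` helper is hypothetical and exists only for illustration. Note that with `type=int` only whole seconds are accepted; `type=float` would be needed for fractional offsets.

```python
import argparse


def build_parser():
    # Hypothetical stand-alone parser; the real cli.py defines many more options.
    parser = argparse.ArgumentParser(description="sketch of the proposed options")
    parser.add_argument("--time_off", "-ss", default=None, type=int,
                        help="start time offset")
    parser.add_argument("--duration", "-d", default=None, type=int,
                        help="duration")
    return parser


# Both options are optional and default to None when not given.
args = build_parser().parse_args(["-ss", "30", "--duration", "10"])
defaults = build_parser().parse_args([])
```

Because both defaults are `None`, existing invocations without the new options keep their current behaviour.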
Pass the arguments `time_off` and `duration` to ffmpeg in function `resample_ffmpeg` in `python/vosk/transcriber/transcriber.py` (line 115):
cmd = shlex.split("ffmpeg -nostdin -loglevel quiet "
        "-i \'{}\' -ar {} -ac 1 {} {} -f s16le -".format(
            str(infile),
            SAMPLE_RATE,
            f'-ss {self.args.time_off}' if self.args.time_off is not None else '',  # add this
            f'-t {self.args.duration}' if self.args.duration is not None else ''  # and this
        ))
The function `resample_ffmpeg_async` could be adapted similarly.
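One way to share the logic between the sync and async variants would be a small helper that builds the argument list once. The sketch below is only an illustration: `build_ffmpeg_cmd` is a hypothetical name, the `SAMPLE_RATE` value is assumed, and it builds a list instead of going through `shlex.split`, which also sidesteps quoting problems with file names that contain a single quote.

```python
SAMPLE_RATE = 16000  # assumed value for this sketch


def build_ffmpeg_cmd(infile, sample_rate=SAMPLE_RATE, time_off=None, duration=None):
    """Return the ffmpeg argument list, adding -ss / -t only when requested."""
    cmd = ["ffmpeg", "-nostdin", "-loglevel", "quiet",
           "-i", str(infile),
           "-ar", str(sample_rate), "-ac", "1"]
    if time_off is not None:
        cmd += ["-ss", str(time_off)]   # start offset, as in the proposal
    if duration is not None:
        cmd += ["-t", str(duration)]    # length of the part to decode
    cmd += ["-f", "s16le", "-"]         # raw 16-bit PCM to stdout
    return cmd


full = build_ffmpeg_cmd("interview.mp3")
part = build_ffmpeg_cmd("interview.mp3", time_off=30, duration=10)
```

The resulting list could be handed to `subprocess.Popen(cmd, stdout=subprocess.PIPE)` in the sync case or to `asyncio.create_subprocess_exec(*cmd, stdout=asyncio.subprocess.PIPE)` in the async case.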
nshmyrev commented
Hi, thank you for the proposal! It looks nice, but what is the use case, please? I can't imagine why a user would need to start from a certain offset instead of just processing the whole file.
me-kell commented
Some use cases:
- You have a recording of an interview and a list of the start times of every question and answer. You may want to assign the transcribed parts to their respective time points (question and answer).
- You have a music radio program with the presenter commenting every two or three songs. You may want to transcribe only the presenter, not the songs.
- And last but not least: you have an audio file with different languages spoken by different speakers. You may want to transcribe different parts of the audio using the model for the corresponding language.
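For the interview case, the list of start times can be turned into the offset/duration pairs that the proposed options expect. This is a rough sketch under assumptions: `segments_from_starts` is a hypothetical helper, times are in seconds, the start times are sorted, and the total length of the recording is known.

```python
def segments_from_starts(starts, total_length):
    """Turn sorted start times (seconds) into (time_off, duration) pairs."""
    bounds = list(starts) + [total_length]
    # Each segment runs from its own start to the start of the next one.
    return [(start, end - start) for start, end in zip(bounds, bounds[1:])]


segments = segments_from_starts([0, 42, 95], total_length=120)
# segments == [(0, 42), (42, 53), (95, 25)]
```

Each pair would then map to one transcriber run with `--time_off` and `--duration`, optionally with a different model per segment for the multi-language case.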