WebVTT does not parse in rsubs-lib 0.3.1 but did in 0.1.9
rbozan opened this issue · 5 comments
https://gist.github.com/samdutton/ca37f3adaf4e23679957b8083e061177
pub fn parse_vtt(content: String) -> anyhow::Result<Vec<VTTLine>> {
    let result = VTT::parse(content)?.lines;
    Ok(result)
}

#[cfg(test)]
mod tests {
    use super::parse_vtt;

    #[tokio::test]
    async fn it_parses_vtt() {
        let subs = parse_vtt(include_str!("../fixtures/test.vtt").to_string());
        insta::assert_debug_snapshot!(subs);
    }
}
0.1.9:
Ok(
    [
        VTTLine {
            line_number: "",
            style: Some(
                "Default",
            ),
            line_start: Time {
                h: 0,
                m: 0,
                s: 2,
                ms: 500,
                frames: 0,
                fps: 0.0,
            },
            line_end: Time {
                h: 0,
                m: 0,
                s: 4,
                ms: 300,
                frames: 0,
                fps: 0.0,
            },
            position: Some(
                VTTPos {
                    pos: 0,
                    pos_align: None,
                    size: 0,
                    line: 0,
                    line_align: None,
                    align: "center",
                },
            ),
            line_text: "and the way we access it is changing\\N",
        },
    ],
)
0.3.1:
Err(
    VTTError {
        line: 1,
        kind: InvalidFormat,
    },
)
The formatting probably got screwed up when you copy-pasted the lines from the gist (at least that's what happened to me). The WEBVTT header MUST be followed by a blank line, otherwise the file is invalid according to the spec. Since 0.3.0 this library only parses a file successfully if it is 100% spec compliant.
Invalid:
WEBVTT
00:00:00.500 --> 00:00:02.000
The Web is always changing

00:00:02.500 --> 00:00:04.300
and the way we access it is changing

Valid:
WEBVTT

00:00:00.500 --> 00:00:02.000
The Web is always changing

00:00:02.500 --> 00:00:04.300
and the way we access it is changing
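
To check a file for this problem before handing it to the parser, something like the following could work. It is only a rough sketch using the standard library, not part of rsubs-lib; the function name `header_is_terminated` is made up for illustration, and the signature check is simplified (it only tests that the first line starts with `WEBVTT`).

```rust
/// Returns true if a blank line separates the WEBVTT header block
/// from the first cue timing line.
/// Hypothetical helper for illustration, not an rsubs-lib API.
fn header_is_terminated(input: &str) -> bool {
    let mut lines = input.lines();

    // The first line must carry the WEBVTT signature (an optional BOM is tolerated).
    // Simplification: anything starting with "WEBVTT" is accepted here.
    match lines.next() {
        Some(first) if first.trim_start_matches('\u{feff}').starts_with("WEBVTT") => {}
        _ => return false,
    }

    for line in lines {
        if line.trim().is_empty() {
            // Blank line reached before any cue timing: header is terminated.
            return true;
        }
        if line.contains("-->") {
            // A cue timing line appeared inside the header block.
            return false;
        }
    }

    // Header only, no cues: nothing that could be misread as a cue.
    true
}
```

With the two samples above, the "Invalid" text returns `false` and the "Valid" text returns `true`.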
Hm, I guess you are right about that: I retried and it worked. My original subtitles still do not parse, though. I'm not sure whether that is because of this library or because the file does not follow the spec, but I suppose it is the latter if you are telling me this library is now 100% spec compliant.
https://invidious.fdn.fr/api/v1/captions/Tnpe6aoJOA0?label=English
Their header is like this:
WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:05.820
My friends are from Morocco, are you from Morocco too?
- Morocco, of course. Yes.
Look here, how interesting, it says

00:00:05.820 --> 00:00:11.700
"House of Pakistan".
It's truly being in one territory but feeling entirely in another.
No, this is a valid concern regarding the comments section of the spec. I believe I might have been ignoring them previously, but I don't think this case is currently accounted for.
From the specs: https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#examples_2
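
Until the parser handles these header lines, one possible stopgap is to drop them before parsing. The sketch below is only an assumption about what such a workaround might look like; `strip_header_metadata` is a made-up name, it uses nothing from rsubs-lib, and it throws away the `Kind:` / `Language:` values.

```rust
/// Removes the lines between the WEBVTT signature line and the first
/// blank line (e.g. `Kind: captions`, `Language: en`).
/// Hypothetical pre-processing helper, not an rsubs-lib API.
/// Note: line endings are normalized to `\n` as a side effect.
fn strip_header_metadata(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut in_header = true;

    for (idx, line) in input.lines().enumerate() {
        if in_header {
            if idx == 0 {
                // Keep the signature line itself.
                out.push_str(line);
                out.push('\n');
                continue;
            }
            if line.trim().is_empty() || line.contains("-->") {
                // The header ends at the first blank line. A timing line also
                // ends it, so cues are never dropped from a malformed file.
                in_header = false;
            } else {
                // Metadata line inside the header block: skip it.
                continue;
            }
        }
        out.push_str(line);
        out.push('\n');
    }

    out
}
```

The result could then be passed to `parse_vtt` from the first post, e.g. `parse_vtt(strip_header_metadata(&raw))`, where `raw` is the downloaded caption text.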
Line 359 in 9fa58c4
At first glance, I believe the issue is there. We could probably just remove the || ... stuff, @bytedream, right?
Yes