adracea/rsubs-lib

WebVTT does not parse in rsubs-lib 0.3.1 but did in 0.1.9

rbozan opened this issue · 5 comments

https://gist.github.com/samdutton/ca37f3adaf4e23679957b8083e061177

pub fn parse_vtt(content: String) -> anyhow::Result<Vec<VTTLine>> {
    let result = VTT::parse(content)?.lines;
    Ok(result)
}

#[cfg(test)]
mod tests {
    use super::parse_vtt;

    #[tokio::test]
    async fn it_parses_vtt() {
        let subs = parse_vtt(include_str!("../fixtures/test.vtt").to_string());

        insta::assert_debug_snapshot!(subs);
    }

}

0.1.9:

Ok(
    [
        VTTLine {
            line_number: "",
            style: Some(
                "Default",
            ),
            line_start: Time {
                h: 0,
                m: 0,
                s: 2,
                ms: 500,
                frames: 0,
                fps: 0.0,
            },
            line_end: Time {
                h: 0,
                m: 0,
                s: 4,
                ms: 300,
                frames: 0,
                fps: 0.0,
            },
            position: Some(
                VTTPos {
                    pos: 0,
                    pos_align: None,
                    size: 0,
                    line: 0,
                    line_align: None,
                    align: "center",
                },
            ),
            line_text: "and the way we access it is changing\\N",
        },
    ],
)

0.3.1:

Err(
    VTTError {
        line: 1,
        kind: InvalidFormat,
    },
)

The formatting probably got screwed up when you copy-pasted the lines from the gist (at least that's what happened to me). The WEBVTT header MUST be followed by a blank line, otherwise the file is invalid according to the spec; since 0.3.0 this library only parses a file successfully if it is 100% spec compliant.

Invalid:

WEBVTT
00:00:00.500 --> 00:00:02.000
The Web is always changing

00:00:02.500 --> 00:00:04.300
and the way we access it is changing

Valid:

WEBVTT

00:00:00.500 --> 00:00:02.000
The Web is always changing

00:00:02.500 --> 00:00:04.300
and the way we access it is changing
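
To make the difference between the two files concrete, here is a minimal sketch of a pre-check that could be run before handing the content to the parser. The function name `header_followed_by_blank_line` is hypothetical and not part of rsubs-lib; it only tests the one rule described above (a first line starting with WEBVTT, then a blank line) and is not a full validity check.

```rust
/// Hypothetical helper, not part of rsubs-lib.
/// Returns true when the first line starts with "WEBVTT" and the line
/// directly after it is blank (or the file ends after the header).
fn header_followed_by_blank_line(content: &str) -> bool {
    // `lines()` splits on both "\n" and "\r\n".
    let mut lines = content.lines();

    // The first line must carry the WEBVTT signature.
    let header_ok = lines.next().map_or(false, |l| l.starts_with("WEBVTT"));

    // The second line, if there is one, must be empty or whitespace only.
    let blank_ok = lines.next().map_or(true, |l| l.trim().is_empty());

    header_ok && blank_ok
}
```

With the two samples above, the "Invalid" file fails this check because the cue timing line follows the header directly, while the "Valid" file passes.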

Hm, I guess you are right about that: I retried and it worked. My original subtitles still do not seem to work, though. I'm not sure if that is because of this library or because their format does not follow the spec. But I suppose they are not following the spec if you are telling me this library is now 100% spec compliant.

https://invidious.fdn.fr/api/v1/captions/Tnpe6aoJOA0?label=English

Their header is like this:

WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:05.820
My friends are from Morocco, are you from Morocco too? 
- Morocco, of course. Yes.
Look here, how interesting, it says

00:00:05.820 --> 00:00:11.700
 "House of Pakistan".
It's truly being in one territory but feeling entirely in another.  

No, this is a valid concern regarding the comments section of the spec. I believe I might have been ignoring them previously. But this is indeed something that I don't think is currently being accounted for.

From the specs: https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#examples_2

if !line.starts_with("WEBVTT") || block_lines.next().is_some() {

At first glance I believe the issue is there. We could probably just remove the || ... part, right @bytedream?

Yes
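
For illustration, the effect of dropping the second half of that condition can be sketched in isolation. Both functions below are hypothetical stand-ins written for this thread, not the library's actual code; they assume `block` is the first blank-line-delimited block of the file. Only the names `line` and `block_lines` are taken from the quoted condition.

```rust
/// Mirrors the quoted condition: the header block is rejected if the first
/// line does not start with "WEBVTT" OR if any further line follows it.
fn strict_header_ok(block: &str) -> bool {
    let mut block_lines = block.lines();
    let line = block_lines.next().unwrap_or("");
    !(!line.starts_with("WEBVTT") || block_lines.next().is_some())
}

/// Relaxed variant with the `|| ...` part removed: only the first line is
/// checked, so extra header lines such as "Kind: captions" are tolerated.
fn relaxed_header_ok(block: &str) -> bool {
    let mut block_lines = block.lines();
    let line = block_lines.next().unwrap_or("");
    line.starts_with("WEBVTT")
}
```

Under these assumptions, the header from the Invidious captions (WEBVTT, then Kind: captions and Language: en) is rejected by the strict check and accepted by the relaxed one, while a bare WEBVTT header passes both.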