Improve adapters returned interface
alexandercerutti opened this issue · 2 comments
As per the first version of sub37, the whole architecture expects Adapters to always return a `ParseResult`, containing data and perhaps some `ParseError`s.
The idea was to have `ParseResult` and `ParseError` exposed by the `BaseAdapter`, so that every adapter could use them without importing anything else and without making `@sub37/server` export other material.
However, the current situation reveals itself to be somewhat inconsistent:

- `BaseAdapter/index.ts` exports the `BaseAdapter` (default), `ParseResult`, and `ParseError` classes. We should technically move `Parse*` to a different file;
- `BaseAdapter` exposes `ParseResult` as an instance property, but not `ParseError`;
- Related to the points above, tests and the WebVTT adapter use `ParseResult` from there, while `BaseAdapter` should have been the entry point to access them. This happens because `BaseAdapter.ParseResult` is actually a method that returns a `new ParseResult` and therefore cannot be used as the right-hand side of an `instanceof` comparison: we must use `instanceof ParseResult`, but we should use `instanceof BaseAdapter.ParseResult`;
- `BaseAdapter` exposes both the `ParseResult` and `ParseError` types through a namespace;
- The `ParseError` class is exposed from the `BaseAdapter/index.ts` module, but it is never actually used as a class (this happens due to point 3), so it is a useless class.
Fixing the whole situation is quite breaking, so it should be addressed in the next major.
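A minimal sketch of one possible rearrangement (hypothetical layout and constructor shapes, not the actual sub37 code) could be the following:

```typescript
// Hypothetical rearrangement, not the actual sub37 layout or class shapes:
// Parse* become plain classes (e.g. in their own ParseResult.ts module) and
// BaseAdapter exposes them as static references, so that
// `instanceof BaseAdapter.ParseResult` works as expected.

export class ParseError {
	constructor(public error: Error, public isCritical: boolean = false) {}
}

export class ParseResult {
	constructor(public data: unknown[] = [], public errors: ParseError[] = []) {}
}

export abstract class BaseAdapter {
	// Static references to the classes themselves (not factory methods),
	// hence valid on the right-hand side of an `instanceof` comparison.
	public static ParseResult = ParseResult;
	public static ParseError = ParseError;

	public abstract parse(rawContent: string): ParseResult;
}
```

With something like this, adapters and tests would only import `BaseAdapter` and could still write `result instanceof BaseAdapter.ParseResult`.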
Looking at this over a year later, while working on the TTML Adapter, made me think that adapters could take a lot of time to parse and return all the Cues.
I'm working on a MacBook Pro M1 machine, which is quite fast in Chrome (a TTML track of The Office 1x1 from Netflix, which is quite verbose, gets compiled within 130-180ms).
I acknowledge this "benchmark" is not reliable at all and, in fact, testing on some Intel machines showed that synchronous parsing could take over 500ms, which becomes quite critical, as the browser's main thread is blocked for that long.
Safari on iOS seems to take less than 50ms, which is incredible to me.
Since the goal of this package is to also run on televisions (CTVs), I expect much higher parsing times on such low-end devices (yeah, they have very limited hardware).
However, as of today, sub37 doesn't have a mechanism to change this. An early idea was to allow async tracks. Still, postponing the parsing to microtasks or "macrotasks" (timers) doesn't necessarily prevent blocking the main thread.
In order to improve this aspect, I think a "parsing job" should be chunked and streamed.
This means introducing a concept of "resumability" of the work (whether it is stateless or stateful on the adapters' side).
Technically, this could be achieved by using `yield` and generators, but since CTV browser integrations (in my experience) might still run on incomplete ES6 support or plain ES5, `yield` is not an option.
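For reference, the generator-based approach being ruled out would look roughly like the sketch below (cue shapes and helpers are assumptions, nothing from the actual codebase):

```typescript
// Illustrative only: a generator-based "resumable" parsing job.
// RawCue, Cue and the two helpers are assumed shapes, not sub37 types.
interface RawCue { payload: string; }
interface Cue { startTime: number; endTime: number; text: string; }

declare function splitIntoRawCues(rawContent: string): RawCue[];
declare function parseCue(raw: RawCue): Cue;

function* parseInChunks(rawContent: string, chunkSize: number): Generator<Cue[], void> {
	const rawCues = splitIntoRawCues(rawContent);

	for (let i = 0; i < rawCues.length; i += chunkSize) {
		// Each `yield` hands a chunk back to the caller and suspends the work,
		// so the main thread can be released before resuming.
		yield rawCues.slice(i, i + chunkSize).map(parseCue);
	}
}
```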
Even polyfilling generators could be a not-so-good choice, as it might reduce performance (tests to be done), according to some comments in Shaka-Player (I read them once, I don't remember where).
So, creating a streaming system would mean that the returned structure should allow `@sub37/server` to subscribe to it, like an observable, in order to receive new cues without considering time as an important factor.
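Something along these lines, perhaps (names and shapes are purely illustrative, not a real sub37 API):

```typescript
// Purely illustrative: an "observable-like" parse result that @sub37/server
// could subscribe to. None of these names are actual sub37 APIs.
interface Cue { startTime: number; endTime: number; text: string; }

interface StreamedParseResult {
	subscribe(observer: {
		next(cues: Cue[]): void;     // a new chunk of parsed cues is available
		error?(error: Error): void;  // a critical parsing error occurred
		complete?(): void;           // the adapter finished the whole job
	}): { unsubscribe(): void };
}

// On the server side, consumption would then look like:
declare const result: StreamedParseResult;

const subscription = result.subscribe({
	next: (cues) => {
		// feed the timeline as chunks arrive, without waiting for the whole track
	},
	complete: () => {
		// the track has been fully parsed
	},
});
```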
In order to create job chunks, we should measure how many milliseconds it takes for a cue to get fully parsed, and then establish how many cues we can ship in a chunk and, hence, how many cues we can parse within a pre-established amount of time.
This "pre-established" amount could perhaps be the 16ms of an animation frame, or whatever time we want. We could even say "100ms" (a completely random value, but within 100ms users perceive things as immediate).
This chunking and streaming system could also partially open us up to streaming track data over the internet for live-streamed videos, for which the text is generated live and for which, today, there are some limitations in terms of requirements (adding new chunks requires them to still have a header, which is a nonsensical thing to me - and I created it).
Another limitation of the current interface could be the impossibility of emitting document-related details.
Each track (in particular TTML's, but WebVTT could have one too) has what is called a "document", which is a relative representation of the root of everything contained in it.
Something that never happened until now, and that came up while developing the TTML adapter, is that we could have some "global" attributes belonging to the document that cannot strictly be assigned to single Cues (which is the case of "global styles" for WebVTT).
An example is `tts:extent`, which defines the width and the height of the rendering area and that, in our case, should change how the renderer behaves.
Future subtitle formats could have something like this, so it could make sense to let `@sub37/server` handle such a different kind of data.
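For example (again, purely illustrative names and shapes, not a real API), the streamed structure could also carry document-level details next to the cues:

```typescript
// Purely illustrative: letting an adapter emit document-level details
// (e.g. something derived from TTML's tts:extent) alongside the cues.
// None of these names are actual sub37 APIs.
interface Cue { startTime: number; endTime: number; text: string; }

interface DocumentDetails {
	// e.g. derived from tts:extent on the document root, to drive the renderer
	renderingArea?: { width: number; height: number };
	// any other document-scoped attribute a format might define
	[attribute: string]: unknown;
}

interface StreamedParseResult {
	subscribe(observer: {
		// emitted once, as soon as the document root has been read
		document?(details: DocumentDetails): void;
		next(cues: Cue[]): void;
		complete?(): void;
	}): { unsubscribe(): void };
}
```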