A blueprint for seeking feature
TonalidadeHidrica opened this issue · 6 comments
I'm planning to add seeking feature to this library, as I mentioned in #28 . This issue has been opened for tracking the design plan for this new feature.
The methods to add
First, we are going to add new constructors new_seekable
and new_seekable_ext
in impl <R: Read + Seek> FlacReader {
. The purpose of creating a new struct is that we have to do an additional process right after reading all the metadata blocks: save the position of the reader. This enables the reader to determine the range for binary searching for every seek. This position is going to be stored in a new field of FlacReader
, audio_block_start_position: Option<u64>
.
Next, we are going to add new function seek
to FrameReader
and FlacSamples
, both of which takes the sample number (0-indexed).
FrameReader::seek
seeks the reader to the beginning of the desired frame. Here, the "desired frame" is the last frame whose frame number is less than or equal to the given sample number (in case of fixed block size stream, it is automatically converted appropriately). Therefore, the next frame (this is equivalent toclaxon::frame::Block
, right?) yielded fromFrameReader
will include the desired sample (if the given sample number was less than the range of the length of the audio stream).- By default, this function uses binary search, assuming that any frame header is not broken (contains correct sample/frame number).
- The range of search is between the beginning of audio frame (that was obtained in the constructor described above) and the end of the stream
std::io::SeekFrom::End(0)
. However, if possible, this function usesSEEKTABLE
metadata to narrow down the range, and thus speed up the seeking process.- By the way, I found that
claxon::metadata::SeekTable
does not expose any API to access the data. Shall I add some as well? - Also, I think that the data should be stored with something like
BTreeMap
rather thanVec
.
- By the way, I found that
- For every step of binary searching, we find the first occurrence of
FRAME_HEADER
, by matching both sync code and 8-bit CRC provided at last. Calculating CRC-16 for entire frame makes it even more robust, but I think it is excessive work, and there are very low possibility that CRC-8 accidentally matches. - In the course of seeking, when the search range has been narrowed down enough, we switch to linear searching. This is in order to avoid searching frame header for almost the same place in the last phase of binary search (it is hard to explain with words; it's rather better to write some diagrams, but I'm reluctant to draw that. Tell me if you want more detailed explanation).
FlacSamples::seek
seeks the reader to the desired sample. That is, the next sample yielded from the iterator will precisely has the sample index that equal to the provided argument. Internally, it callsFrameReader::seek
and then discards the first few frames in the first block obtained.
Concerns and future tasks
- As I explained above,
audio_block_start_position
will be stored asOption<u64>
, as it may be vacant for non-seekable reader while filled for seekable one. But this is independent of whether the reader is actually seekable or not. That is, one can use the original constructorFrameReader::new
while providing seekable reader, and then callseek
function. So, the field may beNone
for the first timeseek
function is called, in which case we have to go back to the beginning of the stream and actually fill it with appropriate value. This may not be a big problem, but something that can be avoided by "type puzzle": For example, we can define a marker traitIsSeekable
, create two marker structsSeekable
andNonSeekable
, acceptS: IsSeekable
as additional type parameter forFlacReader
,FrameReader
andFlacSamples
, and define associated typeIsSeekable::AudioBlockStartPosition
which is set to()
forNonSeekable
andu64
forSeekable
. Whew. Personally I love this kind of type puzzle but I doubt it's suitable for public library. Alternatively, we can define seekable variants forFlacReader
,FrameReader
andFlacSamples
, which may make API less understandable and make code confusing. - As a future task, we can provide another function that scans entire audio stream and build "complete" seek table. This allows us to avoid any binary searching, because we can jump to desired frame. However it's not a necessary feature, so I'll leave it laters and not implement it at first.
Thanks for reading this lengthy draft! Feel free to make questions or oppositions. I'll get started in a few days anyway.
Thanks for writing this up!
we are going to add new constructors
new_seekable
andnew_seekable_ext
I think in hindsight, adding these new_ext
constructors, and parsing so much in the constructor, was a mistake. It forces reading blocks that the caller may not be interested in, and the only way to expose them is by storing them in FlacReader
, so they need to be allocated on the heap. The FlacReaderOptions
are not flexible enough to accomodate all use cases. The design that I am now gravitating towards is to expose two things:
- A lower-level API for reading raw metadata blocks or frames. It is possible to mis-use, but it also allows parsing only the interesting blocks, and they can be read directly from the underlying reader (which is important for e.g. picture blocks).
- A higher-level API on top, that combines these building blocks to offer convenient handling of the common cases.
I am working on this in the metadata3
branch. My feeling is that seeking should follow the same approach: expose a lower-level function for reading a block (essentially FrameReader::read_next_or_eof
, there is no real need for the FrameReader
struct). Then seeking could be another one of those lower-level functions that takes a reader, a sample number, and optionally a SeekTable
, and it positions the reader right at the point where FrameReader::read_next_or_eof
would read the frame that contains the desired frame number. An additional advantage of this, is that only this function needs an io::Seek
constraint on the reader.
Therefore, the next frame (this is equivalent to claxon::frame::Block, right?)
The range of search is between the beginning of audio frame (that was obtained in the constructor described above) and the end of the stream std::io::SeekFrom::End(0). However, if possible, this function uses SEEKTABLE metadata to narrow down the range, and thus speed up the seeking process.
Yes, that sounds good!
By the way, I found that claxon::metadata::SeekTable does not expose any API to access the data. Shall I add some as well?
It’s not yet exposed indeed, parsing isn’t even implemented yet, so I would start with that first.
Also, I think that the data should be stored with something like BTreeMap rather than Vec.
Depending on the size of the seektable, it may be faster to have a vec with binary search or even linear search, but for now I would go with whatever makes the implementation simpler.
For every step of binary searching, we find the first occurrence of FRAME_HEADER, by matching both sync code and 8-bit CRC provided at last. Calculating CRC-16 for entire frame makes it even more robust, but I think it is excessive work, and there are very low possibility that CRC-8 accidentally matches.
Yeah just the header should be sufficient, the flac format takes care to avoid creating the frame header bit pattern in other places, and if the CRC of the header matches and the sample number makes sense, then that should be fine. I expect we need to verify that the sample numbers are increasing as the position in the stream increases, otherwise it is probably possible to create an adversarial input that breaks the binary search.
In the course of seeking, when the search range has been narrowed down enough, we switch to linear searching.
That may be a good future optimization, but I would start with the simplest option possible. But we need some kind of forward search anyway, right? If binary search tells you an offset in the stream, but that offset is not the start of a frame header, then we need to search ahead (or backwards) until there is some frame, otherwise the binary search can’t continue.
FlacSamples::seek
seeks the reader to the desired sample.
That could be a second step, I would start with searching for the frame; I’m not sure how useful seeking in FlacSamples
is, because it is more of a toy API anyway. It is very convenient for just opening a file and getting the audio data, but anything that cares about performance, or wants to do some more advanced things (like seeking), would likely read the frames manually anyway.
Thanks for your detailed review.
I understood that adding two more constructors are not a good way. So, as I understand, the high-level API FlacReader
should do basic things, and advanced features like seeking should be provided only via low-level API, at least for now, right?
So, the low-level seeking function will be like this. Does it seem good?
/// Seek the given reader to an appropriate position so that
/// the next frame returned by `FrameReader::read_next_or_eof`
/// will contain the sample with the given index, if exists.
fn seek_reader_by_sample_number<R: io::Read + io::Seek>(
reader: &mut R,
sample_number: u64,
audio_block_start_position: Option<u64>, // Maybe it should not be Option?
seek_table: Option<SeekTable>,
) -> Option<u64> {
// ...
}
Do I have to implement new seeking feature on metadata3
branch? Or should I wait until the branch is merged? What is the tasks left for that branch?
Therefore, the next frame (this is equivalent to claxon::frame::Block, right?)
Thanks for teaching me. I've totally confused them.
Regarding with the seektable, are you planning to implement parsing and storing it in metadata3
branch? Or can I do it instead?
For every step of binary searching, we find the first occurrence of FRAME_HEADER, by matching both sync code and 8-bit CRC provided at last. Calculating CRC-16 for entire frame makes it even more robust, but I think it is excessive work, and there are very low possibility that CRC-8 accidentally matches.
Yeah just the header should be sufficient, the flac format takes care to avoid creating the frame header bit pattern in other places, and if the CRC of the header matches and the sample number makes sense, then that should be fine. I expect we need to verify that the sample numbers are increasing as the position in the stream increases, otherwise it is probably possible to create an adversarial input that breaks the binary search.
Well actually, when it comes to an adversarial input, I guess it is possible to provide a fake header where sample numbers "seem" to be increasing but still results in breaking binary search. So I think we can leave it as future tasks as well.
In the course of seeking, when the search range has been narrowed down enough, we switch to linear searching.
That may be a good future optimization, but I would start with the simplest option possible. But we need some kind of forward search anyway, right? If binary search tells you an offset in the stream, but that offset is not the start of a frame header, then we need to search ahead (or backwards) until there is some frame, otherwise the binary search can’t continue.
Um, it is my fault that I didn't explain clearly, but this is a necessary step for binary seeking, not an optimization.
Simply put, binary search consisting of repetition of the following recursive step:
- Given are the start and end position in the byte stream, in which we know the frame belongs in that slice.
- If the length of the slice is large enough, we seek to the midpoint of those two positions, and search forward until the first frame header is found. Then we compare the header's sample index with the desired one, narrow down the range to the left or the right half, and perform the same step recursively.
- If it's short enough, we simply iterate all the frame headers that belongs in the given span, and determine the suitable frame position.
I've already implemented the similar algorithm for the lewton
crate here, so take a look at it if you are interested in.
FlacSamples::seek
seeks the reader to the desired sample.That could be a second step, I would start with searching for the frame; I’m not sure how useful seeking in
FlacSamples
is, because it is more of a toy API anyway. It is very convenient for just opening a file and getting the audio data, but anything that cares about performance, or wants to do some more advanced things (like seeking), would likely read the frames manually anyway.
At least, I want seekable FlacSamples
for easy adoption to my project :P
So, as I understand, the high-level API FlacReader should do basic things, and advanced features like seeking should be provided only via low-level API, at least for now, right?
Yes, that would be a good first step either way.
Do I have to implement new seeking feature on metadata3 branch? Or should I wait until the branch is merged? What is the tasks left for that branch?
I intend for metadata3
to be merged into master, but it is currently blocked on performance regressions caused by dropping the custom buffered reader and switching it out for BufRead
. I expect it can be resolved, and basing seeking off of metadata3
would be easier, but if you prefer to base it off of master, I can rebase metadata3
later; I’ll leave it up to you.
Regarding with the seektable, are you planning to implement parsing and storing it in metadata3 branch? Or can I do it instead?
I have no plans to implement it, you can go ahead :) This would be a welcome addition in a separate pull request either way.
Well actually, when it comes to an adversarial input, I guess it is possible to provide a fake header where sample numbers "seem" to be increasing but still results in breaking binary search. So I think we can leave it as future tasks as well.
If there are still adversarial cases, it should be possible to find them with a fuzzer. There are already a few fuzzers, maybe we can add one where the fuzz input is first a 32-bit offset to seek to, and the remainder of the input is the flac file. The implementation of the fuzzer would be to just do the seek. If there is an infinite loop, libFuzzer will report it as a hang. But a good first step would be to implement seek indeed, we can add the fuzzer later.
Simply put, binary search consisting of repetition of the following recursive step
Yes, I think we are on the same page here 👍
At least, I want seekable FlacSamples for easy adoption to my project :P
Out of curiosity, can you tell a bit more about your use case?
So, as I understand, the high-level API FlacReader should do basic things, and advanced features like seeking should be provided only via low-level API, at least for now, right?
Yes, that would be a good first step either way.
Do I have to implement new seeking feature on metadata3 branch? Or should I wait until the branch is merged? What is the tasks left for that branch?
I intend for
metadata3
to be merged into master, but it is currently blocked on performance regressions caused by dropping the custom buffered reader and switching it out forBufRead
. I expect it can be resolved, and basing seeking off ofmetadata3
would be easier, but if you prefer to base it off of master, I can rebasemetadata3
later; I’ll leave it up to you.Regarding with the seektable, are you planning to implement parsing and storing it in metadata3 branch? Or can I do it instead?
I have no plans to implement it, you can go ahead :) This would be a welcome addition in a separate pull request either way.
Well actually, when it comes to an adversarial input, I guess it is possible to provide a fake header where sample numbers "seem" to be increasing but still results in breaking binary search. So I think we can leave it as future tasks as well.
If there are still adversarial cases, it should be possible to find them with a fuzzer. There are already a few fuzzers, maybe we can add one where the fuzz input is first a 32-bit offset to seek to, and the remainder of the input is the flac file. The implementation of the fuzzer would be to just do the seek. If there is an infinite loop, libFuzzer will report it as a hang. But a good first step would be to implement seek indeed, we can add the fuzzer later.
Simply put, binary search consisting of repetition of the following recursive step
Yes, I think we are on the same page here 👍
At least, I want seekable FlacSamples for easy adoption to my project :P
Out of curiosity, can you tell a bit more about your use case?
Well, FlacSamples
is an easy iterator that iterates over samples, so it is easy to adopt to iterator-based audio signal processor like rodio
. If we don't have FlacSamples::seek
, we have to use FrameReader
and then skip appropriate samples to achieve precise seek to a specific sample, so implementing it will reduce the effort for user, I think.
I find myself looking for seek functionality in a flac decoder - did you ever make any progress on this, @TonalidadeHidrica ?
@joshhansen Unfortunately I didn't, and if you would try implement this, I'd be very happy!