ruuda/claxon

A blueprint for seeking feature

TonalidadeHidrica opened this issue · 6 comments

I'm planning to add seeking feature to this library, as I mentioned in #28 . This issue has been opened for tracking the design plan for this new feature.

The methods to add

First, we are going to add new constructors new_seekable and new_seekable_ext in impl <R: Read + Seek> FlacReader {. The purpose of creating a new struct is that we have to do an additional process right after reading all the metadata blocks: save the position of the reader. This enables the reader to determine the range for binary searching for every seek. This position is going to be stored in a new field of FlacReader, audio_block_start_position: Option<u64>.

Next, we are going to add new function seek to FrameReader and FlacSamples, both of which takes the sample number (0-indexed).

  • FrameReader::seek seeks the reader to the beginning of the desired frame. Here, the "desired frame" is the last frame whose frame number is less than or equal to the given sample number (in case of fixed block size stream, it is automatically converted appropriately). Therefore, the next frame (this is equivalent to claxon::frame::Block, right?) yielded from FrameReader will include the desired sample (if the given sample number was less than the range of the length of the audio stream).
    • By default, this function uses binary search, assuming that any frame header is not broken (contains correct sample/frame number).
    • The range of search is between the beginning of audio frame (that was obtained in the constructor described above) and the end of the stream std::io::SeekFrom::End(0). However, if possible, this function uses SEEKTABLE metadata to narrow down the range, and thus speed up the seeking process.
      • By the way, I found that claxon::metadata::SeekTable does not expose any API to access the data. Shall I add some as well?
      • Also, I think that the data should be stored with something like BTreeMap rather than Vec.
    • For every step of binary searching, we find the first occurrence of FRAME_HEADER, by matching both sync code and 8-bit CRC provided at last. Calculating CRC-16 for entire frame makes it even more robust, but I think it is excessive work, and there are very low possibility that CRC-8 accidentally matches.
    • In the course of seeking, when the search range has been narrowed down enough, we switch to linear searching. This is in order to avoid searching frame header for almost the same place in the last phase of binary search (it is hard to explain with words; it's rather better to write some diagrams, but I'm reluctant to draw that. Tell me if you want more detailed explanation).
  • FlacSamples::seek seeks the reader to the desired sample. That is, the next sample yielded from the iterator will precisely has the sample index that equal to the provided argument. Internally, it calls FrameReader::seek and then discards the first few frames in the first block obtained.

Concerns and future tasks

  • As I explained above, audio_block_start_position will be stored as Option<u64>, as it may be vacant for non-seekable reader while filled for seekable one. But this is independent of whether the reader is actually seekable or not. That is, one can use the original constructor FrameReader::new while providing seekable reader, and then call seek function. So, the field may be None for the first time seek function is called, in which case we have to go back to the beginning of the stream and actually fill it with appropriate value. This may not be a big problem, but something that can be avoided by "type puzzle": For example, we can define a marker trait IsSeekable, create two marker structs Seekable and NonSeekable, accept S: IsSeekable as additional type parameter for FlacReader, FrameReader and FlacSamples, and define associated type IsSeekable::AudioBlockStartPosition which is set to () for NonSeekable and u64 for Seekable. Whew. Personally I love this kind of type puzzle but I doubt it's suitable for public library. Alternatively, we can define seekable variants for FlacReader, FrameReader and FlacSamples, which may make API less understandable and make code confusing.
  • As a future task, we can provide another function that scans entire audio stream and build "complete" seek table. This allows us to avoid any binary searching, because we can jump to desired frame. However it's not a necessary feature, so I'll leave it laters and not implement it at first.

Thanks for reading this lengthy draft! Feel free to make questions or oppositions. I'll get started in a few days anyway.

ruuda commented

Thanks for writing this up!

we are going to add new constructors new_seekable and new_seekable_ext

I think in hindsight, adding these new_ext constructors, and parsing so much in the constructor, was a mistake. It forces reading blocks that the caller may not be interested in, and the only way to expose them is by storing them in FlacReader, so they need to be allocated on the heap. The FlacReaderOptions are not flexible enough to accomodate all use cases. The design that I am now gravitating towards is to expose two things:

  • A lower-level API for reading raw metadata blocks or frames. It is possible to mis-use, but it also allows parsing only the interesting blocks, and they can be read directly from the underlying reader (which is important for e.g. picture blocks).
  • A higher-level API on top, that combines these building blocks to offer convenient handling of the common cases.

I am working on this in the metadata3 branch. My feeling is that seeking should follow the same approach: expose a lower-level function for reading a block (essentially FrameReader::read_next_or_eof, there is no real need for the FrameReader struct). Then seeking could be another one of those lower-level functions that takes a reader, a sample number, and optionally a SeekTable, and it positions the reader right at the point where FrameReader::read_next_or_eof would read the frame that contains the desired frame number. An additional advantage of this, is that only this function needs an io::Seek constraint on the reader.

Therefore, the next frame (this is equivalent to claxon::frame::Block, right?)

Yes, the flac format refers to blocks of raw audio data as blocks, and blocks encoded in the bitstream as frames.

The range of search is between the beginning of audio frame (that was obtained in the constructor described above) and the end of the stream std::io::SeekFrom::End(0). However, if possible, this function uses SEEKTABLE metadata to narrow down the range, and thus speed up the seeking process.

Yes, that sounds good!

By the way, I found that claxon::metadata::SeekTable does not expose any API to access the data. Shall I add some as well?

It’s not yet exposed indeed, parsing isn’t even implemented yet, so I would start with that first.

Also, I think that the data should be stored with something like BTreeMap rather than Vec.

Depending on the size of the seektable, it may be faster to have a vec with binary search or even linear search, but for now I would go with whatever makes the implementation simpler.

For every step of binary searching, we find the first occurrence of FRAME_HEADER, by matching both sync code and 8-bit CRC provided at last. Calculating CRC-16 for entire frame makes it even more robust, but I think it is excessive work, and there are very low possibility that CRC-8 accidentally matches.

Yeah just the header should be sufficient, the flac format takes care to avoid creating the frame header bit pattern in other places, and if the CRC of the header matches and the sample number makes sense, then that should be fine. I expect we need to verify that the sample numbers are increasing as the position in the stream increases, otherwise it is probably possible to create an adversarial input that breaks the binary search.

In the course of seeking, when the search range has been narrowed down enough, we switch to linear searching.

That may be a good future optimization, but I would start with the simplest option possible. But we need some kind of forward search anyway, right? If binary search tells you an offset in the stream, but that offset is not the start of a frame header, then we need to search ahead (or backwards) until there is some frame, otherwise the binary search can’t continue.

FlacSamples::seek seeks the reader to the desired sample.

That could be a second step, I would start with searching for the frame; I’m not sure how useful seeking in FlacSamples is, because it is more of a toy API anyway. It is very convenient for just opening a file and getting the audio data, but anything that cares about performance, or wants to do some more advanced things (like seeking), would likely read the frames manually anyway.

Thanks for your detailed review.

I understood that adding two more constructors are not a good way. So, as I understand, the high-level API FlacReader should do basic things, and advanced features like seeking should be provided only via low-level API, at least for now, right?

So, the low-level seeking function will be like this. Does it seem good?

/// Seek the given reader to an appropriate position so that
/// the next frame returned by `FrameReader::read_next_or_eof`
/// will contain the sample with the given index, if exists.
fn seek_reader_by_sample_number<R: io::Read + io::Seek>(
  reader: &mut R,
  sample_number: u64,
  audio_block_start_position: Option<u64>,  // Maybe it should not be Option?
  seek_table: Option<SeekTable>,
) -> Option<u64> {
  // ...
}

Do I have to implement new seeking feature on metadata3 branch? Or should I wait until the branch is merged? What is the tasks left for that branch?

Therefore, the next frame (this is equivalent to claxon::frame::Block, right?)

Yes, the flac format refers to blocks of raw audio data as blocks, and blocks encoded in the bitstream as frames.

Thanks for teaching me. I've totally confused them.

Regarding with the seektable, are you planning to implement parsing and storing it in metadata3 branch? Or can I do it instead?

For every step of binary searching, we find the first occurrence of FRAME_HEADER, by matching both sync code and 8-bit CRC provided at last. Calculating CRC-16 for entire frame makes it even more robust, but I think it is excessive work, and there are very low possibility that CRC-8 accidentally matches.

Yeah just the header should be sufficient, the flac format takes care to avoid creating the frame header bit pattern in other places, and if the CRC of the header matches and the sample number makes sense, then that should be fine. I expect we need to verify that the sample numbers are increasing as the position in the stream increases, otherwise it is probably possible to create an adversarial input that breaks the binary search.

Well actually, when it comes to an adversarial input, I guess it is possible to provide a fake header where sample numbers "seem" to be increasing but still results in breaking binary search. So I think we can leave it as future tasks as well.

In the course of seeking, when the search range has been narrowed down enough, we switch to linear searching.

That may be a good future optimization, but I would start with the simplest option possible. But we need some kind of forward search anyway, right? If binary search tells you an offset in the stream, but that offset is not the start of a frame header, then we need to search ahead (or backwards) until there is some frame, otherwise the binary search can’t continue.

Um, it is my fault that I didn't explain clearly, but this is a necessary step for binary seeking, not an optimization.
Simply put, binary search consisting of repetition of the following recursive step:

  • Given are the start and end position in the byte stream, in which we know the frame belongs in that slice.
    • If the length of the slice is large enough, we seek to the midpoint of those two positions, and search forward until the first frame header is found. Then we compare the header's sample index with the desired one, narrow down the range to the left or the right half, and perform the same step recursively.
    • If it's short enough, we simply iterate all the frame headers that belongs in the given span, and determine the suitable frame position.

I've already implemented the similar algorithm for the lewton crate here, so take a look at it if you are interested in.

FlacSamples::seek seeks the reader to the desired sample.

That could be a second step, I would start with searching for the frame; I’m not sure how useful seeking in FlacSamples is, because it is more of a toy API anyway. It is very convenient for just opening a file and getting the audio data, but anything that cares about performance, or wants to do some more advanced things (like seeking), would likely read the frames manually anyway.

At least, I want seekable FlacSamples for easy adoption to my project :P

ruuda commented

So, as I understand, the high-level API FlacReader should do basic things, and advanced features like seeking should be provided only via low-level API, at least for now, right?

Yes, that would be a good first step either way.

Do I have to implement new seeking feature on metadata3 branch? Or should I wait until the branch is merged? What is the tasks left for that branch?

I intend for metadata3 to be merged into master, but it is currently blocked on performance regressions caused by dropping the custom buffered reader and switching it out for BufRead. I expect it can be resolved, and basing seeking off of metadata3 would be easier, but if you prefer to base it off of master, I can rebase metadata3 later; I’ll leave it up to you.

Regarding with the seektable, are you planning to implement parsing and storing it in metadata3 branch? Or can I do it instead?

I have no plans to implement it, you can go ahead :) This would be a welcome addition in a separate pull request either way.

Well actually, when it comes to an adversarial input, I guess it is possible to provide a fake header where sample numbers "seem" to be increasing but still results in breaking binary search. So I think we can leave it as future tasks as well.

If there are still adversarial cases, it should be possible to find them with a fuzzer. There are already a few fuzzers, maybe we can add one where the fuzz input is first a 32-bit offset to seek to, and the remainder of the input is the flac file. The implementation of the fuzzer would be to just do the seek. If there is an infinite loop, libFuzzer will report it as a hang. But a good first step would be to implement seek indeed, we can add the fuzzer later.

Simply put, binary search consisting of repetition of the following recursive step

Yes, I think we are on the same page here 👍

At least, I want seekable FlacSamples for easy adoption to my project :P

Out of curiosity, can you tell a bit more about your use case?

So, as I understand, the high-level API FlacReader should do basic things, and advanced features like seeking should be provided only via low-level API, at least for now, right?

Yes, that would be a good first step either way.

Do I have to implement new seeking feature on metadata3 branch? Or should I wait until the branch is merged? What is the tasks left for that branch?

I intend for metadata3 to be merged into master, but it is currently blocked on performance regressions caused by dropping the custom buffered reader and switching it out for BufRead. I expect it can be resolved, and basing seeking off of metadata3 would be easier, but if you prefer to base it off of master, I can rebase metadata3 later; I’ll leave it up to you.

Regarding with the seektable, are you planning to implement parsing and storing it in metadata3 branch? Or can I do it instead?

I have no plans to implement it, you can go ahead :) This would be a welcome addition in a separate pull request either way.

Well actually, when it comes to an adversarial input, I guess it is possible to provide a fake header where sample numbers "seem" to be increasing but still results in breaking binary search. So I think we can leave it as future tasks as well.

If there are still adversarial cases, it should be possible to find them with a fuzzer. There are already a few fuzzers, maybe we can add one where the fuzz input is first a 32-bit offset to seek to, and the remainder of the input is the flac file. The implementation of the fuzzer would be to just do the seek. If there is an infinite loop, libFuzzer will report it as a hang. But a good first step would be to implement seek indeed, we can add the fuzzer later.

Simply put, binary search consisting of repetition of the following recursive step

Yes, I think we are on the same page here 👍

At least, I want seekable FlacSamples for easy adoption to my project :P

Out of curiosity, can you tell a bit more about your use case?

Well, FlacSamples is an easy iterator that iterates over samples, so it is easy to adopt to iterator-based audio signal processor like rodio. If we don't have FlacSamples::seek, we have to use FrameReader and then skip appropriate samples to achieve precise seek to a specific sample, so implementing it will reduce the effort for user, I think.

I find myself looking for seek functionality in a flac decoder - did you ever make any progress on this, @TonalidadeHidrica ?

@joshhansen Unfortunately I didn't, and if you would try implement this, I'd be very happy!