maciejhirsz/logos

Discussion: how to implement `Source` for lazy readers, like `T: Read` or `T: BufRead`

jeertmans opened this issue · 2 comments

Hello all,

I'd like to open this discussion because I would love for Logos to support Source types other than str and [u8], especially lazy readers such as types that implement Read or BufRead.

impl<T: Read> Source for T, or impl<T: Read + Seek> Source for T, would be a game changer for me, as it would allow lexing input without first loading all of it into memory.

I have tried a few different implementations, and I already see some shortcomings that need to be addressed or discussed:

  • Source::len() -> usize should maybe be Source::len_hint() -> Option<usize>
  • Source::read_* methods should take a mutable reference to the reader (though this might reduce performance for types like str or [u8] that do not need mutability).
  • unsafe methods do not really make sense here, so I don't know how to deal with them (except by copying and pasting the safe equivalent).
  • reading at an offset position may be a poor fit, especially since it may require using Seek::seek. If backtracking is never needed, then using only the read methods should be fine, no?
  • Tokens borrow from the original source, so I don't know whether implementing Source for Read is enough, because we may lose every reference to the original source. Implementing Source for Bytes may be a solution.
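To make the points above concrete, here is a minimal sketch of what a forward-only source could look like. Everything in it is hypothetical and not part of Logos: the `LazySource` trait, its `len_hint` and `read_chunk` methods, and the `SliceSource` and `ReaderSource` wrappers are names invented for illustration. It only assumes that the lexer never backtracks, so no `Seek` bound is needed.

```rust
use std::io::{self, Read};

/// Hypothetical trait, NOT part of Logos: a forward-only source.
trait LazySource {
    /// Total length in bytes, if it is known up front.
    fn len_hint(&self) -> Option<usize>;

    /// Reads up to `buf.len()` bytes and advances the source.
    /// Returns 0 at end of input.
    fn read_chunk(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

/// In-memory source: the length is known and reading is a plain copy.
struct SliceSource<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> LazySource for SliceSource<'a> {
    fn len_hint(&self) -> Option<usize> {
        Some(self.data.len())
    }

    fn read_chunk(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let rest = &self.data[self.pos..];
        let n = rest.len().min(buf.len());
        buf[..n].copy_from_slice(&rest[..n]);
        self.pos += n;
        Ok(n)
    }
}

/// Streaming source: the length is unknown and reads go to the reader.
struct ReaderSource<R: Read> {
    inner: R,
}

impl<R: Read> LazySource for ReaderSource<R> {
    fn len_hint(&self) -> Option<usize> {
        None
    }

    fn read_chunk(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.inner.read(buf)
    }
}

/// Drains any source and counts its bytes, without seeking or backtracking.
fn count_bytes<S: LazySource>(source: &mut S) -> io::Result<usize> {
    let mut buf = [0u8; 4];
    let mut total = 0;
    loop {
        let n = source.read_chunk(&mut buf)?;
        if n == 0 {
            return Ok(total);
        }
        total += n;
    }
}
```

The wrapper types are there because a blanket `impl<R: Read> LazySource for R` would overlap with an impl for `&[u8]`, which itself implements `Read`. The sketch does not address the token lifetime problem from the last bullet: bytes copied into `buf` are gone once the buffer is reused.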

My question is then: has anyone already thought about this problem? Does anyone have ideas or suggestions?

Mek101 commented
  1. Shouldn't be too much of a blocker. Even when reading from disk a simple fstat or equivalent shouldn't be too expensive, especially if the result is cached.
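As a rough illustration of the cached-length idea, the sketch below queries the file metadata once and remembers the result. The `FileSource` type and its `cached_len` field are hypothetical names, not anything from Logos, and the reported length may not be meaningful for inputs such as pipes or sockets.

```rust
use std::fs::File;
use std::io;

/// Hypothetical wrapper, NOT part of Logos: caches the file length
/// after the first metadata query.
struct FileSource {
    file: File,
    cached_len: Option<u64>,
}

impl FileSource {
    fn new(file: File) -> Self {
        FileSource { file, cached_len: None }
    }

    /// Returns the length in bytes, asking the OS only on the first call.
    fn len(&mut self) -> io::Result<u64> {
        if let Some(len) = self.cached_len {
            return Ok(len);
        }
        let len = self.file.metadata()?.len();
        self.cached_len = Some(len);
        Ok(len)
    }
}
```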

The changes required to make logos itself accept a mutable source are non-trivial, but this might be of interest to you if you're just looking for a way to leverage a logos lexer with a Read/BufRead:

https://github.com/cliffeh/logos-genawaiter/blob/main/src/main.rs

It's not exactly efficient and in its current form only works with line-wise input, but it should give some idea of the possibilities.
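For readers who just want the general shape of line-wise lexing over a BufRead, here is a small standard-library-only sketch. It is not the code from the linked repository and does not use Logos: `lex_line` is a toy stand-in for a generated lexer, and `Token`, `lex_line`, and `lex_reader` are names made up for this example. Tokens own their data, because the line buffer is reused on every iteration.

```rust
use std::io::{self, BufRead};

/// Owned token: it cannot borrow from the line buffer, which is reused.
#[derive(Debug, PartialEq)]
enum Token {
    Number(u64),
    Word(String),
}

/// Toy stand-in for a generated lexer (hypothetical, not Logos):
/// splits one line on whitespace and classifies each piece.
fn lex_line(line: &str) -> Vec<Token> {
    line.split_whitespace()
        .map(|piece| match piece.parse::<u64>() {
            Ok(n) => Token::Number(n),
            Err(_) => Token::Word(piece.to_string()),
        })
        .collect()
}

/// Lexes a reader one line at a time, so only the current line of
/// input is buffered. The resulting tokens are collected into a Vec.
fn lex_reader<R: BufRead>(mut reader: R) -> io::Result<Vec<Token>> {
    let mut tokens = Vec::new();
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            return Ok(tokens);
        }
        tokens.extend(lex_line(&line));
    }
}
```

This approach shares the limitation mentioned above: a token that spans several lines cannot be recognized, since each line is lexed in isolation.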