bovee/entab

Closure-based reader?

Opened this issue · 1 comments

bovee commented

When playing around with read_into, I wrote a reader that operates more functionally and maybe could be modified into something that could do e.g. multithreaded map-reduce?

/// Apply `fxn` to each element in the data
#[doc(hidden)]
pub fn reduce<'r: 's, 's, E, T, D, F, P, TS, S>(data: D, init: S, params: Option<P>, fxn: F) -> Result<S, E>
where
    D: TryInto<ReadBuffer<'r>>,
    E: From<EtError>,
    F: Fn(S, &T) -> Result<S, E>,
    EtError: From<<D as TryInto<ReadBuffer<'r>>>::Error>,
    T: Default + FromSlice<'s, 's, State = TS>,
    TS: for<'a> FromSlice<'a, 'a, State = P> + 's,
    P: Default,
{
    let mut rb = data.try_into().map_err(|e| e.into())?;
    let mut user_state = init;
    let mut parser_state = match rb.next(&mut params.unwrap_or_default())? {
        Some(state) => state,
        None => {
            return Err(E::from(EtError::new("Could not initialize state {}")));
        },
    };
    let mut record = T::default();
    while unsafe { rb.next_into(&mut parser_state, &mut record)? } {
        user_state = fxn(user_state, &record)?;
    }
    Ok(user_state)
}

#[cfg(test)]
mod tests {
    use super::*;
    use parsers::fastq::FastqRecord;

    #[test]
    fn test_reduce() -> Result<(), EtError> {
        let data: &[u8] = include_bytes!("../tests/data/test.fastq");
        let count: usize = reduce(data, 0, None, |count, &FastqRecord { sequence, .. }| {
            Ok::<_, EtError>(count + sequence.len())
        })?;
        assert_eq!(count, 250000);
        Ok(())
    }
}
bovee commented

Maybe this is linked to #23 in that these closures could be run partially in parallel?