rajasekarv/vega

questions about wordcount example

I wrote a WordCount example with your framework, shown below. It processes only a 17-line text file but takes 240 s to finish on my computer. Why does it run so slowly?

use chrono::prelude::*;
use vega::io::*;
use vega::*;
use std::fs::File;

fn main() -> Result<()> {
    let context = Context::new()?;

    let num_splits = 4;
    let deserializer = Fn!(|file: Vec<u8>| {
        String::from_utf8(file)
            .unwrap()
            .lines()
            .map(|s| s.to_string())
            .collect::<Vec<_>>()
    });
    let lines = context
        .read_source(LocalFsReaderConfig::new("./README.md"), deserializer)
        .flat_map(Fn!(|lines: Vec<String>| {
            Box::new(lines.into_iter()) as Box<dyn Iterator<Item = _>>
        }));

    let words = lines.flat_map(Fn!(|line: String| {
        Box::new(
            line.split(' ')
                .map(|s| (s.to_string(), 1))
                .collect::<Vec<_>>()
                .into_iter(),
        ) as Box<dyn Iterator<Item = _>>
    }));

    let result = words.reduce_by_key(Fn!(|(a, b)| a + b), num_splits);

    let output = result.collect().unwrap();

    println!("result: {:?}", output);

    Ok(())
}

Hello, sorry for the very late reply. I took a break from maintaining the public branch of this library for a while, hence the delay.

240 s doesn't seem right. Can you provide more details? Perhaps you are also counting the initial compilation time?
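
One way to separate compilation time from run time is to build first (for example with `cargo build --release`) and then time only the job from inside the program. Below is a minimal sketch using `std::time::Instant`. The helper names `timed` and `word_count` are hypothetical, and the plain single-threaded count only stands in for the vega pipeline, so its timings say nothing about vega itself.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Runs `job` once and returns its result with the wall-clock time it took.
/// (Hypothetical helper, not part of vega.)
fn timed<T>(job: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = job();
    (result, start.elapsed())
}

/// Single-threaded word count that splits each line on ' ',
/// mirroring the splitting used in the example above.
fn word_count(text: &str) -> HashMap<String, i32> {
    let mut counts = HashMap::new();
    for line in text.lines() {
        for word in line.split(' ') {
            *counts.entry(word.to_string()).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    // Stand-in input; the real program would read README.md instead.
    let text = "hello world\nhello vega";
    let (counts, elapsed) = timed(|| word_count(text));
    println!("result: {:?}", counts);
    println!("job took {:?} (compilation not included)", elapsed);
}
```

In the original program, the same kind of wrapper could presumably go around the `result.collect()` call, so that the reported time covers only the job and not the build.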