Improve performance by parsing directly into `struct`

Question

Improve performance by parsing directly into `struct`

Boudewijn26 opened this issue a year ago · 1 comments

In working with this package and looking for performance optimisations one of the possible bottlenecks we identified was that queries were first read into a map, be that FluxRecord or GenericMap after which they'd be converted into the supplied struct through the FromMap trait. While at the moment we wouldn't have a need for optimising this last bit of performance, we thought it might be worthwhile to share the thinking we've done on this so far. Both to get the ball rolling should the need for this last bit of performance arise or for others to pick up if they should feel like it.

For simplicity we'll assume the following query and struct:

from(bucket: "bucket")
|> range(start: -1d)
|> keep(columns: ["_time", "_value", "location"])

struct Row {
  value: f64,
  time: DateTime<FixedOffset>,
  location: String
}

The main goal is for the impl FallibleIterator for QueryTableResult to have a generic Item. The way I'd go about it is by defining a

trait RowBuilder<T: ?Sized> {
  fn column_order(&mut self, columns: Vec<&str>);
  fn next_val(&mut self, val: Value);
  fn build_row(&mut self) -> Result<T, RowBuildError>;
}

enum RowBuildError {
    MissingValues,
    IncorrectValue,
    MissingColumn,
}

and adding a

trait Buildable<T: RowBuilder<Self>> where Self: Sized {
  fn builder() -> T;
}

then the trick would be to define the #[derive(Buildable)] to add to our struct Row. struct QueryTableResult would then contain the impl RowBuilder.

An implementation of RowBuilder for our Row would be:

struct Builder {
    value: Option<f64>, // fields in original struct as Options
    time: Option<DateTime<FixedOffset>>,
    location: Option<String>,
    value_index: usize, // index for each field
    time_index: usize,
    location_index: usize,
    index: usize,
}

impl RowBuilder<Row> for Builder {
    fn column_order(&mut self, columns: Vec<&str>) -> Result<(), RowBuildError> {
        let mut value_set = false;
        let mut time_set = false;
        let mut location_set = false;
        for (idx, col) in columns.iter().enumerate() {
            if *col == "_value" || *col == "value" {
                self.value_index = idx;
                value_set = true;
            }
            if *col == "_time" || *col == "time" {
                self.time_index = idx;
                time_set = true;
            }
            if *col == "_location" || *col == "location" {
                self.location_index = idx;
                location_set = true;
            }
        }
        if value_set && time_set && location_set {
            Ok(())
        } else {
            Err(RowBuildError::MissingColumn)
        }
    }

    fn next_val(&mut self, val: Value) -> Result<(), RowBuildError> {
        if self.index == self.value_index {
            match val {
                Value::Double(d) => self.value = Some(d.0),
                _ => Err(RowBuildError::IncorrectValue)?
            }
        } else if self.index == self.time_index {
            match val {
                Value::TimeRFC(t) => self.time = Some(t),
                _ => Err(RowBuildError::IncorrectValue)?
            }
        } else if self.index == self.location_index {
            match val {
                Value::String(s) => self.location = Some(s),
                _ => Err(RowBuildError::IncorrectValue)?
            }
        }
        self.index += 1;
        Ok(())
    }

    fn build_row(&mut self) -> Result<Row, RowBuildError> {
        let result = match (self.value, self.time, self.location) {
            (Some(value), Some(time), Some(location)) => Ok(Row { value, time, location }),
            _ => Err(RowBuildError::MissingValues),
        };
        self.value = None;
        self.time = None;
        self.location = None;
        self.index = 0;
        result
    }
}

While the implementation of Builder isn't the easiest, I think it wouldn't be that difficult to see where it can be generalised in terms of the struct fields.

To maintain compatibility you'd probably want an impl RowBuilder<FluxRecord> as well. I hope you find this approach constructive.

Answer 1 · 2023-09-19T19:41:10.000Z

On second thought this is pretty similar to what csv and serde are already doing. It'd probably be easier to lean on them instead of going about it alone. Though I do need to do some more reading to fully understand how csv and serde interact