vegawidget/virgo

data transformation

sa-lee opened this issue · 8 comments

How should we provide syntactic sugar (a dplyr-like interface) for the Vega-Lite transformation API?
https://observablehq.com/@uwdata/data-transformation?collection=@uwdata/visualization-curriculum

Just adding one more example here, https://bl.ocks.org/amitkaps/a484b94a7e1e0705c5ec865ba31f463c, for later discussion.

The filter transform accepts more kinds of predicates than datum expressions (for example, field predicates and selection predicates).
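To make the predicate kinds concrete, here is a rough sketch of filter transforms written as plain R lists, in the shape they would take in a Vega-Lite spec. The field names (prcp, tmin) are borrowed from the akl_weather example further down, and "timeline" is an assumed selection name. This only illustrates the spec shapes, not how virgo builds them.

```r
# Vega-Lite filter transforms as plain R lists, one per predicate kind.
# Field names and the selection name "timeline" are assumptions for illustration.

# 1. Expression predicate: a Vega expression string over `datum`
f_expr <- list(filter = "datum.prcp > 0")

# 2. Field predicates: a field plus one of equal/lt/lte/gt/gte/range/oneOf
f_range <- list(filter = list(field = "prcp", range = c(0, 5)))
f_lt    <- list(filter = list(field = "tmin", lt = 10))

# 3. Selection predicate (Vega-Lite 4 spelling; Vega-Lite 5 uses `param`)
f_sel <- list(filter = list(selection = "timeline"))

# 4. Logical composition of predicates with and / or / not
f_and <- list(filter = list(and = list(f_range$filter, f_sel$filter)))

# A layer's transform array is then just a list of these entries
transform <- list(f_expr, f_range, f_lt, f_sel, f_and)
```

If jsonlite is available, jsonlite::toJSON(transform, auto_unbox = TRUE) should give the corresponding JSON array.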

Thinking out loud, but I think that if we want to scale visualisations we would probably need to move to another drawing library, such as deck.gl: https://github.com/visgl/deck.gl

Vega transform -> dplyr verbs

  • calculate -> mutate()
  • filter -> filter(), vega() %>% filter(selection) (now it's clear that a selection corresponds to a logical vector)
  • sample -> sample_n()
  • lookup -> left_join()
  • joinaggregate -> TBD
  • window -> vector functions prefixed with vg_, e.g. vg_mean()
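As a rough sketch of what the calculate -> mutate() and filter -> filter() mappings involve, the snippet below translates a quoted R expression into a Vega expression string by prefixing known column names with datum. The helper to_vega_expr() is hypothetical (it is not part of virgo or dplyr), it handles only simple operators and function calls, and the column names come from the akl_weather example.

```r
# Hypothetical helper: turn a quoted R expression into a Vega expression string.
# Only names, constants, binary operators, parentheses and plain calls are handled.
to_vega_expr <- function(expr, fields) {
  infix <- c("+", "-", "*", "/", ">", "<", ">=", "<=", "==", "!=", "&&", "||")
  if (is.name(expr)) {
    nm <- as.character(expr)
    # Column names become datum.<name>; anything else is left untouched
    return(if (nm %in% fields) paste0("datum.", nm) else nm)
  }
  if (is.call(expr)) {
    stopifnot(is.name(expr[[1]]))  # no pkg::fun() or other computed callees
    fn <- as.character(expr[[1]])
    args <- vapply(as.list(expr)[-1], to_vega_expr, character(1), fields = fields)
    if (fn %in% infix && length(args) == 2) return(paste(args[1], fn, args[2]))
    if (fn == "(") return(paste0("(", args, ")"))
    return(paste0(fn, "(", paste(args, collapse = ", "), ")"))
  }
  if (is.character(expr)) return(paste0("'", expr, "'"))
  if (is.logical(expr)) return(tolower(as.character(expr)))
  as.character(expr)  # numeric constants
}

fields <- c("date", "tmin", "tmax", "prcp")

# mutate(range = tmax - tmin) -> calculate transform
calc <- list(calculate = to_vega_expr(quote(tmax - tmin), fields), as = "range")

# filter(prcp > 0 && tmin < 10) -> filter transform
filt <- list(filter = to_vega_expr(quote(prcp > 0 && tmin < 10), fields))

# sample_n(100) -> sample transform
samp <- list(sample = 100)
```

Only the subset of R that has a direct Vega expression counterpart can be translated this way; anything else would need its own mapping or an error.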

For layer-specific transformations, use the data pronoun .vega as the data input, e.g. .vega %>% filter(selection). By default, the transform argument of vega_layer() applies filter() when its input is a selection.

timeline <- select_interval("x")
p_avg <- akl_weather %>%
  vega(enc(x = vg_month(date)), width = 600, height = 350) %>%
  # filter(timeline) %>%
  mark_ribbon(
    enc(y = vg_mean(tmin), y2 = vg_mean(tmax)),
    interpolate = "monotone",
    colour = "#fc9272", opacity = 0.3,
    transform = timeline) %>% # filter(.vega, timeline)
  mark_line(enc(y = vg_mean(prcp)), colour = "#3182bd",
    transform = timeline) %>%
  mark_point(enc(y = vg_mean(prcp)), colour = "#3182bd",
    transform = timeline) %>%
  resolve_views(scale = list(y = "independent"))
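The default described above (a selection passed as transform is treated as a filter) could be sketched with S3 dispatch roughly as follows. The class toy_selection and the generic as_transform() are made-up stand-ins for illustration; virgo's actual internals are not shown in this thread and may differ.

```r
# Toy stand-in for a selection object; not virgo's real class.
toy_selection <- function(name) {
  structure(list(name = name), class = "toy_selection")
}

# Hypothetical generic: normalise whatever was passed as `transform`
# into a list of Vega-Lite transform entries.
as_transform <- function(x) UseMethod("as_transform")

# A bare selection is shorthand for filter(.vega, selection)
as_transform.toy_selection <- function(x) {
  list(list(filter = list(selection = x$name)))
}

# Anything else (a ready-made list of transforms, or NULL) passes through
as_transform.default <- function(x) x

timeline <- toy_selection("timeline")
layer_transform <- as_transform(timeline)
```

With this shape, transform = timeline and transform = filter(.vega, timeline) would end up as the same spec entry.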

I like the dplyr idea.

Also a quick note on performance: it looks like Altair caps data at 5000 rows? https://altair-viz.github.io/user_guide/data_transformers.html

I've been developing {dplyr} verbs, and I've come to a point where I don't see much use in transforming static data in an interactive setting. The verbs are useful for transforming selected data, and the current development already handles that functionality. We'll discuss this more later today.

Drop {dplyr} development for transforming

I'm going to reopen this issue, since I could see more use cases for transforming data. But instead of dispatching on the static data, we could define {dplyr} verbs for a selection object.

Useful with mutate(<selection>) in particular.
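A minimal sketch of what a verb on a selection object might look like: the method records the requested columns on the selection instead of touching any data, so they can be translated to Vega-Lite transforms later. Everything here is hypothetical. The class toy_selection is a stand-in, and the local mutate() generic only mimics dplyr::mutate() so the snippet runs without dplyr; a real package would register a method on dplyr's generic instead.

```r
# Stand-in for the dplyr::mutate() generic so this runs in base R.
# (Defining it locally would mask dplyr's generic if dplyr were attached.)
mutate <- function(.data, ...) UseMethod("mutate")

# Toy selection object carrying a list of recorded transforms
toy_selection <- function(name) {
  structure(list(name = name, transform = list()), class = "toy_selection")
}

# mutate(<selection>, new = expr): capture the expressions unevaluated
mutate.toy_selection <- function(.data, ...) {
  exprs <- as.list(substitute(list(...)))[-1]
  recorded <- lapply(names(exprs), function(nm) {
    list(op = "mutate", as = nm, expr = exprs[[nm]])
  })
  .data$transform <- c(.data$transform, recorded)
  .data
}

timeline <- toy_selection("timeline")
timeline2 <- mutate(timeline, range = tmax - tmin)
```

The recorded expr is still an R call at this point; turning it into a Vega expression string is a separate step.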