schnorr/pajeng

Please provide an R module

Closed this issue · 4 comments

Hello,

As far as I understand, if you want to import Pajé data into R, the recommended way is to dump a CSV with pajeng and grep the result to keep only the parts you're interested in. At least, that's what I wrote in the relevant SimGrid tutorial. We have something like

pj_dump --ignore-incomplete-links simgrid.trace | grep STATE > gantt.csv

and then:

library(ggplot2)

# Load and relabel the data
df = read.csv("gantt.csv", header=FALSE, strip.white=TRUE)
names(df) = c("Type", "Actor", "Container", "Start", "End", "Duration", "Level", "State")

# Actually draw the graph
p = ggplot(df) + geom_segment(aes(x=Start, xend=End, y=Actor, yend=Actor, color=State), size=5)

I think that this could be greatly improved if we had an R module in charge of doing it. The above could be rewritten as

library(pajeng)
library(ggplot2)

df = pajeng.read("LINKS", "gantt.csv")
names(df) = c("Type", "Actor", "Container", "Start", "End", "Duration", "Level", "State")

p = ggplot(df) + geom_segment(aes(x=Start, xend=End, y=Actor, yend=Actor, color=State), size=5)
plot(p)
dev.off()

The pajeng.read function would be in charge of calling pajeng (either on the command line, writing to /tmp, or by calling the C++ functions directly in memory), selecting only the elements that match its first parameter, and loading the resulting data into R.

I'm not sure whether it's possible, or whether it's too SimGrid-oriented, but I'd love to simplify it further into a pajeng.read_links("gantt.csv") that would both grep the right elements and rename the data columns appropriately.
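To make the request concrete, here is a minimal sketch in plain R of the command-line variant described above. It is only an illustration, not an existing pajeng API: the names parse_pj_dump and read_pj_dump are hypothetical, and the wrapper assumes that the pj_dump executable is on the PATH and that the first CSV field of each output line names the kind of entity.

```r
# Hypothetical helper, not part of pajeng: parse pj_dump CSV lines and keep
# only one kind of entity (the first field of each line, e.g. "State").
parse_pj_dump <- function(lines, kind, col_names = NULL) {
  # Extract the first comma-separated field of every line.
  first_field <- trimws(sub(",.*$", "", lines))
  wanted <- lines[first_field == kind]
  if (length(wanted) == 0) {
    return(data.frame())
  }
  df <- read.csv(text = wanted, header = FALSE, strip.white = TRUE,
                 stringsAsFactors = FALSE)
  # Optionally relabel the columns, as done by hand in the script above.
  if (!is.null(col_names)) {
    names(df) <- col_names
  }
  df
}

# Hypothetical wrapper: run pj_dump on a trace file and load one kind of
# entity. Assumes the pj_dump executable is available on the PATH.
read_pj_dump <- function(trace, kind, col_names = NULL) {
  lines <- system2("pj_dump",
                   c("--ignore-incomplete-links", shQuote(trace)),
                   stdout = TRUE)
  parse_pj_dump(lines, kind, col_names)
}
```

With such a helper, the tutorial script would reduce to something like read_pj_dump("simgrid.trace", "State", c("Type", "Actor", "Container", "Start", "End", "Duration", "Level", "State")), with no intermediate gantt.csv on disk.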

Thanks in advance for your help in simplifying our visualizations.
Mt

I'd go a bit further and encapsulate the whole pajeng framework (the pj_dump tool) inside the R package (we have some experience doing that with starvz and phenovisr). That way, you could simply do something like this:

data <- pajeng.read("simgrid.trace", ignore.incomplete.links = TRUE)
p <- ggplot(data$State) + geom_segment(aes(x=Start, xend=End, y=Actor, yend=Actor, color=State), size=5)
plot(p)
dev.off()

With such an integration using Rcpp, we can get rid of the intermediate CSV. I am not fully aware of the possible shortcomings, but I'll take a look in the near future.
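As a rough illustration of the return value I have in mind (one data frame per kind of entity, accessed as data$State), here is a sketch in plain R. It is not the Rcpp integration itself: it still works on pj_dump-style CSV lines, the function name split_pj_dump is hypothetical, and it assumes that all lines of a given kind have the same number of fields.

```r
# Hypothetical sketch: turn pj_dump-style CSV lines into a named list with
# one data frame per kind of entity (State, Link, Container, ...).
split_pj_dump <- function(lines) {
  # Drop empty lines, then group the rest by their first CSV field.
  lines <- lines[nzchar(trimws(lines))]
  kind <- trimws(sub(",.*$", "", lines))
  lapply(split(lines, kind), function(chunk) {
    # Each kind is parsed separately because kinds differ in column count.
    read.csv(text = chunk, header = FALSE, strip.white = TRUE,
             stringsAsFactors = FALSE)
  })
}
```

The columns are left unnamed here (V1, V2, ...); a real package would presumably attach meaningful names for each kind.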

It would be just perfect!!

Hi @mquinson, I've finished a first stable version of the pajengr package, available here.

Creating a space/time view is a matter of:

library(pajengr)
suppressMessages(library(tidyverse))
pajeng_read("traces/simgrid.trace")$state %>%
     ggplot(aes(x=Start, xend=End, y=factor(Container),yend=factor(Container), color=Value)) +
         theme_bw(base_size=18) +
         geom_segment(size=10)

We might have issues with large files, but this is already a start. If you think the current state is already sufficient, you can close this issue. Otherwise, suggestions are welcome.

For small input files (up to dozens of megabytes), this should be okay.