Please provide an R module
Closed this issue · 4 comments
Hello,
as far as I understand, if you want to import Pajé data into R, the recommended way is to dump a CSV with pajeng and grep the result to keep only the parts you're interested in. At least, that's what I wrote in the relevant SimGrid tutorial. We have something like
pj_dump --ignore-incomplete-links simgrid.trace | grep STATE > gantt.csv
and then:
library(ggplot2)
# Load and relabel the data
df = read.csv("gantt.csv", header=FALSE, strip.white=TRUE)
names(df) = c("Type", "Actor", "Container", "Start", "End", "Duration", "Level", "State");
# Actually draw the graph
p = ggplot(df) + geom_segment(aes(x=Start, xend=End, y=Actor, yend=Actor,color=State), size=5);
I think that this could be greatly improved if we had an R module in charge of doing it. The above could be rewritten into
library(pajeng)
library(ggplot2)
df = pajeng.read("LINKS", "gantt.csv")
names(df) = c("Type", "Actor", "Container", "Start", "End", "Duration", "Level", "State");
p = ggplot(df) + geom_segment(aes(x=Start, xend=End, y=Actor, yend=Actor,color=State), size=5);
plot(p)
dev.off()
The pajeng.read function would be in charge of calling pajeng (either on the command line in /tmp or directly through the C++ functions in memory), selecting only the elements matching its first parameter, and loading the resulting data into R.
I'm not sure if it's possible or if it's too simgrid-oriented, but I'd love to simplify it further into a pajeng.read_links("gantt.csv") that would both grep the right elements and rename the data columns appropriately.
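A minimal sketch of what such a helper might look like, assuming `pj_dump` is on the `PATH` and prints one comma-separated record per line with the record kind (for example `State`) as its first field. The names `parse_pj_dump` and `read_trace` are hypothetical and not an existing API. Unlike the `grep` above, the filter only looks at the first field of each line.

```r
# Hypothetical helper: keep only the pj_dump output lines whose first field
# equals `kind` (e.g. "State"), then parse them as header-less CSV.
parse_pj_dump <- function(lines, kind, col_names = NULL) {
  first_field <- trimws(sub(",.*$", "", lines))
  kept <- lines[first_field == kind]
  if (length(kept) == 0) {
    return(data.frame())
  }
  df <- read.csv(text = kept, header = FALSE, strip.white = TRUE,
                 stringsAsFactors = FALSE)
  # Optionally relabel the columns, as done by hand with names(df) above.
  if (!is.null(col_names)) {
    names(df) <- col_names
  }
  df
}

# Hypothetical wrapper: run pj_dump on a trace file and parse its output.
# Assumes the pj_dump executable is installed and reachable via the PATH.
read_trace <- function(trace, kind, col_names = NULL) {
  out <- system2("pj_dump",
                 c("--ignore-incomplete-links", shQuote(trace)),
                 stdout = TRUE)
  parse_pj_dump(out, kind, col_names)
}
```

With this, the Gantt data could be loaded in one call such as `read_trace("simgrid.trace", "State", c("Type", "Actor", "Container", "Start", "End", "Duration", "Level", "State"))`, without writing gantt.csv first.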
Thanks in advance for your help in simplifying our visualizations.
Mt
I'd go a bit further by encapsulating the whole pajeng framework (the pj_dump tool) inside the R package (we have some experience doing that with starvz and phenovisr). That way, you could simply do something like this:
data <- pajeng.read("simgrid.trace", ignore.incomplete.links = TRUE);
p <- ggplot(data$State) + geom_segment(aes(x=Start, xend=End, y=Actor, yend=Actor,color=State), size=5);
plot(p)
dev.off()
With such integration using Rcpp, we can get rid of the intermediate CSV. I am not fully aware of the shortcomings, but I'll take a look in the near future.
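To illustrate the return shape suggested by `data$State`, here is a rough sketch in plain R that splits `pj_dump`-style text into one data frame per record kind. It is not the Rcpp integration (it still goes through CSV text), `split_pj_dump` is a hypothetical name, and the sample records are made up for illustration.

```r
# Hypothetical helper: group pj_dump-style lines by their first field and
# parse each group separately, since each record kind may have its own
# number of columns.
split_pj_dump <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]        # drop empty lines
  kind <- trimws(sub(",.*$", "", lines))       # first field of each line
  lapply(split(lines, kind), function(chunk) {
    read.csv(text = chunk, header = FALSE, strip.white = TRUE,
             stringsAsFactors = FALSE)
  })
}

# Made-up sample records, for illustration only.
sample_lines <- c("Container, 0, HOST, 0, 10, 10, host1",
                  "State, actor-1, ACTOR_STATE, 0, 2.5, 2.5, 0, compute",
                  "State, actor-2, ACTOR_STATE, 1, 4, 3, 0, sleep")
tables <- split_pj_dump(sample_lines)
# tables$State and tables$Container are now separate data frames.
```

Returning a named list keeps each record kind in its own table, so a call like `ggplot(data$State)` needs no further filtering.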
It would be just perfect!!
Hi @mquinson, I've finished a first stable version of the pajengr package, available here.
Creating a space/time view is a matter of:
library(pajengr)
suppressMessages(library(tidyverse))
pajeng_read("traces/simgrid.trace")$state %>%
ggplot(aes(x=Start, xend=End, y=factor(Container),yend=factor(Container), color=Value)) +
theme_bw(base_size=18) +
geom_segment(size=10)
We might have issues with large files, but this is already a start. If you think the current state is already sufficient, you can close this issue. Otherwise, suggestions are welcome.
For small input files (up to dozens of megabytes), this should be okay.