jonhoo/inferno

When generating a flamegraph programmatically the `flamegraph::Options::consistent_palette` option shouldn't require filesystem access

Closed this issue · 11 comments

koute commented

There should be an API which allows the user to enable the consistent_palette option and maintain the palette map themselves in memory without touching the filesystem.

@JordiChauzi this is probably your domain. Any ideas for how we might do this? Maybe the trick is going to be to have the CLI frontend open the file, and then have the Options struct to flamegraph just take a Box<Read>?

koute commented

I haven't looked at the code, but a quick suggestion - passing a String/Vec<u8> to Options and letting the user extract it back somehow would probably be the simplest. (As I assume there also needs to be a way to read this back so that it can be saved somewhere by the user?) Or perhaps create a dedicated opaque PaletteMap struct, and pass that to the Options? (That way inferno won't have to continuously reparse the palette map on every invocation of `from_sorted_lines.)

I think R: Read covers that use-case. Specifically, you could give in something like Cursor<&[u8]> and it should just work. Wrapping that in an opaque PaletteMap struct might be a good idea too to avoid re-parsing. Where it gets a little complicated is that we also want to store the palette map at the end of execution (since we may have added a bunch of entries to it). Perhaps what we should do here is have PaletteMap be a trait, and then impl it for PathBuf along with a map like today, but also let the user provide their own impl. Specifically, it'd need

trait PaletteMap {
    fn lookup(&self, func: &str) -> (u8, u8, u8);
    fn remember(&mut self, func: &str, color: (u8, u8, u8));
    fn save(self);
}
koute commented

I think R: Read covers that use-case. Specifically, you could give in something like Cursor<&[u8]> and it should just work.

But then inferno would have to parse it every time from_sorted_lines gets called, no? With a dedicated struct you could just call PaletteMap::from_bytes or PaletteMap::from_stream to parse it once and create a PaletteMap instance, use it multiple times when calling flamegraph::from_sorted_lines (pass it as Option<&mut PaletteMap>?), and then call PaletteMap::write_into_stream to serialize it back somewhere.

Yeah, you're right, hence my updated suggestion of having PaletteMap being a trait instead. How does that sound?

koute commented

Yeah, you're right, hence my updated suggestion of having PaletteMap being a trait instead. How does that sound?

There are a few drawback that I can see:

  1. Instead of being an opaque blob it exposes the details of the palette map to the user (e.g. from the definition of the trait we can see that this is a key-value store with strings as keys and triplets of u8 as values); some might argue this is unnecessary.
  2. You can't actually reasonably implement it for PathBuf. (: Otherwise you'd have to read the file and parse it every time PaletteMap::lookup(&self: &PathBuf, func: &str) -> (u8, u8, u8) is called. (As Self is PathBuf.)
  3. Having the save() method feels like a slight code smell to me; I think it would be better for the caller to decide when and how to save it, instead of the flamegraph generator deciding (and triggering!) it.

I think the details have to be exposed to the user regardless if we want them to be able to choose how to represent the map? Specifically, the trait allows the caller to choose the storage format, and doesn't tie them to using flamegraph.pl's specific line-oriented plain-text syntax. With an opaque blob, I think we require that the user have a byte string of exactly the expected form, which seems more limiting to me?

I think it'd be implemented for essentially struct PaletteMapFile(PathBuf, HashMap). That implementation would mostly be for the CLI, though I suppose users could also use it. I imagine something like PaletteMapFile::from(PathBuf).

True, save() is probably a bad name. Maybe something like finished()? We want to have some way to signify that we're "done" with an iteration so that the implementor doesn't have to save on every call to remember.

What about:

  • providing the current PaletteMap struct as public
  • remove the associated fields from Options
  • making from_sorted_lines take an Option<&mut PaletteMap> as argument (we will need to parse&save the palette.map file in the main for the CLI)
  • implementing the lookup and remember functions proposed by @jonhoo for PaletteMap. This let inferno, and the user, update and search a PaletteMap programmatically.
  • providing some Serializer/Deserializer trait on PaletteMap, and implementing it for the flamegraph.pl format
  • implementing Iterator<Item=(String, (u8, u8, u8))> to provide an access to the user without having to know the implementation details, and help them implement their PaletteMap Serializer/Deserializer (String may be an issue if we want to avoid allocations ? and we may want to provide an RGB struct, or use one from an existing crate ?)

This let the user chooses if and when they want to save the PaletteMap, and from/to which format.

koute commented

I think the details have to be exposed to the user regardless if we want them to be able to choose how to represent the map?

This is a fair point, but I don't think it's really useful to let the user choose how the map should be represented at runtime - virtually everyone's going to use a HashMap, so that will only add unnecessary complexity. What's important is to let the user pick where to serialize it, and that doesn't require exposing any implementation details. We can always add an extra interface for modifying and/or building the palette map later, without conflating the use case of serialization and modification/customization.

In general I'm a fan of APIs which are minimal, hide their implementation details, and are forward compatible so that the details of how they work and/or extra customization points can be gradually exposed without breaking backwards compatibility and without impacting the users which do not need those extra features.

Consider the following:

struct PaletteMap { /* ... */ }

// Minimal interface:
impl PaletteMap {
    pub fn new() -> Self;
    pub fn from_stream(fp: &dyn mut io::Read) -> io::Result<Self>;
    pub fn to_stream(&self, fp: &dyn mut io::Write) -> io::Result<()>;
}

// We could add some extra convenance methods if needed:
impl PaletteMap {
    pub fn load_from_file(path: &dyn AsRef<Path>) -> io::Result<Self>;
    pub fn save_to_file(&self, path: &dyn AsRef<Path>) -> io::Result<()>;
}

// We could also add APIs for inspecting and modifying the palette:
type Color = (u8, u8, u8);
impl PaletteMap {
    fn iter(&self) -> impl Iterator<Item=(&str, Color)>
    fn get(&self, func: &str) -> Color;
    fn set(&mut self, func: &str, color: Color);
}

// And it's simple to use:
let mut palette = PaletteMap::new();

// Generate multiple flamegraphs:
let mut options = Options::default();
options.palette = Some(&mut palette);
flamegraph::from_sorted_lines(options, lines_1, writer)?;
let mut options = Options::default();
options.palette = Some(&mut palette);
flamegraph::from_sorted_lines(options, lines_2, writer)?;

// Now the user can save it:
palette.to_stream(&mut fp)?;

This has also the nice property that it's extensible, e.g. along with the color we could add the ability to specify the font family in the future. With this kind of API you can simply add it in a fully backwards compatible manner, since you didn't directly expose the internals.

With an opaque blob, I think we require that the user have a byte string of exactly the expected form, which seems more limiting to me?

The user is not supposed to peek at what exactly that string is! (:

True, save() is probably a bad name. Maybe something like finished()? We want to have some way to signify that we're "done" with an iteration so that the implementor doesn't have to save on every call to remember.

But don't we already have that implicitly once flamegraph::from_sorted_lines returns the control to the user? (:

I agree wholeheartedly with that entire post @koute :) Let's go for that approach! @JordiChauzi do you want to give this one a stab? Or maybe @koute would? I'm a bit strapped for time atm, but would be happy to review.

Unless, @koute want's to do it, i'll work on it in the following days.