performancecopilot/pcp

CSV output when using derived metrics sets does not actually include the critical commas

Closed this issue · 10 comments

When using a derived metric set such as :vmstat or :sar and setting the output to CSV format with pmrep, the commas are not inserted between the individual metrics, making the output unusable as a CSV.

# pmrep :vmstat -p -a *.0 -o csv
Time "kernel.all.running" "kernel.all.blocked" "swap.used" "mem.util.free" "mem.util.bufmem" "mem.util.allcache" "swap.pagesin" "swap.pagesout" "mem.vmstat.pgpgin" "mem.vmstat.pgpgout" "kernel.all.intr" "kernel.all.pswitch" "kernel.all.cpu.alluserp" "kernel.all.cpu.sysp" "kernel.all.cpu.idlep" "kernel.all.cpu.wait.totalp" "kernel.all.cpu.stealp"

versus

# pmrep kernel.all.running kernel.all.blocked swap.used -p -a *.0 -o csv
Time,"kernel.all.running","kernel.all.blocked","swap.used"

and

# pmrep :sar -p -a *.0 -o csv
Time "kernel.cpu.util.user" "kernel.cpu.util.nice" "kernel.cpu.util.sys" "kernel.cpu.util.wait" "kernel.cpu.util.steal" "kernel.cpu.util.idle"

versus

# pmrep kernel.cpu.util.user kernel.cpu.util.nice kernel.cpu.util.sys -p -a *.0 -o csv
Time,"kernel.cpu.util.user","kernel.cpu.util.nice","kernel.cpu.util.sys"

This also affects the pcp2csv and importantly for me the pcp2json tools, where the respective outputs are unusable due to the lack of data separation.

# pmrep --version
pmrep version 6.1.1

Correction -- This does not affect the normal operation of the pcp2json tool, but we have a use case where we first use pcp2csv to flatten the data structure and then back-convert that into JSON format in order to make the data compatible with Elasticsearch.

This is caused by the metricset configuration, there

delimiter = " "

is needed to mimic the typical vmstat output. By default this applies to CSV output as well.

You can override the config file parameter on the command line with -l ,.

@myllynen so is the required "fix" in pmrep to force -l , , when -o csv is in play?

The default delimiter with CSV is comma if no other delimiter has been specified. But here the metricset defines a delimiter.

I think this is somewhat atypical case so I wouldn't be eager to change code for this. I guess one option for a user would be to create a new vmstat-csv metricset by copy-pasting the vmstat metricset and changing the delimiter in there.

@myllynen Understood about reluctance to change code.
Perhaps the man page should mention this issue and how to workaround it?
I think it fails the "law of least surprises" if I mention -o csv on the command line, and some other (non-obvious?) config option means I don't get comma-separated-values by default.

Fair enough, I added a hint to the man page, please let me know if that's clear enough. Thanks.

@myllynen I've wordsmithed it a little, is this OK with you?

            ... Note that many  default  metricsets
            specify  a delimiter (that may not be a comma) so it might be nec‐
            essary to use this option with metricsets to set the delimiter  as
            comma for CSV output, i.e.  --delimiter=,

Fair enough. We can fix this easily for our use case it seems.

However, I would argue that the expected result of either -o csv or pcp2csv should be, well, a comma-separated value format, so maybe these tools should default to overriding the delimiter with, in fact, a comma.

@kmcdonell Yes, looks good, thanks.

@dustinblack Yeah I agree this is perhaps not ideal, I guess the problem is pmrep trying to be a bit too fancy here and allowing e.g. semicolon to be the CSV separator.

IMO pmrep should by default override the delimiter with a comma when -o csv or pcp2csv are used, and the -l option could still be available for a further user override. To me, this is still a bug because it breaks well-known semantics, but it isn't worth a lot of trouble to debate.