CSV output when using derived metrics sets does not actually include the critical commas
Closed this issue · 10 comments
When using a derived metric set such as :vmstat
or :sar
and setting the output to CSV format with pmrep
, the commas are not inserted between the individual metrics, making the output unusable as a CSV.
# pmrep :vmstat -p -a *.0 -o csv
Time "kernel.all.running" "kernel.all.blocked" "swap.used" "mem.util.free" "mem.util.bufmem" "mem.util.allcache" "swap.pagesin" "swap.pagesout" "mem.vmstat.pgpgin" "mem.vmstat.pgpgout" "kernel.all.intr" "kernel.all.pswitch" "kernel.all.cpu.alluserp" "kernel.all.cpu.sysp" "kernel.all.cpu.idlep" "kernel.all.cpu.wait.totalp" "kernel.all.cpu.stealp"
versus
# pmrep kernel.all.running kernel.all.blocked swap.used -p -a *.0 -o csv
Time,"kernel.all.running","kernel.all.blocked","swap.used"
and
# pmrep :sar -p -a *.0 -o csv
Time "kernel.cpu.util.user" "kernel.cpu.util.nice" "kernel.cpu.util.sys" "kernel.cpu.util.wait" "kernel.cpu.util.steal" "kernel.cpu.util.idle"
versus
# pmrep kernel.cpu.util.user kernel.cpu.util.nice kernel.cpu.util.sys -p -a *.0 -o csv
Time,"kernel.cpu.util.user","kernel.cpu.util.nice","kernel.cpu.util.sys"
This also affects the pcp2csv
and importantly for me the pcp2json
tools, where the respective outputs are unusable due to the lack of data separation.
# pmrep --version
pmrep version 6.1.1
Correction -- This does not affect the normal operation of the pcp2json
tool, but we have a use case where we first use pcp2csv
to flatten the data structure and then back-convert that into JSON format in order to make the data compatible with Elasticsearch.
This is caused by the metricset configuration, there
delimiter = " "
is needed to mimic the typical vmstat
output. By default this applies to CSV output as well.
You can override the config file parameter on the command line with -l ,
.
@myllynen so is the required "fix" in pmrep to force -l ,
, when -o csv
is in play?
The default delimiter with CSV is comma if no other delimiter has been specified. But here the metricset defines a delimiter.
I think this is somewhat atypical case so I wouldn't be eager to change code for this. I guess one option for a user would be to create a new vmstat-csv
metricset by copy-pasting the vmstat
metricset and changing the delimiter in there.
@myllynen Understood about reluctance to change code.
Perhaps the man page should mention this issue and how to workaround it?
I think it fails the "law of least surprises" if I mention -o csv on the command line, and some other (non-obvious?) config option means I don't get comma-separated-values by default.
Fair enough, I added a hint to the man page, please let me know if that's clear enough. Thanks.
@myllynen I've wordsmithed it a little, is this OK with you?
... Note that many default metricsets
specify a delimiter (that may not be a comma) so it might be nec‐
essary to use this option with metricsets to set the delimiter as
comma for CSV output, i.e. --delimiter=,
Fair enough. We can fix this easily for our use case it seems.
However, I would argue that the expected result of either -o csv
or pcp2csv
should be, well, a comma-separated value format, so maybe these tools should default to overriding the delimiter with, in fact, a comma.
@kmcdonell Yes, looks good, thanks.
@dustinblack Yeah I agree this is perhaps not ideal, I guess the problem is pmrep trying to be a bit too fancy here and allowing e.g. semicolon to be the CSV separator.
IMO pmrep should by default override the delimiter with a comma when -o csv
or pcp2csv
are used, and the -l
option could still be available for a further user override. To me, this is still a bug because it breaks well-known semantics, but it isn't worth a lot of trouble to debate.