This tool accepts a list of newline-separated numbers on STDIN and writes one or more statistical measures to STDOUT. For example, if you have a log for a web service and you want to know the mean and maximum request times, you can run something like this:
grep REQ_TIME /var/log/myservice.log | awk '{print $NF}' | pst --stats=mean,max
For larger workloads, the tool will support multiple threads so that several measures can be calculated concurrently. The number of threads must be specified explicitly, so that the user controls how many system resources are used. By default, the tool runs in a single thread.
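The idea of calculating several measures concurrently can be sketched in Python as follows. This is only an illustration, not the tool's implementation: the `MEASURES` table and the `compute` function are hypothetical names, and the `threads` parameter simply mirrors the `--threads` option with its single-thread default.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup table from measure name to the function that computes it.
MEASURES = {
    "count": len,
    "sum": sum,
    "mean": statistics.fmean,
    "max": max,
}


def compute(values, names, threads=1):
    """Compute each requested measure as a separate task on a worker pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Submit one task per measure; all tasks read the same list.
        futures = {name: pool.submit(MEASURES[name], values) for name in names}
        # Collect the results once every task has finished.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    print(compute([1.0, 2.0, 3.0, 4.0], ["mean", "max"], threads=2))
```

Each measure only reads the input, so the tasks need no locking. Note that in CPython, threads mainly show the structure here; pure-Python calculations may not run faster in parallel.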
-h = Help
-p, --percentiles = A comma-separated list of percentiles. Only
applicable if percentiles is specified for --stats.
-s, --stats = A comma-separated list of statistical measures.
-t, --threads = The number of worker threads to use. Defaults to 1.
- count
- sum
- mean
- median
- mode
- stddev (Standard Deviation)
- range
- percentiles
Normally, the median and the 50th percentile are the same value. However, in this case, to keep things simple, I chose to make them slightly different. The median is a true median, i.e. the value that splits the sorted values exactly in half. So, if you have the values 1, 2, 3, and 4, the median is 2.5. The 50th percentile, however, will be 3, since 50% of the values are below 3.
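The difference can be illustrated with a short Python sketch. The `percentile` function below is hypothetical: it assumes a simple rule (take the value at position floor(p/100 * n) in sorted order), chosen only because it reproduces the example above. The tool's exact percentile rule is not specified here.

```python
import math
import statistics


def percentile(values, p):
    """Return the value at position floor(p/100 * n) in sorted order.

    Assumed rule, picked to match the 1, 2, 3, 4 example; not necessarily
    the rule the tool uses.
    """
    if not values:
        raise ValueError("percentile() requires at least one value")
    ordered = sorted(values)
    index = math.floor(p * len(ordered) / 100)
    # Clamp so that p = 100 returns the largest value.
    return ordered[min(index, len(ordered) - 1)]


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(statistics.median(data))  # true median: averages the two middle values
    print(percentile(data, 50))     # an actual data point with 50% of values below it
```

With the values 1, 2, 3, and 4 this prints 2.5 and then 3. The median may be a value that does not occur in the input, while this percentile rule always returns one of the input values.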
pst -s mean
pst -s mean,median,stddev
pst -s percentiles -p 50,90,99
pst -s mean,median,stddev -t 3
This version will implement all of the above features.
The goal for this version will be to support grouping statistics by a key. For example, we could calculate the mean response time for a service by day or hour.
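As a rough sketch of what grouping could look like, the Python below assumes each input line holds a key and a value separated by whitespace (for example, a date followed by a request time). Both the input format and the `grouped_mean` function are assumptions made for illustration; the actual design for this version is not decided here.

```python
import statistics
from collections import defaultdict


def grouped_mean(lines):
    """Return the mean value per key for lines of the form '<key> <value>'."""
    groups = defaultdict(list)
    for line in lines:
        # Assumed format: the key first, then the number.
        key, value = line.split()
        groups[key].append(float(value))
    return {key: statistics.fmean(values) for key, values in groups.items()}


if __name__ == "__main__":
    sample = ["2024-01-01 10", "2024-01-01 20", "2024-01-02 30"]
    for key, mean in grouped_mean(sample).items():
        print(key, mean)
```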
The goal for this release is to support generating charts and graphs based on the output of the tool. Gnuplot is the most likely tool to be used for this.