aggr: A Python repository from stefanschramm

aggr(1)                                                            aggr(1)



NAME
       aggr - aggregate CSV data and print the result to standard output


SYNOPSIS
       aggr [OPTIONS] COLUMNS...


DESCRIPTION
       aggr aggregates CSV data (comma-separated values) from a file or
       standard input using well-known aggregate functions and writes the
       result to standard output. It simulates some of the functionality
       provided by SQL's GROUP BY clause.


OPTIONS
       -f INPUTFILE
              read data from INPUTFILE instead of standard input


       -d DELIMITER
              use DELIMITER instead of Tab as field delimiter  (for  input
              and output)


COLUMNS
       The  columns that will be printed in output are defined by the COL‐
       UMNS arguments. There are two kinds of columns:


       Grouping columns
              define which columns are used to group the input data.  They
              are  simply  specified by their column number. These columns
              are the attributes that one would mention in  the  GROUP  BY
              statement in SQL.


       Aggregated columns
              are  columns  whose  values  are  calculated by an aggregate
              function. The syntax is AGGREGATEFUNCTION:COLUMNNUMBER.  See
              below for supported aggregate functions.


       The columns appear in the output in the same order in which they
       were specified in the arguments. The same column can be output
       several times (with different aggregate functions). Column
       numbering is 0-based: the first column has column number 0, the
       second 1, and so on.
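
       The two kinds of COLUMNS arguments can be told apart mechanically.
       The following Python sketch shows one way to do it; the function
       name parse_column_spec is hypothetical and not taken from aggr's
       source, and mapping "*" to None is an assumption made for this
       illustration only.

```python
def parse_column_spec(spec):
    """Split one COLUMNS argument into (function_name, column_index).

    A plain number such as "1" is a grouping column: (None, 1).
    "avg:2" is an aggregated column: ("avg", 2).
    "count:*" becomes ("count", None); treating "*" as "no particular
    column" is an assumption of this sketch.
    """
    if ":" in spec:
        func, _, column = spec.partition(":")
        index = None if column == "*" else int(column)
        return func, index
    return None, int(spec)


# Column numbers are 0-based, as described above.
specs = [parse_column_spec(s) for s in ["1", "avg:2", "count:*"]]
print(specs)
```

       Keeping the parsed specifications in a list preserves the order of
       the arguments, which is also the order of the output columns.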



AGGREGATE FUNCTIONS
       These aggregate functions are supported: count, sum, avg, min, max,
       first, last, median.
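
       As a rough illustration of what these functions compute, the
       Python sketch below maps each name to a callable that takes the
       list of values of one group. The table AGGREGATES is hypothetical.
       Details such as how aggr treats non-numeric values, or which value
       median returns for a group with an even number of entries, are not
       stated here; the sketch uses Python's statistics.median, which
       averages the two middle values.

```python
import statistics

# Hypothetical lookup table: function name -> callable on a list of values.
AGGREGATES = {
    "count": len,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "median": statistics.median,
}

ages = [42.0, 10.0]  # the two Regensburg ages from the EXAMPLES section
for name, func in AGGREGATES.items():
    print(name, func(ages))
```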



EXAMPLES
       Let's say there is a file name_city_age.csv with the following con‐
       tent (list of persons with name, city and age):

       Hans;Regensburg;42
       Heidi;Straubing;40
       Peter;Straubing;20
       Else;Regensburg;10
       Hans;Augsburg;63
       Hans;Ingolstadt;71

       Then

       aggr -f name_city_age.csv -d ";" 1 avg:2

       will return:

       Regensburg;26.0
       Straubing;30.0
       Augsburg;63.0
       Ingolstadt;71.0

       The grouping criterion is the second column (column number 1),
       which holds the city names. The aggregated column is the average
       age of the listed inhabitants of each city. In SQL the query would
       be something like SELECT city, AVG(age) FROM persons GROUP BY city;.
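
       The same grouping can be written out in a few lines of Python.
       This is only a sketch of the idea, not aggr's actual
       implementation; the helper average_by is hypothetical. It keeps
       the groups in order of first appearance, which matches the output
       shown above.

```python
import csv
import io

DATA = """Hans;Regensburg;42
Heidi;Straubing;40
Peter;Straubing;20
Else;Regensburg;10
Hans;Augsburg;63
Hans;Ingolstadt;71
"""


def average_by(rows, group_col, value_col):
    """Group rows by one column and average another (both 0-based)."""
    groups = {}  # dicts preserve insertion order
    for row in rows:
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return [(key, sum(vals) / len(vals)) for key, vals in groups.items()]


rows = list(csv.reader(io.StringIO(DATA), delimiter=";"))
for city, avg_age in average_by(rows, 1, 2):
    print(f"{city};{avg_age}")
```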

       aggr -f name_city_age.csv -d ";" avg:2 1 | sort -rn

       This does basically the same as the previous example, but outputs
       the average-age column first and uses sort -rn to sort the rows in
       descending order.

       aggr -f name_city_age.csv -d ";" 0 count:*

       This time the name column is used for grouping and count is the
       aggregate function. The result shows how often each name occurs
       in the list.

       Find more examples in the wiki:
       https://github.com/stefanschramm/aggr/wiki



MAJOR DIFFERENCES FROM SQL'S GROUP BY
       In aggr an aggregate function can only be applied to a single
       column. That means expressions like SUM(price * amount) in SQL are
       not possible here. To achieve this, first calculate the product
       into a single column with some other command-line utility and then
       pipe its output to aggr.
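
       One possible preprocessing step is sketched below in Python. The
       function append_product and the sample data are hypothetical; the
       sketch assumes Tab-delimited input with a name in column 0, a
       price in column 1 and an amount in column 2, and appends their
       product as a new last column.

```python
import csv
import io


def append_product(lines, col_a, col_b, delimiter="\t"):
    """Yield each input row with the product of two columns appended."""
    for row in csv.reader(lines, delimiter=delimiter):
        product = float(row[col_a]) * float(row[col_b])
        yield row + [str(product)]


# Hypothetical sample data: name, price, amount.
sample = io.StringIO("apple\t0.5\t4\npear\t2\t3\n")
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerows(append_product(sample, 1, 2))
print(out.getvalue(), end="")
```

       In a real pipeline such a filter would read from standard input
       and write to standard output, so that its result could be piped
       into something like aggr 0 sum:3 to total the new column per name.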


AUTHOR
       Stefan Schramm <mail@stefanschramm.net>



                               NOVEMBER 2010                       aggr(1)