aggr(1) aggr(1) NAME aggr - aggregate CSV-data and print on the standard output SYNOPSIS aggr [OPTIONS] COLUMNS... DESCRIPTION aggr aggregates CSV-data (comma seperated values) from a file or standard input thru well-known aggregate functions and outputs the result to standard output. It tries to simulate some of the func‐ tionality that is provided by SQL's "GROUP BY"-statement. OPTIONS -f INPUTFILE read data from INPUTFILE instead of standard input -d DELIMITER use DELIMITER instead of Tab as field delimiter (for input and output) COLUMNS The columns that will be printed in output are defined by the COL‐ UMNS arguments. There are two kinds of columns: Grouping columns define which columns are used to group the input data. They are simply specified by their column number. These columns are the attributes that one would mention in the GROUP BY statement in SQL. Aggregated columns are columns whose values are calculated by an aggregate function. The syntax is AGGREGATEFUNCTION:COLUMNNUMBER. See below for supported aggregate functions. The columns will appear in the output in the same order as in which they were specified in the arguments. It is possible to output the same column several times (with different aggregate functions). The column numbering scheme is 0-based: the first column has column number 0, the second 1 and so on. AGGREGATE FUNCTIONS These aggregate functions are supported: count, sum, avg, min, max, first, last, median. EXAMPLES Let's say there is a file name_city_age.csv with the following con‐ tent (list of persons with name, city and age): Hans;Regensburg;42 Heidi;Straubing;40 Peter;Straubing;20 Else;Regensburg;10 Hans;Augsburg;63 Hans;Ingolstadt;71 Then aggr -f name_city_age.csv -d ";" 1 avg:2 will return: Regensburg;26.0 Straubing;30.0 Augsburg;63.0 Ingolstadt;71.0 The grouping-criteria is the second column (column number 1) which is the column with the city names. The aggregated column is the average age of the listed inhabitants of this city. In SQL the query would be something like SELECT city,AVG(age) FROM persons GROUP BY city;. aggr -f name_city_age.csv -d ";" avg:2 1 | sort -rn Will basically do the same as in the previous example, but outputs the average-age column first and uses sort -rn to output the rows in descending order. aggr -f name_city_age.csv -d ";" 0 count:* This time we're using the name column for grouping and count as aggregate function. This gives us the number how often which name is in the list. Find more examples in the wiki: https://github.com/stefan‐ schramm/aggr/wiki MAJOR DIFFERENCES TO SQL'S GROUP BY: In aggr an aggregate function can only be applied to a single col‐ umn. That means expressions like SUM(price * amount) in SQL are not possible here. To achieve this you should first calculate the prod‐ uct with some other commandline utility into a single column and then pipe it's output to aggr. AUTHOR Stefan Schramm <mail@stefanschramm.net> NOVEMBER 2010 aggr(1)