
Convenience functions for quick and dirty data analysis

Primary language: Python

ascii_plots

These are just some silly scripts that I like to have on my command line when I'm doing quick and dirty data analysis and can't be bothered to start R. They all read their data from a pipe, typically downstream of awk, cut and the like.

They all handle non-numeric data as NA.
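
As a rough illustration of that NA convention (a sketch only, not the scripts' actual code; `to_number` is a hypothetical helper name), a field can be parsed like this:

```python
import math

def to_number(field):
    """Return the field as a float, or None (standing in for NA) if it is not numeric."""
    try:
        value = float(field)
    except ValueError:
        return None
    # float() also accepts "nan" and "inf"; treating those as NA is an assumption.
    return value if math.isfinite(value) else None

print([to_number(f) for f in ["1", "4.5", "A", ""]])
```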

Author

Daniel Zerbino

Use

For a quick demo, run:

> sh demo.sh

cor - correlation:

Reads a two-column file from stdin and prints the Pearson correlation between the columns.

> cut -f1,2 test.tsv | ./cor
0.987425
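
For reference, here is a minimal Python sketch of the Pearson computation. It is not the `cor` script itself: the function names are made up for illustration, and dropping rows that contain a non-numeric field is an assumption about how NA should be handled.

```python
import math

def read_pairs(lines):
    """Yield (x, y) from tab-separated lines, skipping rows with a non-numeric field."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        try:
            yield float(fields[0]), float(fields[1])
        except ValueError:
            continue  # assumption: rows containing NA are dropped

def pearson(pairs):
    """Pearson correlation of an iterable of (x, y) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two numeric rows")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

# Made-up sample input; the header row is skipped because it is not numeric.
sample = ["x\ty\n", "1\t2\n", "2\t4\n", "3\t6\n"]
print(pearson(read_pairs(sample)))
```

To use it in a pipe, `read_pairs` can be handed `sys.stdin` instead of the sample list.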

summary:

Reads a tab-delimited data file from stdin, with or without headers (anything numeric is assumed to be data, anything else NA), and prints basic stats for each column: position, header (or first value), min, mean, max and sum.

> cat test.tsv | ./summary
COL |           1 |          2 |          3 |          4 |
NAM |           A |          B |          C |          D |
MIN |           7 |          5 |          9 |          0 |
AVG |           3 |       1.75 |       3.75 |          0 |
MAX |           7 |          5 |          9 |          0 |
SUM |          12 |          7 |         15 |          0 |
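
The per-column bookkeeping can be sketched in Python as follows. This is an illustration under stated assumptions, not the `summary` script: `column_summary` is a hypothetical name, NA cells are simply skipped (so the mean divides by the number of numeric cells, which may not be the script's convention), and the sample data is made up.

```python
def column_summary(lines):
    """Return one dict per column with its position, name, min, mean, max and sum."""
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    if not rows:
        return []
    width = max(len(row) for row in rows)
    result = []
    for col in range(width):
        cells = [row[col] for row in rows if col < len(row)]
        values = []
        for cell in cells:
            try:
                values.append(float(cell))
            except ValueError:
                pass  # assumption: non-numeric cells are NA and are skipped
        stats = {"col": col + 1, "name": cells[0]}  # header, or first value
        if values:
            stats.update(
                min=min(values),
                mean=sum(values) / len(values),
                max=max(values),
                sum=sum(values),
            )
        result.append(stats)
    return result

# Made-up input with a header row and one NA cell.
for stats in column_summary(["x\ty\n", "1\t10\n", "2\tNA\n", "3\t30\n"]):
    print(stats)
```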

hist - histogram:

Either:

  • Takes in a single column of numbers and displays a histogram.
  • Takes in two columns of numbers and displays a weighted histogram, treating the first column as values and the second as weights.

The bin size is 1 by default, but can be given as the first argument:

    > awk 'func r(){return sqrt(-2*log(rand()))*cos(6.2831853*rand())}BEGIN{for(i=0;i<10000;i++)s=s"\n"0.5*r();print s}' | ./hist 0.1
      -1.8 |     0.0001 |          1 | 
      -1.6 |     0.0004 |          4 | 
      -1.5 |     0.0005 |          5 | 
      -1.4 |     0.0007 |          7 | 
      -1.3 |     0.0018 |         18 | 
      -1.2 |      0.004 |         40 | **
      -1.1 |     0.0058 |         58 | **
        -1 |     0.0085 |         85 | ****
      -0.9 |     0.0126 |        126 | ******
      -0.8 |     0.0197 |        197 | **********
      -0.7 |     0.0285 |        285 | **************
      -0.6 |     0.0349 |        349 | *****************
      -0.5 |     0.0422 |        422 | *********************
      -0.4 |     0.0532 |        532 | ***************************
      -0.3 |     0.0634 |        634 | ********************************
      -0.2 |     0.0681 |        681 | **********************************
      -0.1 |     0.0756 |        756 | **************************************
         0 |     0.1557 |       1557 | ********************************************************************************
       0.1 |     0.0743 |        743 | **************************************
       0.2 |     0.0698 |        698 | ***********************************
       0.3 |     0.0628 |        628 | ********************************
       0.4 |     0.0546 |        546 | ****************************
       0.5 |      0.042 |        420 | *********************
       0.6 |     0.0351 |        351 | ******************
       0.7 |     0.0252 |        252 | ************
       0.8 |     0.0208 |        208 | **********
       0.9 |      0.014 |        140 | *******
         1 |     0.0104 |        104 | *****
       1.1 |     0.0065 |         65 | ***
       1.2 |     0.0035 |         35 | *
       1.3 |      0.002 |         20 | *
       1.4 |     0.0014 |         14 | 
       1.5 |     0.0009 |          9 | 
       1.6 |     0.0005 |          5 | 
       1.7 |     0.0001 |          1 | 
       1.8 |     0.0001 |          1 | 
       1.9 |     0.0002 |          2 | 
         2 |     0.0001 |          1 | 
TOTAL      |          1 |      10000 |
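
The binning idea, with optional weights, can be sketched in Python like this. It is not the `hist` script: the names are hypothetical, and both the rounding rule (floor to the bin's lower edge) and the bar scaling (longest bar is 80 stars) are assumptions.

```python
import math
from collections import defaultdict

def histogram(rows, bin_size=1.0):
    """Map bin lower edge -> total weight; rows are (value,) or (value, weight) tuples."""
    bins = defaultdict(float)
    for row in rows:
        value = row[0]
        weight = row[1] if len(row) > 1 else 1.0
        # Assumption: floor to the lower edge. Floating-point division can push a
        # value sitting exactly on an edge into the neighbouring bin.
        edge = math.floor(value / bin_size) * bin_size
        bins[round(edge, 10)] += weight
    return dict(bins)

def render(bins, width=80):
    """Format bins as 'edge | fraction | count | bar' lines plus a TOTAL line."""
    if not bins:
        return ""
    total = sum(bins.values())
    peak = max(bins.values())
    lines = []
    for edge in sorted(bins):
        count = bins[edge]
        bar = "*" * int(width * count / peak)
        lines.append(f"{edge:>10g} | {count / total:>10.4g} | {count:>10g} | {bar}")
    lines.append(f"{'TOTAL':<10} | {1:>10g} | {total:>10g} |")
    return "\n".join(lines)

# Made-up values, binned with a bin size of 0.5.
print(render(histogram([(0.1,), (0.2,), (0.7,), (-0.3,)], bin_size=0.5)))
```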

bars:

Like histogram, but for categorical data:

> cut -f1 test.tsv | ./bars

	 1.0 |       0.25 |          1 | ********************************************************************************
	 4.0 |       0.25 |          1 | ********************************************************************************
	 7.0 |       0.25 |          1 | ********************************************************************************
	   A |       0.25 |          1 | ********************************************************************************
TOTAL            |          1 |          4 |
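
Counting categories is essentially a tally. A minimal Python sketch, assuming labels are kept as raw strings (the output above suggests the real script also normalises numeric labels, which this sketch does not attempt) and using a hypothetical function name:

```python
from collections import Counter

def bars(labels, width=80):
    """Format one 'label | fraction | count | bar' line per distinct label."""
    counts = Counter(labels)
    if not counts:
        return ""
    total = sum(counts.values())
    peak = max(counts.values())
    lines = []
    for label in sorted(counts):
        n = counts[label]
        bar = "*" * (width * n // peak)  # assumption: most frequent label gets the full width
        lines.append(f"{label:>10} | {n / total:>10.4g} | {n:>10} | {bar}")
    lines.append(f"{'TOTAL':<10} | {1:>10} | {total:>10} |")
    return "\n".join(lines)

# Made-up categorical input.
print(bars(["a", "b", "a"]))
```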

scatter:

Takes in two columns of numbers and displays a sketchy ASCII density plot.

> awk 'func r(){return sqrt(-2*log(rand()))*cos(6.2831853*rand())}BEGIN{for(i=0;i<10000;i++)s=s"\n"0.5*r()"\t"0.5*r();print s}' | ./scatter
---------------------------------------------------------------------------------------------------------------------- 2.00418
|                                       '              '                                                             |
|                                                    '                                                               |
|                                               '                     '                                              |
|                              '       '         '                                                  '                |
|                                          '  '     '   `  ''        `'   '    '                                     |
|                                ' '  '  '   ' '    ' `    '     '      '     '                                      |
|                          '     '     '''  '' '   '    ' ''          '  ' '             '                           |
|                              '     '    '' '   '   ,`' ,'`' ` '''   ` `' '  '            ''   '                    |
|                     '      '' ` `' ' '`'' '' '  `'   ''``''`''`'';' `,  ''  ''    `' '                             |
|                         ,     ' '''''  ````'' '`,`!' ,,;` `;`'> ``'```'``,'`  '`' `'   ' `                         |
|                    '   '' '''`,'`; ' '``',`````;!!,,;'`;,;''! ,;,!,;'';'`, '` `''  `        '    '                 |
|                   '    '    ,'`'`,,`,;',``,;;,`!~!;{,,!'!!!,!>!;`!~,,;'`,';`''`'' '''` ' `   '              '      |
|                 ' `'   ` `' ',;, `' ',`!~`!; !']~{-!{~~]>;!!-!{!;';!!~;;;' !'`;`','''``'     `  '                  |
|                  `''    '''';'',;,;'>>)>,)~;-{]|~-j~]t~)]{t~)~!->]-!>!;,!`>`,`, ,;;,'`',''                         |
|          '   ''' ''',' `, `!``;;;,~]];!!>]!)-{]vt|vj]n-~-{|j,)>-~n!]~{~!!>'>!;`!`,`` , ` '`   `        '           |
|               '' ' `' '`'',`;;;~!-{`~~|>{~{]v>{|)XX|v-~{otnjCtC)v;{tX)]->;)>>,`!],`;;  `;` ; '`      '      '      |
|          '  '  '`''  , '`',`!!,!)>!-~{{j|,j|-0njC||0U0CoXn|]o-tUjC{|U)|>-]->{~{!!,;' !`  ,`'  ''  ' ' ' ' '        |
|             '  `  `  ` !;- ;>,-;|>{~>!{]|-U]j]XUU00kjXkj00)|jtXjjttnv]n-`]{;{!~!~!-!';,''`'  '' ' ` '              |
|   '       '   '   ' ;,``'`'`,!>!]])-|t{t|{-U{)|ntCtCnkvkvqCXZUC&Z~0||)-!|{{|;>),]!~; ,`,`';'  ''' ''      '        |
|      ` '   '   ,`'`''``;'`', ,;{)]{{tn)]]{UvdjCZv#-jtC0tZtC|UZ0jUUUC{{0]|!>]{>~~~~{!;`,; ';;;' ' '  '             '|
|            '  '  ' ,''``;`~,!];;>!;!|-)]CZvt{UXZqCC$ZnZ|$nokXCUUkjXt]--X|t])>`]!->``;;`>;`'' ''                '   |
|      ''   '`''''' '', `,'>!)!!->)]|))|UtX)|tnvt{|nZvqU0nqdC0#{v)Uqnt|{--t{)])!;,>`-,,;,';`''``   ' `               |
|         '      ,' ,'``,;'';!,>>]~~~>~]>]q)|0vnvjjCZvqvnqtX0n)qttvv{X)]t~|]j!),]' !;`,``'``'' ''''   '              |
|           '  ' '` ````!,``'>>'{;>~;!;~{|j!)]nZtXnj|U0Udtd0njXvjj){nn>]]){{`~;>>- ,;!, `,'`,'     '       '         |
|   '  '             '''`'` ;`>,>;,;-,~~`-)>]t~t|{-n)t{{tnU]jXUv]n-~],;-t;~;!{!~>,`;,`,`;`'` '` `''      ''          |
|'        '    `'     '' '',,`,`;~;,,,;!-|~-tj-])v!>|]t--j)>Uv]>-~]~!;;!,~-,>'!',,'``''' ',`' ` ' '     '  '         |
|              '  `   `'>   ',;```!;~;|!~;~->>-,]]~>-;~]))]!>!-`-)-,]-{~,;,`;`',',`'; ,`,'`''  '   '                 |
|                     ' ' `''' ''`'',,,;;!!-,{`-];>~-,>-~~>;{!)];`;--,>`;!,;`;;`'`; ` ''''   '      `                |
|                  '        ' ' '''` `` `';`';`,;>!,!~!~,~-;>~;!!``!,>!`',!`,`'`,, '' ,', ''''                       |
|                '          '''  ' `'`; ''`;`;``> ';>;,!>'''>!>'`;;;;` `'' `' '`''     '           '                 |
|         '                `  ``'  `'`  ' '''`!`;`!'`,`'` ``;;'!` `! ,'`;',` '' ' '                                  |
|           '             ' ' ''   ' `  ,' ` `', ,'`''';'`'``''' ''''```'     `, '' `''   '                          |
|                            '     ''   ' ''   ' `' ''' ` ' `', '''' ' ` '   '''                '                    |
|                       ''               ' `   ' `   `'   ' '  `'      '         ' '                                 |
|                        '        '   '      , '   ' '     '             '           '                               |
|                                    '              ''       ' '      `                 '                            |
|                                                         '  '              '                                        |
|                                                                 '                                                  |
---------------------------------------------------------------------------------------------------------------------- -1.7106
-1.826500                                                                                                     1.910550
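
The general idea behind a density plot like this can be sketched in Python: count points per grid cell, then map each count to a character. This is not the `scatter` script. The grid size, the five-character `RAMP` and the linear count-to-character mapping are all assumptions; the real output clearly uses a richer character set.

```python
RAMP = " .:*#"  # hypothetical ramp, from empty to densest

def density_grid(points, cols=60, rows=20):
    """Count points per cell of a cols x rows grid spanning the data range."""
    points = list(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_max = min(xs), max(ys)
    x_span = (max(xs) - x_min) or 1.0  # avoid dividing by zero for constant data
    y_span = (y_max - min(ys)) or 1.0
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        c = min(int((x - x_min) / x_span * cols), cols - 1)
        r = min(int((y_max - y) / y_span * rows), rows - 1)  # row 0 is the top
        grid[r][c] += 1
    return grid

def render(grid):
    """Draw the grid, scaling counts linearly onto RAMP."""
    peak = max(max(row) for row in grid)
    lines = []
    for row in grid:
        chars = []
        for n in row:
            if n == 0:
                chars.append(RAMP[0])
            else:
                chars.append(RAMP[1 + (n - 1) * (len(RAMP) - 2) // max(peak - 1, 1)])
        lines.append("|" + "".join(chars) + "|")
    return "\n".join(lines)

# Made-up points on a tiny 2 x 2 grid.
print(render(density_grid([(0, 0), (1, 1), (1, 1)], cols=2, rows=2)))
```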

curve:

Draws a curve from a single column of numbers. NOTE: requires scatter to be in the same directory.

> awk 'BEGIN{for(i=0;i<100;i++)s=s"\n"sin(i/10);print s}' | ./curve 
---------------------------------------------------------------------------------------------------------------------- 0.999574
|               $$$$$$ $                                                                 $$$$$ $                     |
|             $         $                                                             $ $       $$                   |
|           $$           $                                                           $            $                  |
|                         $                                                         $              $                 |
|          $               $                                                                        $                |
|         $                 $                                                      $                                 |
|        $                                                                        $                   $              |
|                             $                                                  $                     $             |
|      $                       $                                                                                     |
|     $                                                                        $                        $            |
|                               $                                                                        $           |
|    $                                                                        $                                      |
|                                $                                           $                            $          |
|   $                                                                                                                |
|  $                              $                                         $                              $         |
|                                  $                                                                                 |
| $                                                                        $                                 $       |
|                                    $                                                                               |
|$                                                                        $                                   $      |
|                                     $                                                                        $     |
|                                                                        $                                           |
|                                      $                               $                                        $    |
|                                                                                                                    |
|                                       $                             $                                          $   |
|                                        $                                                                           |
|                                                                    $                                            $  |
|                                         $                                                                         $|
|                                                                   $                                                |
|                                          $                       $                                                 |
|                                            $                                                                       |
|                                                                 $                                                  |
|                                             $                 $                                                    |
|                                              $               $                                                     |
|                                               $             $                                                      |
|                                                $           $                                                       |
|                                                 $         $                                                        |
|                                                   $$$ $$ $                                                         |
|                                                      $                                                             |
---------------------------------------------------------------------------------------------------------------------- -0.999923
2.000000                                                                                                    101.000000
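
One plausible way to get from a single column to something `scatter` can draw is to pair each value with its line number. The sketch below assumes exactly that. The x range of 2 to 101 in the output above is consistent with 1-based line numbers (the awk command emits a leading blank line), but that reading is an inference, and `curve_points` is a hypothetical name.

```python
def curve_points(lines):
    """Turn a single column into (line_number, value) pairs, skipping non-numeric lines."""
    points = []
    for number, line in enumerate(lines, start=1):
        try:
            points.append((float(number), float(line.strip())))
        except ValueError:
            continue  # blank or non-numeric lines are treated as NA
    return points

# Made-up input: a blank first line, then two values around a non-numeric one.
for x, y in curve_points(["\n", "0.5\n", "x\n", "2\n"]):
    print(f"{x}\t{y}")
```

The tab-separated pairs printed here are in the two-column shape that scatter takes as input.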

column_descriptions:

Extracts the header and a number of sample values from each column:

> cat test.tsv | ./column_descriptions
1	A	3 sampled numerical values: 4.000000 ± 2.449490 (total = 12.000000)
2	B	5, NA, 2
3	C	NA, 6, 9
4	D	NA
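
The numeric summary on the first line is consistent with a mean and a population standard deviation over the values 1, 4 and 7 (the numeric entries that bars reports for the first column). A short Python sketch of that arithmetic, assuming that reading and using a hypothetical `describe` helper:

```python
import statistics

def describe(values):
    """Format count, mean ± population standard deviation, and total."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # assumption: population, not sample, deviation
    return (
        f"{len(values)} sampled numerical values: "
        f"{mean:f} ± {spread:f} (total = {sum(values):f})"
    )

print(describe([1.0, 4.0, 7.0]))
# 3 sampled numerical values: 4.000000 ± 2.449490 (total = 12.000000)
```

With the sample standard deviation (`statistics.stdev`) the spread would be 3.000000, which does not match the output above.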