This is a sample implementation of ddply and some other R type stuff that I did for my June 2013 presentation to the Bay Area Clojure Users Group.
This is not meant to be a complete implementation, but rather as a demonstration of the Split-Apply-Combine concepts proposed by Hadley Wickham in his paper The Split-Apply-Combine Strategy for Data Analysis (Journal of Statistical Software, April 2011, Volume 40, Issue 1) and implemented in the plyr library. See http://plyr.had.co.nz for more information on the plyr project.
There is API documentation for these functions at http://tomfaulhaber.github.io/split-apply-combine.
These are the files and what's in each of them:
In the root directory:
- script.clj has the set of commands that I ran interactively to construct the results we saw during the live demo. I also used
nrow
,head
, andfrequencies
to do some ad hoc exploration of the data along the way.
In src/split-apply-combine
:
- core.clj has
transform
,transform*
, andcolwise
with their supporting functions. There are also a few other generally useful functions. - ply.clj has the implementation of
ddply
,ddply*
,d_ply
, andd_ply*
and their various supporting functions. - cpu.clj has routines for defining, loading an manipulating the CPU load data I showed during the demo. (Note the only part of this we used during the demo was the
cpu-files
data which had the list of data files.) - stock.clj has routines for loading and manipulating the stock data I spoke about. This data was all pulled from yahoo finance.
Data files (in data/
):
- tech-stocks.csv is some simple stock data that include the data on the slides (Amazon, IBM, and Microsoft from the first four months of 2013). Read it with the
read-saved-data
function. It's easy to pull your own data from yahoo finance, just use theload-yahoo-data
function. - scdb_agent_*.cpu is the set of files with CPU load data that we used during the demo portion of the presentation. Thanks to SpaceCurve, Inc. (www.spacecurve.com) for permission to use this data.
Copyright © 2013 Tom Faulhaber
Distributed under the Eclipse Public License, the same as Clojure.