# Collapsing data using an index

## collapseData() is a general function to summarize data based on an index

The function collapseData() collapses a vector using a user-defined function and an index/factor.

```
collapseData <- function(x, ind, func=mean, ...) {
  # Apply func to the values of x within each level of ind;
  # extra arguments in ... are passed on to func.
  tapply(X=x, INDEX=ind, FUN=func, ...)
}
```

An example is provided below. First we create a data.frame with repeated values in column 1.

```
dat <- data.frame(IDS=rep(letters[1:3], each=3), Values=1:9)
```

We can then collapse the information based on the index. By default, the summary is computed with the mean() function.

```
collapsedValues <- collapseData(dat$Values, dat$IDS)
```

An alternative function can be passed too; here I sum the values within each group.

```
collapsedValues <- collapseData(dat$Values, dat$IDS, func=sum)
```

This function can also be used over the columns of a data.frame by coupling it with apply(); here I just return the first value of each group. Note that apply() converts the data.frame to a matrix first, so when the columns have mixed types every column is handled as character.

```
collapsedValues <- apply(dat, 2, collapseData, ind=dat$IDS, func=function(x) x[1])
```

The collapseSelectOutput() function can be used to collapse annotation data.frames generated by select() from [AnnotationDbi](http://www.bioconductor.org/packages/release/bioc/html/AnnotationDbi.html).

```
collapseSelectOutput <- function(dat, keyCol=1, glue="; ", ...) {
  # For every column, paste the unique values found for each key, separated by glue.
  apply(dat, 2, collapseData, ind=dat[,keyCol],
        func=function(x) paste(unique(x), collapse=glue))
}
```

An example:

```
require(TxDb.Hsapiens.UCSC.hg19.knownGene)
keytypes(TxDb.Hsapiens.UCSC.hg19.knownGene)
ann <- select(TxDb.Hsapiens.UCSC.hg19.knownGene,
              keys=c("1", "2", "3"), keytype="GENEID",
              columns=c("TXID", "TXNAME"))
annCollapsed <- collapseSelectOutput(ann)
```

## ENJOY!
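
## A worked example without Bioconductor

The sketch below repeats the two functions so that it runs on its own, then applies them to small made-up inputs whose results can be checked by hand. The `withMissing` vector and the `ann` table (including the names `tx1`, `tx2`, `tx3`) are hypothetical data invented for illustration, not output of select(). It also shows how an extra argument such as `na.rm=TRUE` reaches the summary function through `...`.

```r
# Definitions repeated from the text so this block is self-contained.
collapseData <- function(x, ind, func=mean, ...) {
  tapply(X=x, INDEX=ind, FUN=func, ...)
}

collapseSelectOutput <- function(dat, keyCol=1, glue="; ", ...) {
  apply(dat, 2, collapseData, ind=dat[,keyCol],
        func=function(x) paste(unique(x), collapse=glue))
}

dat <- data.frame(IDS=rep(letters[1:3], each=3), Values=1:9)

# Default summary: the mean of each group.
# a = mean(1, 2, 3) = 2, b = mean(4, 5, 6) = 5, c = mean(7, 8, 9) = 8
groupMeans <- collapseData(dat$Values, dat$IDS)

# Extra arguments are forwarded to the summary function through `...`.
# withMissing is hypothetical data: the second value has been replaced by NA.
withMissing <- c(1, NA, 3, 4, 5, 6, 7, 8, 9)
meansNoNA <- collapseData(withMissing, dat$IDS, na.rm=TRUE)

# A hypothetical annotation table in which gene "1" has two transcripts.
ann <- data.frame(GENEID=c("1", "1", "2"),
                  TXNAME=c("tx1", "tx2", "tx3"),
                  stringsAsFactors=FALSE)

# One row per key; the values for gene "1" are pasted into a single string.
annCollapsed <- collapseSelectOutput(ann)
print(annCollapsed)
```

The result of collapseSelectOutput() is a character matrix with one row per key, not a data.frame; wrap it in as.data.frame() if a data.frame is needed downstream.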