Bogumił Kamiński, May 23, 2018
A brief introduction to basic usage of DataFrames. Tested under Julia 0.6.2, DataFrames 0.11.6, CSV 0.2.4, JLD 0.8.3, Missings 0.2.9, CategoricalArrays 0.3.9, FreqTables 0.2.2, DataFramesMeta 0.3.0, StatPlots 0.7.2.
I will try to keep it up to date as the package evolves. This tutorial covers DataFrames, CSV, JLD, Missings, and CategoricalArrays, as they constitute the core of DataFrames.
The last part, extras, mentions selected functionality of additional packages that I find useful for data manipulation; currently these are FreqTables, DataFramesMeta, and StatPlots.
File | Topic |
---|---|
01_constructors.ipynb | Creating DataFrame and conversion |
02_basicinfo.ipynb | Getting summary information |
03_missingvalues.ipynb | Handling missing values |
04_loadsave.ipynb | Loading and saving DataFrames |
05_columns.ipynb | Working with columns of DataFrame |
06_rows.ipynb | Working with rows of DataFrame |
07_factors.ipynb | Working with categorical data |
08_joins.ipynb | Joining DataFrames |
09_reshaping.ipynb | Reshaping DataFrames |
10_transforms.ipynb | Transforming DataFrames |
11_performance.ipynb | Performance tips |
12_pitfalls.ipynb | Possible pitfalls |
13_extras.ipynb | Additional interesting packages |
Changelog:
Date | Changes |
---|---|
2017-12-05 | Initial release |
2017-12-06 | Added description of `insert!`, `merge!`, `empty!`, `categorical!`, `delete!`, `DataFrames.index` |
2017-12-09 | Added performance tips |
2017-12-10 | Added pitfalls |
2017-12-18 | Added additional worthwhile packages: FreqTables and DataFramesMeta |
2017-12-29 | Added description of `filter` and `filter!` |
2017-12-31 | Added description of conversion to Matrix |
2018-04-06 | Added example of extracting a row from a DataFrame |
2018-04-21 | Major update of whole tutorial |
2018-05-01 | Added `byrow!` example |
2018-05-13 | Added StatPlots package to extras |
2018-05-23 | Improved comments in sections 1 to 5 by Jane Herriman |
- Constructors: `DataFrame`
- Getting summary: `size`, `nrow`, `ncol`, `length`, `describe`, `showcols`, `names`, `eltypes`, `head`, `tail`
- Handling missing: `missing` (singleton instance of `Missing`), `ismissing`, `Missings.T`, `skipmissing`, `coalesce`, `allowmissing`, `allowmissing!`, `disallowmissing`, `disallowmissing!`, `completecases`, `dropmissing`, `dropmissing!`
- Loading and saving: `CSV` (package), `JLD` (package), `CSV.read`, `CSV.write`, `save` (from `JLD`), `load` (from `JLD`)
- Working with columns: `rename`, `rename!`, `names!`, `hcat`, `insert!`, `DataFrames.hcat!`, `merge!`, `delete!`, `empty!`, `categorical!`, `DataFrames.index`
- Working with rows: `sort!`, `sort`, `issorted`, `append!`, `vcat`, `push!`, `view`, `filter`, `filter!`, `deleterows!`, `unique`, `nonunique`, `unique!`
- Working with categorical: `categorical`, `cut`, `isordered`, `ordered!`, `levels`, `unique`, `levels!`, `droplevels!`, `get`, `recode`, `recode!`
- Joining: `join`
- Reshaping: `stack`, `melt`, `stackdf`, `meltdf`, `unstack`
- Transforming: `groupby`, `vcat`, `by`, `aggregate`, `eachcol`, `eachrow`, `colwise`
- Extras:
  - FreqTables: `freqtable`, `prop`
  - DataFramesMeta: `@with`, `@where`, `@select`, `@transform`, `@orderby`, `@linq`, `by`, `based_on`, `byrow!`
  - StatPlots: `@df`, `plot`, `density`, `histogram`, `boxplot`, `violin`
Changes in DataFrames:
- Improved rendering of `#undef` in HTML/LaTeX.
- Added `permutecols!` function.
- `describe` returns a `DataFrame`.
- On Julia 0.7 you can access columns of a `DataFrame` using `.` notation.
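
The dot notation for column access can be illustrated with a minimal sketch. It assumes Julia 0.7 or later and a DataFrames release that supports property access; the data frame and its column names `x` and `y` are hypothetical and used only for illustration. On Julia 0.6, with the package versions this tutorial was tested under, the same column would be reached with `df[:x]`.

```julia
using DataFrames

# Hypothetical example data; the column names x and y are illustrative only.
df = DataFrame(x = [1, 2, 3], y = ["a", "b", "c"])

# Dot notation: property access returns the column named x.
col = df.x

# The column is an ordinary vector, so broadcasting works as usual.
shifted = df.x .+ 1
```

Here `col` holds the values of column `x`, and `shifted` is a new vector with every element increased by one; `df` itself is not modified.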