Package to read and write all Stata file formats (version 14 and older) to and from an R data.frame. The dta file format versions 102 to 118 are supported.
The function read.dta from the foreign package imports only dta files from Stata versions <= 12. Due to the different structure and features of dta 117 files, we wrote a new file reader in Rcpp.
Additionally, the package supports many features of the Stata dta format, such as label sets in different languages (?set.lang) or business calendars (?as.caldays).
The package is now hosted on CRAN.
install.packages("readstata13")
library(readstata13)
dat <- read.dta13("path to file.dta")
save.dta13(dat, file="newfile.dta")
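The two calls above can be combined into a small round trip. The following is a minimal sketch, assuming the package is installed; the data frame and its column names are hypothetical and chosen only for illustration.

```r
library(readstata13)

# Hypothetical example data, not part of the package.
dat <- data.frame(id = 1:3, score = c(1.5, 2.5, 3.5))

# Write to a temporary file so the working directory stays untouched.
path <- tempfile(fileext = ".dta")
save.dta13(dat, file = path)

# Read the file back into a data.frame.
back <- read.dta13(path)

# Compare the values column by column, ignoring attributes,
# because read.dta13 attaches additional metadata to its result.
values_match <- all(mapply(
  function(a, b) isTRUE(all.equal(a, b, check.attributes = FALSE)),
  dat, back
))
```

If `values_match` is not `TRUE`, inspecting `str(back)` is a reasonable first step to see which column was stored with a different type.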
To install the current release from GitHub you need the platform-specific build tools. On Windows a current installation of Rtools is necessary, while OS X users need to install Xcode.
# install.packages("devtools")
devtools::install_github("sjewo/readstata13", ref="0.8.5")
Older versions of devtools require a username option:
install_github("readstata13", username="sjewo", ref="0.8.5")
To install the current development version from GitHub:
devtools::install_github("sjewo/readstata13", ref="testing")
- [0.8.5] fix errors on big-endian systems
- [0.8.4] fix valgrind errors. converting from dta.write to writestr
- [0.8.4] fix for empty data label
- [0.8.4] make replace.strl default
- [0.8.3] restrict length of varnames to 32 chars for compatibility with Stata 14
- [0.8.3] add many function tests
- [0.8.3] avoid converting doubles to floats while writing compressed files
- [0.8.2] save NA values in character vector as empty string
- [0.8.2] convert.underscore=T will convert all non-literal characters to underscores
- [0.8.2] fix saving of Dates
- [0.8.2] save with convert.factors by default
- [0.8.2] test for NaN and inf values while writing missing values and replace with NA
- [0.8.2] remove message about saving factors
- [0.8.1] convert non-integer variables to factors (nonint.factors=T)
- [0.8.1] handle large datasets
- [0.8.1] working with strL variables is now a lot faster
- reading data files from disk or url and create a data.frame
- saving dta files to disk - most features of the dta file format are supported
- assign variable names
- read the new strL strings and save them as an attribute
- convert Stata labels to factors and save them as an attribute
- read some meta data (timestamp, dataset label, formats,...)
- convert strings to system encoding
- handle different NA values
- handle multiple label languages
- convert dates
- reading business calendar files
- cleanup of Rcpp code
Because our attributes differ from those set by foreign::read.dta, all.equal and identical report FALSE when applied to the whole objects. If you compare the values themselves, everything is identical.
library("foreign")
r12 <- read.dta("http://www.stata-press.com/data/r12/auto.dta")
r13 <- read.dta13("http://www.stata-press.com/data/r13/auto.dta")
Map(identical,r12,r13)
att <- names(attributes(r12))
for (i in seq_along(att))
  cat(att[i], ":", all.equal(attr(r12, att[i]), attr(r13, att[i])), "\n")
r12 <- read.dta("http://www.stata-press.com/data/r12/auto.dta", convert.factors=FALSE)
r13 <- read.dta13("http://www.stata-press.com/data/r13/auto.dta", convert.factors=FALSE)
Map(identical,r12,r13)
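The difference between comparing whole objects and comparing values can also be reproduced offline with base R alone. This is a small sketch using hand-built data frames as stand-ins for the two import results; the attribute name and its value are purely illustrative assumptions.

```r
# Two hypothetical data frames with the same values but different attributes.
a <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"), stringsAsFactors = FALSE)
b <- a
attr(b, "datalabel") <- "example label"  # extra attribute, as an importer might add

# Comparing the whole objects also compares their attributes.
whole_identical <- identical(a, b)

# Comparing column by column looks only at the columns themselves.
cols_identical <- unlist(Map(identical, a, b))

# all.equal can be told to ignore attributes entirely.
values_equal <- isTRUE(all.equal(a, b, check.attributes = FALSE))
```

Here the whole-object comparison fails only because of the extra attribute, while the column-wise checks agree, which mirrors the behaviour described above for read.dta and read.dta13.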
Marvin Garbuszus (JanMarvin) and Sebastian Jeworutzki (both Ruhr-Universität Bochum)
GPL2