Write tfrecords from R
tfrecords writes .tfrecords files efficiently, directly from R. It currently supports writing lists of matrices (matrix class in R), arrays (array class in R) and sparse matrices (dgCMatrix in R).
library(tfrecords)

# Every element has 100 observations (rows / first dimension).
data <- list(
  x = matrix(1:1000, nrow = 100),
  y = Matrix::rsparsematrix(nrow = 100, ncol = 10, density = 0.3),
  z = array(1:10000, dim = c(100, 10, 10))
)
write_tfrecords(data, "example.tfrecords")
All elements of the data list must have the same number of observations: the number of rows for matrices and sparse matrices, and the size of the first dimension for arrays. This is a requirement of the .tfrecords format, since observations are written one at a time.
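The requirement above can be checked before writing. The following is a minimal sketch in base R; check_n_obs is a hypothetical helper for illustration and is not part of the tfrecords package. It only uses matrices and arrays so that it runs without extra packages, on the assumption that sparse matrices report their shape through dim() in the same way.

```r
# Hypothetical helper (not part of tfrecords): returns the shared number of
# observations, or stops with an error if the elements disagree.
check_n_obs <- function(data) {
  # dim(x)[1] is the row count for matrices and the size of the first
  # dimension for arrays.
  n_obs <- vapply(data, function(x) dim(x)[1], integer(1))
  if (length(unique(n_obs)) != 1) {
    stop(
      "Elements differ in number of observations: ",
      paste(names(n_obs), n_obs, sep = " = ", collapse = ", ")
    )
  }
  unname(n_obs[1])
}

data <- list(
  x = matrix(1:1000, nrow = 100),
  z = array(1:10000, dim = c(100, 10, 10))
)
n <- check_n_obs(data)
```

Calling such a check first gives a readable error that names the mismatched elements, rather than relying on whatever the writer reports.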
Before installing the R package, you need to install the Protobuf compiler and runtime libraries (version > 3). See the installation instructions below.
You can then install tfrecords from GitHub with:
# install.packages("devtools")
devtools::install_github("dfalbel/tfrecords")
There is no apt package that installs protobuf > 3.0 directly on Ubuntu, so you will need to compile it from source.
The following should work:
curl -OL https://github.com/google/protobuf/releases/download/v3.5.1/protobuf-cpp-3.5.1.zip
unzip protobuf-cpp-3.5.1.zip -d proto
cd proto/protobuf-3.5.1/
./configure
make
sudo make install
sudo ldconfig
These commands download the protobuf source code from the releases page and compile both the runtime library and the compiler from source. Note that compilation takes about 15 minutes.
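Once the build has finished, it can be useful to confirm from R that the compiler is visible before installing the package. This is a rough sketch, assuming that protoc is on the PATH and that `protoc --version` prints a line of the form "libprotoc 3.5.1"; parse_protoc_version is a hypothetical helper, not something the package provides.

```r
# Hypothetical helper: turns a line such as "libprotoc 3.5.1" into a
# comparable version object.
parse_protoc_version <- function(line) {
  numeric_version(sub("^libprotoc[[:space:]]+", "", line))
}

# Sys.which() returns "" when the binary is not found on the PATH.
protoc <- unname(Sys.which("protoc"))
if (!nzchar(protoc)) {
  message("protoc was not found on the PATH")
} else {
  out <- system2(protoc, "--version", stdout = TRUE)
  version <- parse_protoc_version(out[1])
  if (version >= "3.0.0") {
    message("Found protoc ", as.character(version))
  } else {
    message("protoc ", as.character(version), " is older than 3.0")
  }
}
```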
On macOS, protobuf can be installed via brew. Make sure to run:
brew install protobuf
before installing the tfrecords package.
On Windows, pre-built binaries are downloaded via rwinlib during the configuration step, so you shouldn't need to install protobuf before installing the tfrecords package.