RODPS: ODPS Plugin for R
Features
- Read and write data frames from and to ODPS.
- Convert some R models to SQL commands.
- Large data sets can be processed with distributed algorithms.
- Small data sets can be processed directly in R.
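To illustrate what converting an R model to SQL can mean, the following is a minimal sketch in base R. It is not RODPS's own converter: `lm_to_sql` and the table name `mtcars_table` are hypothetical names chosen for this example, and the helper only handles linear models with plain numeric predictors (no factors or interaction terms).

```r
# Fit a small linear model on a built-in data set.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper (NOT part of RODPS): render the fitted linear
# predictor as a SQL SELECT statement.
lm_to_sql <- function(model, table_name, output_col = "prediction") {
  coefs <- coef(model)
  intercept <- coefs[["(Intercept)"]]
  slopes <- coefs[names(coefs) != "(Intercept)"]
  # %.17g keeps enough digits to round-trip a double exactly.
  terms <- sprintf("(%.17g * %s)", slopes, names(slopes))
  expr <- paste(c(sprintf("%.17g", intercept), terms), collapse = " + ")
  sprintf("SELECT %s AS %s FROM %s;", expr, output_col, table_name)
}

# "mtcars_table" is an assumed table with numeric columns wt and hp.
sql <- lm_to_sql(fit, "mtcars_table")
cat(sql, "\n")
```

The generated statement scores every row on the database side, so the data does not have to be pulled into R first.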
Requirements
System dependencies:
- Java 8+
- R 1.8+
R libraries:
- DBI
- rJava
- RSQLite
Installation
- Install the R dependencies:
install.packages('DBI')
install.packages('rJava')
install.packages('RSQLite')
- Install RODPS
2.1. Install from release package
Check the release page for the latest version. For example, to install version 2.1.3:
install.packages('https://github.com/aliyun/aliyun-odps-r-plugin/releases/download/v2.1.3/RODPS_2.1.3.tar.gz', type="source", repos=NULL)
2.2. Install with the devtools package
This method requires JDK and Maven executables to build the Java module.
devtools::install_github("aliyun/aliyun-odps-r-plugin")
2.3. Install from CRAN (under development)
Getting Started
- Please make sure the environment variable RODPS_CONFIG is set to /path/to/odps_config.ini:
export RODPS_CONFIG=/path/to/odps_config.ini
See the configuration template: odps_config.ini.template
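Before loading the package, it can help to confirm from within R that the variable is visible and points to an existing file. The sketch below uses only base R; `check_rodps_config` is a hypothetical helper written for this example, not a RODPS function.

```r
# Hypothetical helper (NOT part of RODPS): report whether RODPS_CONFIG
# is set and whether the file it names exists.
check_rodps_config <- function() {
  path <- Sys.getenv("RODPS_CONFIG", unset = NA)
  if (is.na(path) || !nzchar(path)) {
    return("unset")    # variable missing or empty
  }
  if (!file.exists(path)) {
    return("missing")  # variable set, but no file at that path
  }
  "ok"
}

status <- check_rodps_config()
message("RODPS_CONFIG status: ", status)
```

If the status is "unset", the variable can also be set for the current R session with `Sys.setenv(RODPS_CONFIG = "/path/to/odps_config.ini")`.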
- Basic Usage
Under the Hood
Design Architecture
For a mind map of related concepts, please refer to the MindMapDoc.
Type System
All numeric values in R are subject to possible precision loss.
MaxCompute/ODPS | R | Notes |
---|---|---|
BOOLEAN | logical | |
BIGINT | numeric | [-9223372036854774784, 9223372036854774784] * |
INT | numeric | |
TINYINT | numeric | |
SMALLINT | numeric | |
DOUBLE | numeric | |
FLOAT | numeric | |
DATETIME | numeric | POSIXct/POSIXlt, in seconds |
DATE | numeric | POSIXct/POSIXlt, in seconds |
TIMESTAMP | numeric | POSIXct/POSIXlt, in seconds |
INTERVAL_YEAR_MONTH | numeric | in months |
INTERVAL_DATE_TIME | numeric | in seconds |
DECIMAL | numeric | |
STRING | character | |
CHAR | character | |
VARCHAR | character | |
BINARY | character | |
MAP | - | unsupported |
ARRAY | - | unsupported |
STRUCT | - | unsupported |
- BIGINT (64-bit) from MaxCompute is stored and computed as a 64-bit double in RODPS. Precision loss may occur when casting BIGINT to double, which narrows the range of values that can be written back to MaxCompute/ODPS.
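The BIGINT bound in the table and the time units can be checked in base R without an ODPS connection. The sketch below makes two assumptions that the table does not state: that DATETIME values are seconds counted from the Unix epoch, and that they are interpreted in UTC. The sample values are arbitrary.

```r
# A double has a 53-bit significand, so 2^53 + 1 is not representable
# and collapses to 2^53.
collapses <- (2^53 + 1) == 2^53
print(collapses)

# Between 2^62 and 2^63 adjacent doubles are 2^(62 - 52) = 1024 apart,
# so the largest double below 2^63 is 2^63 - 1024.
bigint_bound <- 2^63 - 1024
bound_text <- sprintf("%.0f", bigint_bound)
print(bound_text)

# ASSUMPTION: the numeric DATETIME value is seconds since 1970-01-01 UTC.
seconds <- 1700000000
dt <- as.POSIXct(seconds, origin = "1970-01-01", tz = "UTC")
dt_text <- format(dt, "%Y-%m-%d %H:%M:%S")
print(dt_text)

# INTERVAL_YEAR_MONTH arrives as a number of months; split it for display.
interval_months <- 14
interval_text <- sprintf("%d year(s), %d month(s)",
                         interval_months %/% 12, interval_months %% 12)
print(interval_text)
```

The second result matches the BIGINT limit quoted in the table, which is why values closer to the true 64-bit extremes cannot be written back unchanged.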
Troubleshooting
- For Windows users: do NOT install both 32-bit and 64-bit R on your system. Doing so causes compilation issues when installing rJava.
License
Licensed under the Apache License 2.0