An R package that provides a SQL Server R Database Interface (DBI), based on the cross-platform jTDS JDBC driver.
You can't install the package from CRAN yet. But you install the development version from GitHub:
# install.packages('devtools')
devtools::install_github('imanuelcostigan/RSQLServer')
We recommend that you store server details and credentials in ~/sql.yaml
. This is partly so that you do not need to specify a username and password in calls to dbConnect()
. But it is also because in testing, we've found that the jTDS single sign-on (SSO) library is a bit flaky. The contents of this file should look something like this:
SQL_PROD:
server: 11.1.111.11
type: &type sqlserver
port: &port 1433
domain: &domain companyname
user: &user winusername
password: &pass winpassword
useNTLMv2: &ntlm true
SQL_DEV:
server: 11.1.111.15
type: *type
port: *port
domain: *domain
user: *user
password: *pass
useNTLMv2: *ntlm
Ensure that your ~/sql.yaml
file contains a valid SQL Server entry named TEST
. In the following, the TEST
server, generously provided by Microsoft for the purposes of this package's development, has a database containing the nycflights13
package data sets.
The following illustrates how you can make use of the DBI interface. Note that we do not attach the RSQLServer
package.
library(DBI)
nycflights <- dbConnect(RSQLServer::SQLServer(), server = "TEST", database = 'DBItest')
# RSQLServer only returns tables with type TABLE and VIEW.
dbListTables(nycflights)
#> [1] "airlines" "airports" "cars" "flights" "planes"
#> [6] "test_table" "weather"
dbListFields(nycflights, 'airlines')
#> [1] "carrier" "name"
dbReadTable(nycflights, 'airlines')
#> carrier name
#> 1 9E Endeavor Air Inc.
#> 2 AA American Airlines Inc.
#> 3 AS Alaska Airlines Inc.
#> 4 B6 JetBlue Airways
#> 5 DL Delta Air Lines Inc.
#> 6 EV ExpressJet Airlines Inc.
#> 7 F9 Frontier Airlines Inc.
#> 8 FL AirTran Airways Corporation
#> 9 HA Hawaiian Airlines Inc.
#> 10 MQ Envoy Air
#> 11 OO SkyWest Airlines Inc.
#> 12 UA United Air Lines Inc.
#> 13 US US Airways Inc.
#> 14 VX Virgin America
#> 15 WN Southwest Airlines Co.
#> 16 YV Mesa Airlines Inc.
# Fetch all results
res <- dbSendQuery(nycflights, 'SELECT TOP 10 * FROM airlines')
dbFetch(res)
#> carrier name
#> 1 9E Endeavor Air Inc.
#> 2 AA American Airlines Inc.
#> 3 AS Alaska Airlines Inc.
#> 4 B6 JetBlue Airways
#> 5 DL Delta Air Lines Inc.
#> 6 EV ExpressJet Airlines Inc.
#> 7 F9 Frontier Airlines Inc.
#> 8 FL AirTran Airways Corporation
#> 9 HA Hawaiian Airlines Inc.
#> 10 MQ Envoy Air
dbClearResult(res)
#> [1] TRUE
The following illustrates how you can make use of the dplyr interface. Again, we do not attach the RSQLServer
package.
library(dplyr, warn.conflicts = FALSE)
flights <- tbl(nycflights, "flights")
flights %>%
filter(carrier == "UA") %>%
arrange(-year, -month, -day, -dep_time)
#> Source: lazy query [?? x 19]
#> Database: SQLServerConnection
#> Ordered by: -year, -month, -day, -dep_time
#>
#> year month day dep_time sched_dep_time dep_delay arr_time
#> <int> <int> <int> <int> <int> <dbl> <int>
#> 1 2013 12 31 NA 1000 NA NA
#> 2 2013 12 31 NA 840 NA NA
#> 3 2013 12 31 NA 754 NA NA
#> 4 2013 12 31 NA 2000 NA NA
#> 5 2013 12 31 NA 1500 NA NA
#> 6 2013 12 31 NA 1430 NA NA
#> 7 2013 12 31 NA 855 NA NA
#> 8 2013 12 31 NA 705 NA NA
#> 9 2013 12 31 NA 600 NA NA
#> 10 2013 12 31 NA 830 NA NA
#> # ... with more rows, and 12 more variables: sched_arr_time <int>,
#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> # minute <dbl>, time_hour <dttm>
collect(flights)
#> # A tibble: 336,776 × 19
#> year month day dep_time sched_dep_time dep_delay arr_time
#> * <int> <int> <int> <int> <int> <dbl> <int>
#> 1 2013 1 1 517 515 2 830
#> 2 2013 1 1 533 529 4 850
#> 3 2013 1 1 542 540 2 923
#> 4 2013 1 1 544 545 -1 1004
#> 5 2013 1 1 554 600 -6 812
#> 6 2013 1 1 554 558 -4 740
#> 7 2013 1 1 555 600 -5 913
#> 8 2013 1 1 557 600 -3 709
#> 9 2013 1 1 557 600 -3 838
#> 10 2013 1 1 558 600 -2 753
#> # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> # minute <dbl>, time_hour <dttm>
Then close the connection
dbDisconnect(nycflights)
#> [1] TRUE