Author: Sebastian Kranz, Ulm University
The package `dbmisc` contains some helper functions for working with databases, e.g. the functions `dbGet`, `dbInsert`, `dbUpdate` and `dbDelete` simplify common database operations. One main motivation for `dbmisc` is to facilitate automatic type conversion between R and the database. For this purpose you can specify a schema of your database as a simple YAML file. With the schema file you can also easily create or update database tables. There is further functionality that allows automatic logging of database operations and memoization of fetched values. Basic usage is explained in this README file.
`dbmisc` is hosted on my own drat-powered R archive. To install it, run the following code:

```r
install.packages("dbmisc",
  repos = c("https://skranz-repo.github.io/drat/", getOption("repos")))
```
The example schema `userdb.yaml` specifies a database with just one table `user` (there is also an example with more than one table):
```yaml
user:
  table:
    userid: TEXT
    email: TEXT
    age: INTEGER
    female: BOOLEAN
    created: DATETIME
    descr: TEXT
  unique_index:
    - userid
  index:
    - email
    - [female, age]
    - created
```
Under the field `table`, all columns of the table are specified using the variable types of the database. The field `unique_index` specifies that `userid` is a unique index column, i.e. no two rows with the same `userid` can exist in the database. The field `index` specifies three non-unique indices on the table; the second index, `[female, age]`, is an index on two columns.
The following R code generates a new SQLite database from this schema in your current working directory:

```r
schema.file = system.file("examples/dbschema/userdb.yaml", package="dbmisc")
db.dir = getwd()
dbCreateSQLiteFromSchema(schema.file=schema.file, db.dir=db.dir, db.name="userdb.sqlite")
```
You can take a look at the generated database with a tool like https://sqlitestudio.pl.
If you want to update the schema of an existing database, you can use the argument `update=TRUE`:

```r
dbCreateSQLiteFromSchema(db=db, schema.file=schema.file, update=TRUE)
```

The existing data is then converted to the new schema; newly added columns will be filled with NA values. If `update=FALSE`, all existing data is deleted. Of course, for safety reasons, always make a backup of your database before you modify it in this way.
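A minimal backup sketch (the file name `userdb.sqlite` matches the example above; the backup name is just an illustration):

```r
# Copy the SQLite file before updating the schema; a plain file copy
# suffices, since an SQLite database is a single file.
db.file = "userdb.sqlite"
backup.file = paste0(db.file, ".", format(Sys.time(), "%Y%m%d_%H%M%S"), ".bak")
file.copy(db.file, backup.file)
```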
It could be the case that you already have some data in R, e.g. from a CSV file, for which you want to generate a database table. To avoid typing the whole schema, you can use the little helper function `schema.template`, which generates a skeleton of the YAML code for the schema and copies it to the clipboard. Consider the following code:

```r
df = data.frame(a=1:5, b="hi", c=Sys.Date(), d=Sys.time())
schema.template(df, "mytable")
```

It copies to your clipboard the following YAML output, which you can then manually adapt:
```yaml
mytable:
  table:
    a: INTEGER
    b: TEXT
    c: DATE
    d: DATETIME
  index:
    - a # example index on first column
```
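A possible follow-up, assuming you saved the adapted schema as `mytable.yaml` (the file and database names are hypothetical):

```r
# Create an SQLite database from the adapted schema, connect with the
# schema assigned, and insert the existing data frame.
dbCreateSQLiteFromSchema(schema.file="mytable.yaml", db.dir=getwd(),
  db.name="mydb.sqlite")
db = dbConnectSQLiteWithSchema(file.path(getwd(), "mydb.sqlite"), "mytable.yaml")
dbInsert(db, table="mytable", df)
```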
The functions `dbGet`, `dbInsert`, `dbUpdate` and `dbDelete` perform common database operations and can use a schema file to facilitate type conversion between R and the database. (So far this has only been tested with SQLite.)
The easiest way to use a schema file is to open the connection and assign the schema with a single command:

```r
db = dbConnectSQLiteWithSchema("userdb.sqlite", schema.file)
```
Alternatively, you can assign a schema to an existing database connection with the command `set.db.schema`:

```r
db = set.db.schema(db, schema.file=schema.file)
```
For my shiny apps, I typically write a little function that returns the database connection, like:

```r
get.userdb = function(db.dir=getwd()) {
  db = getOption("userdb.connection")
  if (is.null(db)) {
    schema.file = system.file("schema/userdb.yaml", package="shinyUserDB")
    db.file = file.path(db.dir, "userdb.sqlite")
    db = dbConnectSQLiteWithSchema(db.file, schema.file)
    options(userdb.connection=db)
  }
  db
}
```
The function opens a single database connection that is shared by all app instances and stores it as a global option; if the function is called again while the connection is open, the connection is retrieved from that option. I am not sure whether pooling more than one connection yields any benefit with SQLite.
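A usage sketch (assuming `userdb.sqlite` exists in the working directory):

```r
db1 = get.userdb()
db2 = get.userdb()
# Both calls return the same cached connection object
identical(db1, db2)  # TRUE
```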
The following example inserts an entry into our table `user`:

```r
new_user = list(created=Sys.time(), userid="user1", age=47, female=TRUE,
  email="test@email.com", gender="female")
dbInsert(db, table="user", new_user)
```
Recall that the table `user` has been specified with the following columns:

```yaml
userid: TEXT
email: TEXT
age: INTEGER
female: BOOLEAN
created: DATETIME
descr: TEXT
```
Our R list differs in certain aspects from the table:

i) the order of fields is not the same as in the database table,
ii) we have not specified the column `descr`, and
iii) we have an additional value `gender` that is not part of the database.

Using the schema, the function `dbInsert` conveniently corrects for these differences: it

i) puts the values in the right order,
ii) adds a `descr` field set to NA, and
iii) removes `gender`.
This 'autocorrection' avoids some boilerplate code when performing database operations.
The function `dbInsert` also performs some data type conversions that do not seem to be performed automatically by the functions of the `DBI` interface. E.g. it converts the `POSIXct` variable `created` to a DATETIME format that can be stored in and retrieved from the SQLite database (as SQLite does not really have a DATETIME type, the value is stored as a floating point number).
You can also pass a data frame to `dbInsert` in order to insert multiple rows at once.
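A minimal sketch (the rows are made-up example data):

```r
# Insert two users at once; the schema-based autocorrection applies per row
new_users = data.frame(
  userid = c("user2", "user3"),
  email = c("u2@email.com", "u3@email.com"),
  age = c(30L, 25L),
  female = c(FALSE, TRUE),
  created = Sys.time()
)
dbInsert(db, table="user", new_users)
```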
If you set the argument `run=FALSE`, `dbInsert` performs no database action but just returns the SQL statement that would be run:

```r
dbInsert(db, table="user", new_user, run=FALSE)
## [1] "insert into user values (:userid, :email, :age, :female, :created, :descr)"
```
Note that by default `dbmisc` prepares parameterized queries to prevent SQL injection.
The function `dbGet` retrieves data from the database. E.g. the command

```r
dat = dbGet(db, table="user", list(userid="user1"))
```

returns a data frame with the one row in which `userid` is equal to "user1". Again, types are converted to standard R formats. For example, SQLite stores BOOLEAN values internally as INTEGER, but based on the schema, `dbGet` will correctly convert the variable `female` to a `logical` variable in R. DATETIME variables will likewise be converted to `POSIXct`.
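A quick check of the converted types (a sketch, continuing the example above):

```r
dat = dbGet(db, table="user", list(userid="user1"))
class(dat$female)   # "logical"
class(dat$created)  # "POSIXct" "POSIXt"
```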
If you set the argument `run=FALSE`, you can see the SQL query that `dbGet` would run:

```r
dbGet(db, table="user", list(userid="user1"), run=FALSE)
## [1] "SELECT * FROM user WHERE userid = :userid"
```
The `dbGet` command also allows for more flexible queries. For example, one can select specific fields with the argument `fields`. One can also specify multiple tables joined by the columns given in `joinby` (the tables `course` and `coursestud` refer to the multi-table example schema mentioned above):

```r
dbGet(db, c("course","coursestud"), list(courseid="course1"),
  fields = "*, coursestud.email", joinby=c("courseid"), run=FALSE)
## [1] "SELECT *, coursestud.email FROM course INNER JOIN coursestud USING(courseid) WHERE courseid = :courseid"
```
You can also provide a custom SQL command via the argument `sql`:

```r
sql = "SELECT *, coursestud.email FROM course INNER JOIN coursestud USING(courseid) WHERE courseid = :courseid"
dbGet(db, table=c("course","coursestud"), list(courseid="course1"), sql=sql)
```

Even for a custom SQL command you can provide one or more tables, so that the associated schemas are used for converting data types, and a list of parameters for the parameterized query.
The functions `dbUpdate` and `dbDelete` are similar helper functions to update or delete rows.
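A minimal sketch (the argument names are my assumptions based on the pattern of `dbInsert` and `dbGet`; consult the function documentation for the exact signatures):

```r
# Update the email of user1 (hypothetical example values)
dbUpdate(db, table="user", list(email="new@email.com"),
  where=list(userid="user1"))
# Delete the row of user1
dbDelete(db, table="user", list(userid="user1"))
```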
The functions `dbInsert`, `dbUpdate` and `dbDelete` also have an argument `log.dir`. If it is provided, each call adds an entry to a simple log file, which allows you to check when modifications of the database took place.
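For example (the log directory is just an illustration):

```r
# Each modifying call appends an entry to a log file in log.dir
log.dir = file.path(getwd(), "db_log")
dbInsert(db, table="user", new_user, log.dir=log.dir)
```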
The following functionality was useful for some of my shiny apps. The function `dbGetMemoise` buffers database results in memory: by default, if you call the function again with the same parameters, it will get the results from memory. There is also an argument `refetch.if.changed`, which by default is TRUE if and only if you provide a non-null argument `log.dir`. When set to TRUE, `dbGetMemoise` will load the data again from the database if the log file has changed.
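A usage sketch, continuing the example above (the argument order follows `dbGet`; treat it as an assumption):

```r
# First call fetches from the database and caches the result in memory
dat1 = dbGetMemoise(db, table="user", list(userid="user1"), log.dir=log.dir)
# The second call with the same parameters is served from memory, unless a
# logged modification changed the log file in the meantime
dat2 = dbGetMemoise(db, table="user", list(userid="user1"), log.dir=log.dir)
```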