Tuesday, November 29, 2016

Introduction to using SQLite with R


According to the SQLite website:
People who understand SQL can employ the sqlite3 command-line shell (or various third-party SQLite access programs) to analyze large datasets. Raw data can be imported from CSV files, then that data can be sliced and diced to generate a myriad of summary reports. More complex analysis can be done using simple scripts written in Tcl or Python (both of which come with SQLite built-in) or in R or other languages using readily available adaptors. Possible uses include website log analysis, sports statistics analysis, compilation of programming metrics, and analysis of experimental results. Many bioinformatics researchers use SQLite in this way.
To use SQLite from R, first we will start by loading the package DBI

library(DBI)

The dataset

We will use the the faithful dataset from the datasets package, the dataset has logs of the waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.

head(faithful)
##   eruptions waiting
## 1     3.600      79
## 2     1.800      54
## 3     3.333      74
## 4     2.283      62
## 5     4.533      85
## 6     2.883      55

Creating the database

To create a new SQLite database, just supply the filename to dbConnect():

handle <- dbConnect(RSQLite::SQLite(),"faithful.sqlite")
unlink("faithful.sqlite")
In case we don’t need the database on disk we can use either "" for an on-disk database or ":memory:" or "file::memory:" for a in-memory database. In the later case the database will be deleted after disconnecting.

Loading the data into SQLite

To save the data-frame into the database, use dbWriteTable():

handle <- dbConnect(RSQLite::SQLite(),"")
dbListTables(handle)
## character(0)
dbWriteTable(handle,"faithful",faithful)
## [1] TRUE
dbListTables(handle)
## [1] "faithful"

Query the database

To query your database, use dbGetQuery(), this function is for SELECT queries only. For data manipulation statements use dbExecute() instead.

dbGetQuery(handle,
           "SELECT * 
           FROM faithful WHERE waiting > 80 LIMIT 6")
##   eruptions waiting
## 1     4.533      85
## 2     4.700      88
## 3     3.600      85
## 4     4.350      85
## 5     3.917      84
## 6     4.700      83
dbDisconnect(handle)
## [1] TRUE
That’s it!

References

If you want to know more about SQLite and R or DBI :