According to the SQLite
website:
People who understand SQL can employ the sqlite3 command-line shell (or various third-party SQLite access programs) to analyze large datasets. Raw data can be imported from CSV files, then that data can be sliced and diced to generate a myriad of summary reports. More complex analysis can be done using simple scripts written in Tcl or Python (both of which come with SQLite built-in) or in R or other languages using readily available adaptors. Possible uses include website log analysis, sports statistics analysis, compilation of programming metrics, and analysis of experimental results. Many bioinformatics researchers use SQLite in this way.
To use SQLite from R, first we will start by loading the package DBI
library(DBI)
The dataset
We will use the the
faithful
dataset from the
datasets
package, the dataset has logs of the waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA.
head(faithful)
## eruptions waiting
## 1 3.600 79
## 2 1.800 54
## 3 3.333 74
## 4 2.283 62
## 5 4.533 85
## 6 2.883 55
Creating the database
To create a new SQLite database, just supply the filename to dbConnect():
handle <- dbConnect(RSQLite::SQLite(),"faithful.sqlite")
unlink("faithful.sqlite")
In case we don’t need the database on disk we can use either
""
for an on-disk database or
":memory:"
or
"file::memory:"
for a in-memory database. In the later case the database will be deleted after disconnecting.
Loading the data into SQLite
To save the data-frame into the database, use
dbWriteTable()
:
handle <- dbConnect(RSQLite::SQLite(),"")
dbListTables(handle)
## character(0)
dbWriteTable(handle,"faithful",faithful)
## [1] TRUE
dbListTables(handle)
## [1] "faithful"
Query the database
To query your database, use
dbGetQuery()
, this function is for SELECT queries only. For data manipulation statements use
dbExecute()
instead.
dbGetQuery(handle,
"SELECT *
FROM faithful WHERE waiting > 80 LIMIT 6")
## eruptions waiting
## 1 4.533 85
## 2 4.700 88
## 3 3.600 85
## 4 4.350 85
## 5 3.917 84
## 6 4.700 83
dbDisconnect(handle)
## [1] TRUE
That’s it!
References
If you want to know more about
SQLite and R
or
DBI
: