Thursday, December 22, 2016

Introduction to Using R with MySQL

Background

A database is a set of related information. For example, a dictionary is a database of words in a in alphabetical order and their meaning, often also providing information about pronunciation, origin, and usage.
MySQL is a relational database which means that uses the Relational Model where data is represented as a set of tables and redundant data is used to link records from different tables.

Example of SQL relational model


A difference between MySQL (SQL in general) and R is that MySQL is a non-procedural language in the sense that in R you define your variables and data structures using loops, conditional logic, etc., splitting your code into modules (reusable pieces of code). When your R code is evaluated, it is processed in exactly the way you have written it. With SQL this is not the case, instead you define your inputs and outputs but the way in which your statements are evaluated are left to the optimizer.

Installing MySQL

As stated in the MySQL site, installation of MySQL generally follows the steps:

  • Determine whether MySQL runs and is supported on your platform
  • Choose which distribution to install
  • Download the distribution that you want to install
  • Install the distribution
  • Perform any necessary postinstallation setup
Installing MySQL in Ubuntu with apt-get
Update your package index, install the mysql-server package, and then run the included security and database initialization scripts.

$ sudo apt-get update
$ sudo apt-get install mysql-server
$ sudo mysql_secure_installation
$ sudo mysql_install_db
MySQL should have started running automatically. Check its status by

$ service mysql status
You should see some ouput showing the process PID number. If MySQL has not started, start the sever with the following command:

 $ sudo service mysql start
Installing MySQL on OS X using native packages
This is taken from the MySQL site, please check the site for more detailed information.

  • Download the disk image (.dmg) file (the community version is available here) that contains the MySQL package installer. Double-click the file to mount the disk image and see its contents.
  • Double-click the MySQL installer package. It will be named according to the version of MySQL you have downloaded
  • You will be presented with the opening installer dialog. Click Continue to begin installation
To start MySQL click on System Preferences -> MySQL -> Start MySQL Server, then you might be asked for your account password in order to start the service.

Creating a database

Login into MySQL though the terminal

$ mysql -u user -p
once logged in, create the database by using the following command (change the database name to anything you like)

mysql> create database testdb;
Query OK, 1 row affected (0.01 sec)
Now we will create an user for this database (change password 123 to something more secure for a production evironment!)

mysql> CREATE USER 'ruser'@'localhost' IDENTIFIED BY '1234';
Query OK, 0 rows affected (0.02 sec)
mysql> grant all privileges on testdb.* to 'ruser'@'localhost';
Query OK, 0 rows affected (0.02 sec)
Or alternatively you can do both things in one command:

mysql> grant all privileges on testdb.* to 'ruser'@'localhost' identified by '1234';
To exit MySQL client type quit;

Connecting to MySQL from R

In order to connect to MySQL we will use the packages:
  • DBI as a database interface for communicating between R and MySQL; and
  • RMySQL which is a MySQL driver implementing the R database (DBI) API. This class should always be initialized with the MySQL() function. It returns a singleton that allows you to connect to MySQL.
Both packages are found in CRAN.

library(DBI)
# connect to a database and load some data
con <- dbConnect(RMySQL::MySQL(),dbname = 'testdb', username = 'ruser',
password = '1234', host = 'localhost')

Creaing a table from a data frame

As stated in the gelp page for dbReadTable:

The function dbReadTable copy data frames to and from database tables
Description
dbReadTable: database table -> data frame; dbWriteTable: data frame -> database table
Usage
dbReadTable(conn, name, …)
dbWriteTable(conn, name, value, …)
Check the hep page for more information (?dbWriteTable). For example, let’s write the dataset USArrests from the datasets package into a table:

dbWriteTable(con, "USArrests", datasets::USArrests, overwrite = TRUE)
## [1] TRUE

Sending a query to a table(s)

Sending a quiery to a database table can be done with the dbSendQuery function, which submits and synchronously executes the SQL query to the database engine. It does not extract any records! For that you need to use the function dbFetch. This function is for SELECT queries only, it returns a DBIResult object.
# query
rs <- dbSendQuery(con, "SELECT * FROM USArrests")

Feching records from a (DBIResult) result object

When you send a query using dbSendQuery, the returned object is not a data frame holding the records, instead it returns a DBIResult object that can be used by dbFetch in order to extract these results.

d1 <- dbFetch(rs, n = 10)      # extract data in chunks of 10 rows

# This method returns if the operation has completed. A SELECT query 
# is completed if all rows have been fetched
dbHasCompleted(rs)
## [1] FALSE
head(d1)
##    row_names Murder Assault UrbanPop Rape
## 1    Alabama   13.2     236       58 21.2
## 2     Alaska   10.0     263       48 44.5
## 3    Arizona    8.1     294       80 31.0
## 4   Arkansas    8.8     190       50 19.5
## 5 California    9.0     276       91 40.6
## 6   Colorado    7.9     204       78 38.7
d2 <- dbFetch(rs, n = -1)      # extract all remaining data
dbHasCompleted(rs)
## [1] TRUE
# Frees all resources (local and remote) associated with a result set.
dbClearResult(rs)
## [1] TRUE
# should, where possible, include temporary tables
dbListTables(con)
## [1] "USArrests"

Disconnecting from MySQL

Now to finish we will remove the database used for this example and will disconnect, you should always disconnect once you are done in order to reduce the number of open connections.

# clean up
dbRemoveTable(con, "USArrests") # delete the table of you no longer need it
## [1] TRUE
dbDisconnect(con) # always close the connection once you're finished!
## [1] TRUE

Resources

In order to learn more about MySQL here are some resources: