Before we begin, you will need to install these packages:
install.packages("jsonlite")
install.packages("rvest")
install.packages("DBI")
install.packages("RSQLite")
install.packages("dbplyr")
install.packages("pollstR")
Now we load a few R packages.
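A sketch of the setup chunk; the exact packages loaded here are an assumption based on what the lecture uses later:

library(tidyverse) # attaches dplyr, ggplot2, readr and friends used below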
Today we are going to talk about getting data, examples of common data formats, and useful tools to access data.
First let’s have a bit of a philosophical discussion about data.
As data analysts, this is what we wish data looked like whenever we start a project:
However, the reality is that data is rarely in that form. It comes in all types of “raw” formats that need to be transformed into a “clean” format.
For example, in the field of genomics, raw data looks something like this:
Or if you are interested in analyzing data from Twitter:
Or data from Electronic Healthcare Records (EHRs):
We all have our scary spreadsheet tales. Here is Jenny Bryan from RStudio and UBC actually asking for some of those spreadsheet tales on Twitter.
For example, this is an actual spreadsheet from Enron in 2001:
From https://simplystatistics.org/2016/07/20/relativity-raw-data/ raw data is defined as data…
…if you have done no processing, manipulation, coding, or analysis of the data. In other words, the file you received from the person before you is untouched. But it may not be the rawest version of the data. The person who gave you the raw data may have done some computations. They have a different “raw data set”.
Data lives anywhere and everywhere. Data might be stored simply in a .csv or .txt file. Data might be stored in an Excel or Google Spreadsheet. Data might be stored in large databases that require users to write special functions to interact with them and extract the data they are interested in. For example, you may have heard of the terms MySQL or MongoDB.
From Wikipedia, MySQL is defined as an open-source relational database management system (RDBMS). Its name is a combination of “My”, the name of co-founder Michael Widenius’s daughter, and “SQL”, the abbreviation for Structured Query Language.
From Wikipedia, MongoDB is defined as “a free and open-source cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with schemata.”
So after reading that, we get the sense that there are multiple ways large databases can be structured, and multiple ways data can be formatted and interacted with. In addition, we see that database programs (e.g. MySQL and MongoDB) can also interact with each other. We will learn more about SQL and JSON in a bit.
A great article titled How to share data for collaboration was published in PeerJ, in which the authors describe a set of guidelines for sharing data:
We highlight the need to provide raw data to the statistician, the importance of consistent formatting, and the necessity of including all essential experimental information and pre-processing steps carried out to the statistician. With these guidelines we hope to avoid errors and delays in data analysis.
It’s a great paper that describes the information you should pass to a statistician to facilitate the most efficient and timely analysis. Specifically: the raw data, a tidy dataset, a code book describing each variable and its values in the tidy dataset (e.g. a .md, .txt or Word file), and an explicit recipe you used to go from the raw data to the tidy dataset.

First let’s talk about a few important things before we download any data.
When you are starting a data analysis, you have already learned about the use of .Rproj files. When you open up a .Rproj file, RStudio changes the path (location on your computer) to the .Rproj location.

After opening up a .Rproj file, you can test this by running getwd().

When you open up someone else’s R code or analysis, you might also see the setwd() function being used, which explicitly tells R to change to an absolute path, i.e. the absolute location of the directory to move into.
For example, say I want to clone a GitHub repo from Roger, which has 100 R script files, and at the top of every one of those files is:
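Something like this (an illustrative sketch, using the path described just below):

setwd("C:\\Users\\Roger\\path\\only\\that\\Roger\\has")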
The problem is, if I want to use his code, I will need to go and hand-edit every single one of those paths (C:\Users\Roger\path\only\that\Roger\has) to the path that I want to use on my computer, wherever I saved the folder (e.g. /Users/Stephanie/Documents/path/only/I/have).
So instead of absolute paths:
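An illustrative sketch of an absolute path, specific to one machine:

setwd("/Users/Stephanie/Documents/path/only/I/have")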
A better idea is to use relative paths:
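An illustrative sketch of a path interpreted relative to wherever R is currently running, rather than one person’s machine:

setwd("data")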
Within R, an even better idea is to use the here R package, which will recognize the top-level directory of a Git repo and supports building all paths relative to that. For more on project-oriented workflow suggestions, read this post from Jenny Bryan.

here package

In her post, she writes:
“I suggest organizing each data analysis into a project: a folder on your computer that holds all the files relevant to that particular piece of work.”
Instead of using setwd() at the top of your .R or .Rmd file, she suggests:

- Organize each logical project into a folder on your computer, and make sure the top-level folder advertises itself as such. This can be as simple as having an empty file named .here. Or, if you use RStudio and/or Git, those both leave characteristic files behind that will get the job done.
- Use the here() function from the here package to build the path when you read or write a file. Create paths relative to the top-level directory.
- Whenever you work on this project, launch the R process from the project’s top-level directory. If you launch R from the shell, cd to the correct folder first.

Let’s test this out. We can use getwd() to see our current working directory path and list.files() to see the files available:
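getwd()
list.files()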
## [1] "/Users/shicks/Documents/github/teaching/jhu-advdatasci/2019/lectures"
## [1] "01-introduction-lecture.html"
## [2] "01-introduction-lecture.Rmd"
## [3] "03-elements-prinicples-success.pdf"
## [4] "04-gettingdata-api.html"
## [5] "04-gettingdata-api.Rmd"
## [6] "TypesOfQuestions.pdf"
OK, so our current location is in the lectures sub-folder of the 2019 course repository. Let’s try using the here package.
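The calls below are inferred from the output that follows:

library(here)            # attaching here prints where here() starts
list.files(here())       # files in the top-level directory
list.files(here("data")) # files in the data folder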
## here() starts at /Users/shicks/Documents/github/teaching/jhu-advdatasci/2019
## [1] "_navbar.yml" "_site.yml"
## [3] "2019-advdatasci.Rproj" "additional.css"
## [5] "book" "data"
## [7] "discussion.html" "discussion.Rmd"
## [9] "homework.html" "homework.Rmd"
## [11] "homeworks" "imgs"
## [13] "index.html" "index.Rmd"
## [15] "lectures" "Makefile"
## [17] "projects.html" "projects.Rmd"
## [19] "R_functions.sh" "README.html"
## [21] "README.md" "resources.html"
## [23] "resources.Rmd" "site_libs"
## [25] "slide_functions.R" "styles.css"
## [27] "syllabus.html" "syllabus.Rmd"
## [1] "cameras.csv" "Chinook.sqlite"
Now we see that the here::here() function builds a relative path (relative to the .Rproj file in our 2019 repository). We also see there is a cameras.csv file in the data folder. Let’s read it into R with the readr package.
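A sketch of the call (the object name cameras is our choice):

cameras <- read_csv(here("data", "cameras.csv"))
cameras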
## Parsed with column specification:
## cols(
## address = col_character(),
## direction = col_character(),
## street = col_character(),
## crossStreet = col_character(),
## intersection = col_character(),
## `Location 1` = col_character()
## )
## # A tibble: 80 x 6
## address direction street crossStreet intersection `Location 1`
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 GARRISON BL… E/B "Garris… Wabash Ave "Garrison \n … (39.341209, …
## 2 HILLEN ST &… W/B "Hillen… Forrest St "Hillen \n & … (39.29686, -…
## 3 EDMONDSON A… E/B "Edmons… Woodbridge… "Edmonson\n … (39.293453, …
## 4 YORK RD & G… S/B "York R… Gitting Ave "York Rd \n &… (39.370493, …
## 5 RUSSELL ST … S/B "Russel… Hamburg St "Russell\n &… (39.279819, …
## 6 S MARTIN LU… S/B "MLK Jr… Pratt St "MLK Jr. Blvd… (39.286027, …
## 7 ORLEANS ST … E/B Orleans Linwood Ave Orleans & L… (39.295866, …
## 8 E NORTHERN … W/B "Northe… Springlake… "Northern Pkw… (39.364311, …
## 9 W COLD SPRI… E/B "Cold S… Roland Ave "Cold Spring\… (39.343906, …
## 10 E NORTHERN … W/B "Northe… York Road "Northern Pkw… (39.365146, …
## # … with 70 more rows
We can also ask for the full paths for specific files:
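here("data", "cameras.csv")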
## [1] "/Users/shicks/Documents/github/teaching/jhu-advdatasci/2019/data/cameras.csv"
If you want to download a file, some useful functions are file.exists(), dir.create() and list.files():

- file.exists(here("my", "relative", "path")) = logical test for whether the file exists
- dir.create(here("my", "relative", "path")) = create a folder
- list.files(here("my", "relative", "path")) = list contents of a folder

Let’s say we wanted to find out: where are all the fixed speed cameras in Baltimore?
To do this, we can use the Open Baltimore API which has information on the locations of fixed speed cameras in Baltimore.
In case you aren’t familiar with fixed speed cameras, the website states:
Motorists who drive aggressively and exceed the posted speed limit by at least 12 miles per hour will receive $40 citations in the mail. These citations are not reported to insurance companies and no license points are assigned. Notification signs will be placed at all speed enforcement locations so that motorists will be aware that they are approaching a speed check zone. The goal of the program is to make the streets of Baltimore safer for everyone by changing aggressive driving behavior. In addition to the eight portable speed enforcement units, the city has retrofitted 50 red light camera locations with the automated speed enforcement technology.
When we go to the website, we see that the data can be provided to us as a .csv file. To download this data, we can do the following:
file_url <- paste0("https://data.baltimorecity.gov/api/",
"views/dz54-2aru/rows.csv?accessType=DOWNLOAD")
download.file(file_url,
destfile=here("data", "cameras.csv"))
list.files(here("data"))
Alternatively, if we want to download the file only once each time we knit our reproducible report or homework or project, we can wrap the code above in a check with the !file.exists() condition.
if(!file.exists(here("data", "cameras.csv"))){
file_url <- paste0("https://data.baltimorecity.gov/api/",
"views/dz54-2aru/rows.csv?accessType=DOWNLOAD")
download.file(file_url,
destfile=here("data", "cameras.csv"))
}
list.files(here("data"))
## [1] "cameras.csv" "Chinook.sqlite"
From there, we can read in cameras.csv like we have already learned how to do, using the readr::read_csv() function:
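The same call as before:

cameras <- read_csv(here("data", "cameras.csv"))
cameras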
## Parsed with column specification:
## cols(
## address = col_character(),
## direction = col_character(),
## street = col_character(),
## crossStreet = col_character(),
## intersection = col_character(),
## `Location 1` = col_character()
## )
## # A tibble: 80 x 6
## address direction street crossStreet intersection `Location 1`
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 GARRISON BL… E/B "Garris… Wabash Ave "Garrison \n … (39.341209, …
## 2 HILLEN ST &… W/B "Hillen… Forrest St "Hillen \n & … (39.29686, -…
## 3 EDMONDSON A… E/B "Edmons… Woodbridge… "Edmonson\n … (39.293453, …
## 4 YORK RD & G… S/B "York R… Gitting Ave "York Rd \n &… (39.370493, …
## 5 RUSSELL ST … S/B "Russel… Hamburg St "Russell\n &… (39.279819, …
## 6 S MARTIN LU… S/B "MLK Jr… Pratt St "MLK Jr. Blvd… (39.286027, …
## 7 ORLEANS ST … E/B Orleans Linwood Ave Orleans & L… (39.295866, …
## 8 E NORTHERN … W/B "Northe… Springlake… "Northern Pkw… (39.364311, …
## 9 W COLD SPRI… E/B "Cold S… Roland Ave "Cold Spring\… (39.343906, …
## 10 E NORTHERN … W/B "Northe… York Road "Northern Pkw… (39.365146, …
## # … with 70 more rows
jsonlite
JSON (or JavaScript Object Notation) is a file format that stores information in a human-readable, organized, logical, easy-to-access manner.
For example, here is what a JSON file looks like:
var stephanie = {
"age" : "33",
"hometown" : "Baltimore, MD",
"gender" : "female",
"cars" : {
"car1" : "Hyundai Elantra",
"car2" : "Toyota Rav4",
"car3" : "Honda CR-V"
}
}
Some features of a JSON object: data are stored as name/value pairs (e.g. "age" : "33"), pairs are separated by commas, and objects are wrapped in curly braces {} and can be nested (like cars above).

Let’s say we want to use the GitHub API to find out: how many of my GitHub repositories have open issues?
We will use the jsonlite R package and the fromJSON()
function to convert from a JSON object to a data frame.
We will read in a JSON file located at https://api.github.com/users/stephaniehicks/repos
github_url = "https://api.github.com/users/stephaniehicks/repos"
library(jsonlite)
jsonData <- fromJSON(github_url)
The function fromJSON() has now converted the JSON file into a data frame with the names:
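names(jsonData)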
## [1] "id" "node_id" "name"
## [4] "full_name" "private" "owner"
## [7] "html_url" "description" "fork"
## [10] "url" "forks_url" "keys_url"
## [13] "collaborators_url" "teams_url" "hooks_url"
## [16] "issue_events_url" "events_url" "assignees_url"
## [19] "branches_url" "tags_url" "blobs_url"
## [22] "git_tags_url" "git_refs_url" "trees_url"
## [25] "statuses_url" "languages_url" "stargazers_url"
## [28] "contributors_url" "subscribers_url" "subscription_url"
## [31] "commits_url" "git_commits_url" "comments_url"
## [34] "issue_comment_url" "contents_url" "compare_url"
## [37] "merges_url" "archive_url" "downloads_url"
## [40] "issues_url" "pulls_url" "milestones_url"
## [43] "notifications_url" "labels_url" "releases_url"
## [46] "deployments_url" "created_at" "updated_at"
## [49] "pushed_at" "git_url" "ssh_url"
## [52] "clone_url" "svn_url" "homepage"
## [55] "size" "stargazers_count" "watchers_count"
## [58] "language" "has_issues" "has_projects"
## [61] "has_downloads" "has_wiki" "has_pages"
## [64] "forks_count" "mirror_url" "archived"
## [67] "disabled" "open_issues_count" "license"
## [70] "forks" "open_issues" "watchers"
## [73] "default_branch"
How many are private repos? How many have forks?
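Inferred from the two tables of output below:

table(jsonData$private)
table(jsonData$forks)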
##
## FALSE
## 30
##
## 0 1 2 3 4 5 8
## 20 4 2 1 1 1 1
What’s the most popular language?
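table(jsonData$language)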
##
## HTML JavaScript R Ruby Shell TeX
## 7 1 11 2 1 2
To find out how many repos I have with open issues, we can just create a table:
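Using the open_issues column listed above:

table(jsonData$open_issues)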
##
## 0 1
## 28 2
Whew! Not as many as I thought.
How many do you have?
Finally, I will leave you with a few other examples of using the GitHub API:
rvest
Do we want to purchase a book on Amazon?
Next we are going to learn about what to do if your data is on a website (XML or HTML) formatted to be read by humans instead of R.
We will use the (really powerful) rvest R package to do what is often called “scraping data from the web”.
Before we do that, we need to set up a few things:
We’re going to be scraping this page: it just contains the (first page of) reviews of the ggplot2 book by Hadley Wickham.
We use the rvest package to download this page.
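A sketch of the download step; the exact reviews URL is an assumption (the ASIN 0387981403 appears in the page contents below):

library(rvest)
h <- read_html("https://www.amazon.com/product-reviews/0387981403/")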
Now h is an xml_document that contains the contents of the page:
## {xml_document}
## <html lang="en-us" class="a-no-js" data-19ax5a9jf="dingo">
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset= ...
## [2] <body>\n<span id="cr-state-object" data-state='{"asin":"0387981403", ...
How can you actually pull the interesting information out? That’s where CSS selectors come in.
CSS selectors are a way to specify a subset of nodes (that is, units of content) on a web page (e.g. just getting the titles of reviews). CSS selectors are very powerful and not too challenging to master; here’s a great tutorial. But honestly you can get a lot done even with very little understanding, by using a tool called SelectorGadget.
Install the SelectorGadget on your web browser. (If you use Chrome you can use the Chrome extension, otherwise drag the provided link into your bookmarks bar). Here’s a guide for how to use it with rvest to “point-and-click” your way to a working selector.
For example, if you just wanted the titles, you’ll end up with a selector that looks something like .a-text-bold span. You can pipe your HTML object along with that selector into the html_nodes() function to select just those nodes:
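h %>%
  html_nodes(".a-text-bold span")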
## {xml_nodeset (10)}
## [1] <span class="">Must-have reference for R graphics</span>
## [2] <span class="">Still a great package and highly worth learning - bu ...
## [3] <span class="">Excellent</span>
## [4] <span class="">Nice resource, but already out of date</span>
## [5] <span class="">The best guide to the best graphics (I think) out th ...
## [6] <span class="">Graphing in R</span>
## [7] <span class="">Excellent content, poor adaptation to kindle</span>
## [8] <span class="">Excellent R resource for the Kindle</span>
## [9] <span class="">Great book, outdated</span>
## [10] <span class="">Indispensable resource for ggplot2 users</span>
But you need the text from each of these, not the full tags. Pipe to the html_text() function to pull these out:
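h %>%
  html_nodes(".a-text-bold span") %>%
  html_text()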
## [1] "Must-have reference for R graphics"
## [2] "Still a great package and highly worth learning - but the text is getting quite out of date."
## [3] "Excellent"
## [4] "Nice resource, but already out of date"
## [5] "The best guide to the best graphics (I think) out there."
## [6] "Graphing in R"
## [7] "Excellent content, poor adaptation to kindle"
## [8] "Excellent R resource for the Kindle"
## [9] "Great book, outdated"
## [10] "Indispensable resource for ggplot2 users"
Now we’ve extracted something useful! Similarly, let’s grab the format (hardcover or paperback). Some experimentation with SelectorGadget shows it’s:
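A sketch, using the selector that also appears in the stringr example below:

h %>%
  html_nodes(".a-size-mini.a-color-secondary") %>%
  html_text()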
## [1] "Format: Paperback" "Format: Paperback"
## [3] "Format: Kindle Edition" "Format: Paperback"
## [5] "Format: Paperback" "Format: Kindle Edition"
## [7] "Format: Kindle Edition" "Format: Kindle Edition"
## [9] "Format: Paperback" "Format: Paperback"
Now, we may be annoyed that it always starts with Format:. Let’s introduce the stringr package.
formats <- h %>%
html_nodes(".a-size-mini.a-color-secondary") %>%
html_text() %>%
stringr::str_replace("Format: ", "")
formats
## [1] "Paperback" "Paperback" "Kindle Edition" "Paperback"
## [5] "Paperback" "Kindle Edition" "Kindle Edition" "Kindle Edition"
## [9] "Paperback" "Paperback"
We could do a similar exercise for extracting the number of stars and whether or not someone found a review useful. This would help us decide if we were interested in purchasing the book!
SQLite database

Another important type of data you might interact with are databases (such as SQL or SQLite). There are several ways to query databases in R.

First, we will download a .sqlite database. This is a portable version of a SQL database. For our purposes, we will use the chinook sqlite database here. The database represents a “digital media store, including tables for artists, albums, media tracks, invoices and customers”.
From the Readme.md file:
Sample Data
Media related data was created using real data from an iTunes Library. It is possible for you to use your own iTunes Library to generate the SQL scripts, see instructions below. Customer and employee information was manually created using fictitious names, addresses that can be located on Google maps, and other well formatted data (phone, fax, email, etc.). Sales information is auto generated using random data for a four year period.
if(!file.exists(here("data", "Chinook.sqlite"))){
file_url <- paste0("https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite")
download.file(file_url,
destfile=here("data", "Chinook.sqlite"))
}
list.files(here("data"))
## [1] "cameras.csv" "Chinook.sqlite"
The main workhorse packages that we will use are the DBI and dplyr packages. Let’s look at the DBI::dbConnect() help file. So we need a driver, and one example is RSQLite::SQLite(). Let’s look at that help file too.

OK, so with RSQLite::SQLite() and DBI::dbConnect() we can connect to a SQLite database. Let’s try that with the Chinook.sqlite file that we downloaded:
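A sketch of the connection step (the object name conn matches its use later in the lecture):

library(DBI)
conn <- dbConnect(RSQLite::SQLite(), here("data", "Chinook.sqlite"))
conn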
## <SQLiteConnection>
## Path: /Users/shicks/Documents/github/teaching/jhu-advdatasci/2019/data/Chinook.sqlite
## Extensions: TRUE
So we have opened up a connection with the SQLite database. Next, we can see what tables are available in the database using the dbListTables() function:
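dbListTables(conn)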
## [1] "Album" "Artist" "Customer" "Employee"
## [5] "Genre" "Invoice" "InvoiceLine" "MediaType"
## [9] "Playlist" "PlaylistTrack" "Track"
From RStudio’s website, there are several ways to interact with SQL databases. One of the simplest ways, which we will use here, is to leverage the dplyr framework.
"The
dplyr
package now has a generalized SQL backend for talking to databases, and the newdbplyr
package translates R code into database-specific variants. As of this writing, SQL variants are supported for the following databases: Oracle, Microsoft SQL Server, PostgreSQL, Amazon Redshift, Apache Hive, and Apache Impala. More will follow over time.
So if we want to query a SQL database with dplyr, the benefit of using dbplyr is:
"You can write your code in
dplyr
syntax, anddplyr
will translate your code into SQL. There are several benefits to writing queries indplyr
syntax: you can keep the same consistent language both for R objects and database tables, no knowledge of SQL or the specific SQL variant is required, and you can take advantage of the fact thatdplyr
uses lazy evaluation.
Let’s take a closer look at the conn database that we just connected to:
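A sketch, inferred from the dbplyr output below:

library(dbplyr)
src_dbi(conn)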
##
## Attaching package: 'dbplyr'
## The following objects are masked from 'package:dplyr':
##
## ident, sql
## src: sqlite 3.29.0 [/Users/shicks/Documents/github/teaching/jhu-advdatasci/2019/data/Chinook.sqlite]
## tbls: Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine,
## MediaType, Playlist, PlaylistTrack, Track
You can think of the multiple tables as similar to having multiple worksheets in a spreadsheet.
Let’s try interacting with one.
dplyr syntax

First, let’s look at the first ten rows in the Album table.
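A sketch of the dplyr code (its SQL translation is shown with show_query() below):

tbl(conn, "Album") %>%
  head(n = 10)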
## # Source: lazy query [?? x 3]
## # Database: sqlite 3.29.0
## # [/Users/shicks/Documents/github/teaching/jhu-advdatasci/2019/data/Chinook.sqlite]
## AlbumId Title ArtistId
## <int> <chr> <int>
## 1 1 For Those About To Rock We Salute You 1
## 2 2 Balls to the Wall 2
## 3 3 Restless and Wild 2
## 4 4 Let There Be Rock 1
## 5 5 Big Ones 3
## 6 6 Jagged Little Pill 4
## 7 7 Facelift 5
## 8 8 Warner 25 Anos 6
## 9 9 Plays Metallica By Four Cellos 7
## 10 10 Audioslave 8
The output looks just like a data.frame that we are familiar with. But it’s important to know that it’s not really a data frame. For example, what happens if we use the dim() function?
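tbl(conn, "Album") %>%
  head(n = 10) %>%
  dim()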
## [1] NA 3
Interesting! We see that the number of rows returned is NA. This is because these functions work differently than operations on datasets in memory (e.g. data loaded with read_csv()). Instead, dplyr communicates differently with a SQLite database.
Let’s consider our example. If we were to use straight SQL, the following SQL query returns the first 10 rows from the Album table:
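A sketch of that query, sent directly with DBI::dbGetQuery():

dbGetQuery(conn, "SELECT * FROM Album LIMIT 10")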
In the background, dplyr does the following: it translates our R code into a SQL query, submits the query to the database, and retrieves the response.
To better understand the dplyr code, we can use the show_query() function:
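tbl(conn, "Album") %>%
  head(n = 10) %>%
  show_query()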
## <SQL>
## SELECT *
## FROM `Album`
## LIMIT 10
This is nice because instead of having to write the SQL query ourselves, we can just use the dplyr and R syntax that we are used to.

However, the downside is that dplyr never gets to see the full Album table. It only sends our query to the database, waits for a response, and returns the result. In this way, though, we can interact with large datasets!
Many of the usual dplyr functions are available too: select(), filter(), summarize(), and many join functions.
OK, let’s try some of the functions out. First, let’s count how many albums each artist has made.
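A sketch, using the same idiom as the plotting code further below:

tbl(conn, "Album") %>%
  group_by(ArtistId) %>%
  summarize(n = count(ArtistId))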
## # Source: lazy query [?? x 2]
## # Database: sqlite 3.29.0
## # [/Users/shicks/Documents/github/teaching/jhu-advdatasci/2019/data/Chinook.sqlite]
## ArtistId n
## <int> <int>
## 1 1 2
## 2 2 2
## 3 3 1
## 4 4 1
## 5 5 1
## 6 6 2
## 7 7 1
## 8 8 3
## 9 9 1
## 10 10 1
Next, let’s plot it.
tbl(conn, "Album") %>%
group_by(ArtistId) %>%
summarize(n = count(ArtistId)) %>%
arrange(desc(n)) %>%
ggplot(aes(x = ArtistId, y = n)) +
geom_bar(stat = "identity")
Let’s also extract the first letter from each album and plot the frequency of each letter.
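A minimal sketch, assuming dbplyr can translate substr() to SQL and that we collect() the counts before plotting:

tbl(conn, "Album") %>%
  mutate(first_letter = substr(Title, 1, 1)) %>%
  count(first_letter) %>%
  collect() %>%
  ggplot(aes(x = first_letter, y = n)) +
  geom_bar(stat = "identity")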
The Huffington Post has an API which provides US opinion poll data on various political races and other non-political opinion polls. There is an R package called pollstR which provides an easy user interface. For example, the API has data on Trump Job Approval. Here we use the pollster_charts_polls() function:
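A sketch of the call; the chart slug is taken from the URL shown further below, and the object name trump_approval is our choice:

library(pollstR)
trump_approval <- pollster_charts_polls("trump-job-approval")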
## Parsed with column specification:
## cols(
## Approve = col_double(),
## Disapprove = col_double(),
## Undecided = col_double(),
## poll_slug = col_character(),
## survey_house = col_character(),
## start_date = col_date(format = ""),
## end_date = col_date(format = ""),
## question_text = col_character(),
## sample_subpopulation = col_character(),
## observations = col_double(),
## margin_of_error = col_double(),
## mode = col_character(),
## partisanship = col_character(),
## partisan_affiliation = col_character()
## )
We can see what’s in the object:
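Using the object from the sketch above:

names(trump_approval)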
## [1] "content" "url" "response"
The url element links to the data itself:
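trump_approval$url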
## [1] "https://elections.huffingtonpost.com/pollster/api/v2/charts/trump-job-approval/pollster-chart-poll-questions.tsv"
The content element contains the polling data:
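trump_approval$content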
## # A tibble: 970 x 14
## Approve Disapprove Undecided poll_slug survey_house start_date
## <dbl> <dbl> <dbl> <chr> <chr> <date>
## 1 49 50 NA rasmusse… Rasmussen 2018-12-03
## 2 43 48 8 yougov-e… YouGov/Econ… 2018-12-02
## 3 41 52 6 ipsos-re… Ipsos/Reute… 2018-11-28
## 4 48 50 NA rasmusse… Rasmussen 2018-11-28
## 5 40 56 NA gallup-2… Gallup 2018-11-26
## 6 40 52 9 yougov-e… YouGov/Econ… 2018-11-25
## 7 48 50 NA rasmusse… Rasmussen 2018-11-25
## 8 43 45 12 grinnell… Grinnell/Se… 2018-11-24
## 9 44 51 6 ipsos-re… Ipsos/Reute… 2018-11-21
## 10 38 60 NA gallup-2… Gallup 2018-11-18
## # … with 960 more rows, and 8 more variables: end_date <date>,
## # question_text <chr>, sample_subpopulation <chr>, observations <dbl>,
## # margin_of_error <dbl>, mode <chr>, partisanship <chr>,
## # partisan_affiliation <chr>
This might be useful if you were ever interested in using polling data.
To summarize, useful tools for getting data into R include:

- jsonlite for JSON (e.g. the GitHub API)
- rvest to grab all the exact elements you want (e.g. book reviews)
- DBI, RSQLite and dbplyr for interacting with SQLite databases
- httr for tools to work with URLs and HTTP
- googlesheets to interact with Google Sheets in R
- googledrive (https://googledrive.tidyverse.org/) to interact with your Google Drive