R has a special representation of dates and times

  • Dates are represented by the Date class

  • Times are represented by the POSIXct or the POSIXlt class

  • Dates are stored internally as the number of days since 1970-01-01

  • Times are stored internally as the number of seconds since 1970-01-01
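A minimal sketch of these internal representations, using only base R (as.Date() and as.POSIXct() rather than lubridate). The particular dates are chosen purely for illustration.

```r
# A Date is stored as a count of days since 1970-01-01
d <- as.Date("1970-01-10")
days_since_epoch <- as.numeric(d)                   # 9

# A POSIXct time is stored as a count of seconds since 1970-01-01 00:00:00 UTC
tm <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
secs_since_epoch <- as.numeric(tm)                  # 60

# Dates before the epoch are simply negative
before_epoch <- as.numeric(as.Date("1969-12-31"))   # -1
```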

The lubridate package

  • The lubridate package helps with many of the annoying little details of working with dates/times

  • Largely replaces the default date/time functions in base R

  • Methods for date/time arithmetic

  • Handles time zones, leap years, leap seconds, etc.

install.packages("lubridate")
## Installed with `tidyverse`, but attached by library(tidyverse) only since tidyverse 2.0.0

Dates in R

Dates are represented by the Date class and can be coerced from a character string using the ymd() function.

library(lubridate)
x <- ymd("1970-01-01")
x
[1] "1970-01-01"
class(x)
[1] "Date"
unclass(x)
[1] 0
x <- ymd("2019-10-03")
unclass(x)
[1] 18172

Date objects have their own print method, which always displays them as “YYYY-MM-DD”.
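Printing always uses the same layout, but a Date can still be turned into a differently formatted character string with base R's format(). A small sketch, with format strings chosen for illustration:

```r
x <- as.Date("2019-10-03")

# format() returns a character string, not a Date
us_style <- format(x, "%m/%d/%Y")   # "10/03/2019"
eu_style <- format(x, "%d.%m.%Y")   # "03.10.2019"

class(us_style)                     # "character"
```

Because the result is plain text, it is best used only for display; keep the Date object itself for any arithmetic.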

Alternate Formulations

Different locales have different ways of formatting dates

ymd("2016-09-13")  ## International standard
[1] "2016-09-13"
ymd("2016/09/13")  ## Just figure it out
[1] "2016-09-13"
mdy("09-13-2016")  ## Mostly U.S.
[1] "2016-09-13"
dmy("13-09-2016")  ## Europe
[1] "2016-09-13"

All of the above are valid and lead to the exact same object.

Even if the individual dates are formatted differently, ymd() can usually figure it out.

x <- c("2016-04-05", 
       "2016/05/06",
       "2016,10,4")
ymd(x)
[1] "2016-04-05" "2016-05-06" "2016-10-04"
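"Usually" is not "always": when a string cannot be read as year-month-day, ymd() returns NA (with a warning unless quiet = TRUE). A sketch with made-up input strings, showing one way to check for failed parses:

```r
library(lubridate)

# The second and third strings are deliberately invalid (illustrative input)
x <- c("2016-04-05", "not a date", "2016-02-30")

# quiet = TRUE suppresses the "failed to parse" warning
parsed <- ymd(x, quiet = TRUE)

# Count and locate the values that could not be parsed
n_failed <- sum(is.na(parsed))
x[is.na(parsed)]
```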

Times in R

Times are represented using the POSIXct or the POSIXlt class

  • POSIXct is just a single (possibly very large) number under the hood; it is a useful class when you want to store times in something like a data frame

  • POSIXlt is a list underneath and it stores a bunch of other useful information like the day of the week, day of the year, month, day of the month

Times are represented as the number of seconds since 1970-01-01 00:00:00.

x <- ymd_hms("2019-10-03 13:30:00")
class(x)
[1] "POSIXct" "POSIXt" 
unclass(x)
[1] 1570109400
attr(,"tzone")
[1] "UTC"
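To see the difference between the two classes, the same time can be converted to POSIXlt and its components inspected. This sketch uses base R only; note that some POSIXlt fields are zero-based or offset, as the comments point out.

```r
ct <- as.POSIXct("2019-10-03 13:30:00", tz = "UTC")
lt <- as.POSIXlt(ct)

typeof(ct)   # "double": a single number of seconds
typeof(lt)   # "list": one component per field

lt$hour      # 13
lt$min       # 30
lt$mon       # 9   (months are counted from 0, so 9 means October)
lt$year      # 119 (years since 1900)
lt$wday      # 4   (days since Sunday, so 4 means Thursday)
lt$yday      # 275 (days since 1 January, counted from 0)
```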

If you want to know more about the international date/time standard, you can read about ISO 8601.

Inputting Time Data

Times can be coerced from a character string with ymd_hms()

ymd_hms("2016-09-13 14:00:00")
[1] "2016-09-13 14:00:00 UTC"
ymd_hms("2016-09-13 14:00:00", tz = "America/New_York")
[1] "2016-09-13 14:00:00 EDT"
ymd_hms("2016-09-13 14:00:00", tz = "")
[1] "2016-09-13 14:00:00 EDT"

Time Zones!

Time zones were created to make your data analyses more difficult.

  • The ymd_hms() function uses UTC as the time zone by default

  • Specifying tz = "" will use the local time zone

  • Better to specify time zone when possible to avoid ambiguity
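Once a time has a zone attached, there are two different things "changing the time zone" can mean. A sketch using lubridate's with_tz() and force_tz(); the example time is arbitrary:

```r
library(lubridate)

x <- ymd_hms("2016-09-13 14:00:00", tz = "America/New_York")

# with_tz(): same instant, shown on a different clock (14:00 EDT is 18:00 UTC)
same_instant <- with_tz(x, tzone = "UTC")

# force_tz(): same clock reading, relabelled, so it is a different instant
new_instant <- force_tz(x, tzone = "UTC")

same_instant == x                 # TRUE
difftime(x, new_instant)          # 4 hours apart
```

with_tz() is usually what you want for display; force_tz() is for repairing data that was parsed with the wrong zone.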

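You can go to Wikipedia to find the list of time zones; OlsonNames() lists the names your R installation recognizes. Things to watch out for:

  • Daylight saving time

  • Some states are in two time zones

  • Daylight saving periods are reversed in the southern hemisphere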
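Daylight saving time is the complication most likely to affect arithmetic. As a sketch, lubridate distinguishes a period of one calendar day, days(1), from a duration of exactly 86,400 seconds, ddays(1). The example date is chosen because US clocks went back on 4 November 2012.

```r
library(lubridate)

# Noon on the day before US daylight saving time ended in 2012
x <- ymd_hms("2012-11-03 12:00:00", tz = "America/New_York")

# Period: same clock time on the next calendar day (25 hours later)
next_day_clock <- x + days(1)

# Duration: exactly 24 hours later, which reads 11:00 on the clock
next_day_exact <- x + ddays(1)

difftime(next_day_clock, x, units = "hours")   # 25 hours
difftime(next_day_exact, x, units = "hours")   # 24 hours
```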

Specifying Times in R

Finally, there is the strptime() function in case your dates are written in a different format

datestring <- c("January 10, 2012 10:40", 
                "December 9, 2011 9:10")
x <- strptime(datestring, "%B %d, %Y %H:%M", 
              tz = "America/Los_Angeles")
x
[1] "2012-01-10 10:40:00 PST" "2011-12-09 09:10:00 PST"

  • Check ?strptime for details of formatting strings

  • When reading in data with read_csv(), you may need to read in as character first and then convert to date/time
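Two details of strptime() are worth knowing: it returns a POSIXlt object, and it returns NA when a string does not match the format. The sketch below uses base R only, with made-up day.month.year strings (numeric months avoid any dependence on the locale's month names).

```r
# Hypothetical input in day.month.year order
datestring <- c("10.01.2012 10:40", "09.12.2011 09:10")

x <- strptime(datestring, format = "%d.%m.%Y %H:%M", tz = "UTC")
class(x)                      # "POSIXlt" "POSIXt"

# Convert to POSIXct before storing in a data frame column
x_ct <- as.POSIXct(x)
df <- data.frame(when = x_ct)

# A string that does not match the format gives NA rather than an error
bad <- strptime("2012-01-10", format = "%d.%m.%Y %H:%M", tz = "UTC")
is.na(bad)                    # TRUE
```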

Operations on Dates and Times

Arithmetic

You can add and subtract dates and times. You can do comparisons too (e.g. ==, <=)

x <- ymd("2012-01-01", tz = "")  ## Midnight
y <- dmy_hms("9 Jan 2011 11:34:21", tz = "")
x - y
Time difference of 356.5178 days
x + y  ## Nope!
Error in `+.POSIXt`(x, y): binary '+' is not defined for "POSIXt" objects
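Comparisons and explicit units for differences can be sketched as follows. This version uses base R with both times in UTC, which is an assumption made here so the result does not depend on the local time zone.

```r
x <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC")
y <- as.POSIXct("2011-01-09 11:34:21", tz = "UTC")

# Comparisons work on the underlying instants
x > y     # TRUE
x == y    # FALSE

# difftime() lets you choose the units instead of letting R pick them
gap_hours <- difftime(x, y, units = "hours")
gap_days  <- as.numeric(difftime(x, y, units = "days"))
round(gap_days, 4)   # 356.5178
```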

Add a second to a time

y + 1
[1] "2011-01-09 11:34:22 EST"

Just keep the date portion

y <- date(y)
y
[1] "2011-01-09"

Add a number to the date (in this case 1 day)

y + 1  
[1] "2011-01-10"
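Adding whole months is less straightforward than adding days, because months differ in length. A sketch of how lubridate handles this, using its months() period and the %m+% operator:

```r
library(lubridate)

d <- ymd("2011-01-31")

next_day <- d + days(1)             # "2011-02-01"

# 31 February does not exist, so plain addition gives NA
naive_month <- d + months(1)        # NA

# %m+% rolls back to the last valid day of the target month
safe_month <- d %m+% months(1)      # "2011-02-28"
```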

Leaps and Bounds

R even keeps track of leap years, leap seconds, daylight saving time, and time zones.

Leap years

x <- ymd("2012-03-01")
y <- ymd("2012-02-28")
x - y
Time difference of 2 days
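For comparison, the same subtraction in a non-leap year gives one day, and lubridate's leap_year() reports which years are leap years. A short sketch:

```r
library(lubridate)

# Century years are leap years only when divisible by 400
leap_year(c(1900, 2000, 2012, 2013))   # FALSE TRUE TRUE FALSE

gap_2012 <- as.numeric(ymd("2012-03-01") - ymd("2012-02-28"))   # 2
gap_2013 <- as.numeric(ymd("2013-03-01") - ymd("2013-02-28"))   # 1
```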

Beware of time zones!

x <- ymd_hms("2012-10-25 01:00:00", tz = "")  ## Local zone (US Eastern here)
y <- ymd_hms("2012-10-25 06:00:00", tz = "GMT")
y - x
Time difference of 1 hours

There are also leap seconds.

.leap.seconds
 [1] "1972-06-30 20:00:00 EDT" "1972-12-31 19:00:00 EST"
 [3] "1973-12-31 19:00:00 EST" "1974-12-31 19:00:00 EST"
 [5] "1975-12-31 19:00:00 EST" "1976-12-31 19:00:00 EST"
 [7] "1977-12-31 19:00:00 EST" "1978-12-31 19:00:00 EST"
 [9] "1979-12-31 19:00:00 EST" "1981-06-30 20:00:00 EDT"
[11] "1982-06-30 20:00:00 EDT" "1983-06-30 20:00:00 EDT"
[13] "1985-06-30 20:00:00 EDT" "1987-12-31 19:00:00 EST"
[15] "1989-12-31 19:00:00 EST" "1990-12-31 19:00:00 EST"
[17] "1992-06-30 20:00:00 EDT" "1993-06-30 20:00:00 EDT"
[19] "1994-06-30 20:00:00 EDT" "1995-12-31 19:00:00 EST"
[21] "1997-06-30 20:00:00 EDT" "1998-12-31 19:00:00 EST"
[23] "2005-12-31 19:00:00 EST" "2008-12-31 19:00:00 EST"
[25] "2012-06-30 20:00:00 EDT" "2015-06-30 20:00:00 EDT"
[27] "2016-12-31 19:00:00 EST"
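Note that although R records when leap seconds occurred, POSIXct arithmetic on typical installations follows POSIX time and does not count them. The sketch below, in base R, assumes such a typical setup: the interval spanning the leap second at the end of 2016 comes out as one second, not two.

```r
# One second before and exactly at the start of 2017 (UTC)
a <- as.POSIXct("2016-12-31 23:59:59", tz = "UTC")
b <- as.POSIXct("2017-01-01 00:00:00", tz = "UTC")

# A leap second was inserted between these two instants,
# but the elapsed time is still reported as 1 second
elapsed <- as.numeric(difftime(b, a, units = "secs"))

# The same table as above, viewed as UTC dates
leap_dates <- as.Date(.leap.seconds, tz = "UTC")
tail(leap_dates, 3)
```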

Extracting Elements of Dates/Times

lubridate provides a set of helper functions that extract sub-elements of dates/times

Date Elements

x <- ymd_hms(c("2012-10-25 01:13:46",
               "2015-04-23 15:11:23"), tz = "")
year(x)
[1] 2012 2015
month(x)
[1] 10  4
day(x)
[1] 25 23
weekdays(x)
[1] "Thursday" "Thursday"

Time Elements

x <- ymd_hms(c("2012-10-25 01:13:46",
               "2015-04-23 15:11:23"), tz = "")
minute(x)
[1] 13 11
second(x)
[1] 46 23
hour(x)
[1]  1 15
week(x)
[1] 43 17
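A few more lubridate helpers in the same family, sketched on the same two times (parsed as UTC here so the results do not depend on the local zone). The labelled versions return ordered factors whose text depends on your locale.

```r
library(lubridate)

x <- ymd_hms(c("2012-10-25 01:13:46",
               "2015-04-23 15:11:23"), tz = "UTC")

yday(x)       # 299 113: day of the year
quarter(x)    # 4 2
wday(x)       # 5 5: day of the week, with 1 = Sunday by default

# Labelled versions return ordered factors instead of numbers
wday(x, label = TRUE, abbr = FALSE)
month(x, label = TRUE)
```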

Visualizing Dates

Reading in the Data

library(readr)
storm <- read_csv("../data/storm_events_2002.csv.gz", progress = FALSE)
names(storm)
 [1] "BEGIN_YEARMONTH"    "BEGIN_DAY"          "BEGIN_TIME"        
 [4] "END_YEARMONTH"      "END_DAY"            "END_TIME"          
 [7] "EPISODE_ID"         "EVENT_ID"           "STATE"             
[10] "STATE_FIPS"         "YEAR"               "MONTH_NAME"        
[13] "EVENT_TYPE"         "CZ_TYPE"            "CZ_FIPS"           
[16] "CZ_NAME"            "WFO"                "BEGIN_DATE_TIME"   
[19] "CZ_TIMEZONE"        "END_DATE_TIME"      "INJURIES_DIRECT"   
[22] "INJURIES_INDIRECT"  "DEATHS_DIRECT"      "DEATHS_INDIRECT"   
[25] "DAMAGE_PROPERTY"    "DAMAGE_CROPS"       "SOURCE"            
[28] "MAGNITUDE"          "MAGNITUDE_TYPE"     "FLOOD_CAUSE"       
[31] "CATEGORY"           "TOR_F_SCALE"        "TOR_LENGTH"        
[34] "TOR_WIDTH"          "TOR_OTHER_WFO"      "TOR_OTHER_CZ_STATE"
[37] "TOR_OTHER_CZ_FIPS"  "TOR_OTHER_CZ_NAME"  "BEGIN_RANGE"       
[40] "BEGIN_AZIMUTH"      "BEGIN_LOCATION"     "END_RANGE"         
[43] "END_AZIMUTH"        "END_LOCATION"       "BEGIN_LAT"         
[46] "BEGIN_LON"          "END_LAT"            "END_LON"           
[49] "EPISODE_NARRATIVE"  "EVENT_NARRATIVE"    "DATA_SOURCE"       

Let’s take a look at the BEGIN_DATE_TIME, EVENT_TYPE, and DEATHS_DIRECT variables

library(dplyr)
select(storm, BEGIN_DATE_TIME, EVENT_TYPE, DEATHS_DIRECT)
# A tibble: 52,956 x 3
  BEGIN_DATE_TIME    EVENT_TYPE               DEATHS_DIRECT
  <chr>              <chr>                            <int>
1 03-JUL-03 21:30:00 Thunderstorm Wind                    0
2 04-JUL-03 08:35:00 Marine Thunderstorm Wind             0
3 04-JUL-03 08:35:00 Marine Thunderstorm Wind             0
4 11-AUG-03 16:33:00 Thunderstorm Wind                    0
5 11-AUG-03 18:00:00 Hail                                 0
# ... with 5.295e+04 more rows

First, we convert the BEGIN_DATE_TIME strings to a date/time R object.

storm_sub <- select(storm, BEGIN_DATE_TIME, EVENT_TYPE, DEATHS_DIRECT) %>%
  mutate(begin = dmy_hms(BEGIN_DATE_TIME)) %>%
  rename(type = EVENT_TYPE,
         deaths = DEATHS_DIRECT) %>%
  select(begin, type, deaths)
storm_sub
# A tibble: 52,956 x 3
  begin               type                     deaths
  <dttm>              <chr>                     <int>
1 2003-07-03 21:30:00 Thunderstorm Wind             0
2 2003-07-04 08:35:00 Marine Thunderstorm Wind      0
3 2003-07-04 08:35:00 Marine Thunderstorm Wind      0
4 2003-08-11 16:33:00 Thunderstorm Wind             0
5 2003-08-11 18:00:00 Hail                          0
# ... with 5.295e+04 more rows

Histograms of Dates/Times

We can make a histogram of the dates/times to get a sense of when storm events occur.

library(ggplot2)
storm_sub %>%
  ggplot(aes(x = begin)) + 
  geom_histogram(bins = 20) + 
  theme_bw()
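The counts behind a histogram like this can also be tabulated directly. A sketch using lubridate's floor_date() to round each time down to the start of its month. The small events table is a hypothetical stand-in for storm_sub: the first four times come from the rows printed above and the fifth is invented.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for storm_sub
events <- tibble(
  begin = ymd_hms(c("2003-07-03 21:30:00", "2003-07-04 08:35:00",
                    "2003-08-11 16:33:00", "2003-08-11 18:00:00",
                    "2003-08-30 09:00:00"))
)

# Round each time down to the first instant of its month, then count
monthly <- events %>%
  mutate(month_start = floor_date(begin, unit = "month")) %>%
  count(month_start)

monthly   # 2 events in July 2003, 3 in August 2003
```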

We can group by event type too.

library(ggplot2)
storm_sub %>%
  ggplot(aes(x = begin)) + 
  facet_wrap(~ type) + 
  geom_histogram(bins = 20) + 
  theme_bw() + 
  theme(axis.text.x.bottom = element_text(angle = 90))
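When the default axis labels are not what you want, ggplot2's scale_x_datetime() lets you set the spacing and format of the date breaks. A sketch on a tiny hypothetical data frame (not the storm data); the break spacing and label format are just examples.

```r
library(ggplot2)

# Hypothetical event times, for illustration only
events <- data.frame(
  begin = as.POSIXct(c("2003-07-03 21:30:00", "2003-08-11 16:33:00",
                       "2003-09-20 10:00:00", "2003-10-05 07:15:00"),
                     tz = "UTC")
)

# One labelled break per month, formatted like "Jul 2003"
p <- ggplot(events, aes(x = begin)) +
  geom_histogram(bins = 20) +
  scale_x_datetime(date_breaks = "1 month", date_labels = "%b %Y") +
  theme_bw()
```

The date_labels string uses the same codes as strptime(), so ?strptime documents the options. Printing p draws the plot.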

Scatterplots of Dates/Times

storm_sub %>%
  ggplot(aes(begin, deaths)) + 
  geom_point()

If we focus on a single month, the x-axis adapts.

storm_sub %>%
  filter(month(begin) == 6) %>%
  ggplot(aes(begin, deaths)) + 
  geom_point()

Similarly, we can focus on a single day.

storm_sub %>%
  filter(month(begin) == 6, day(begin) == 16) %>%
  ggplot(aes(begin, deaths)) + 
  geom_point()
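Filtering on month() and day() matches that calendar day in every year present in the data. If you want one specific day, comparing against explicit start and end times is an alternative. A sketch on a small hypothetical table (the times and death counts are invented):

```r
library(dplyr)
library(lubridate)

# Hypothetical events; note the last row is 16 June of a different year
events <- tibble(
  begin  = ymd_hms(c("2003-06-15 23:10:00", "2003-06-16 04:00:00",
                     "2003-06-16 18:45:00", "2004-06-16 12:00:00")),
  deaths = c(0L, 1L, 0L, 2L)
)

# Matches 16 June in any year: 3 rows
by_parts <- events %>%
  filter(month(begin) == 6, day(begin) == 16)

# Matches only 16 June 2003: 2 rows
start <- ymd_hms("2003-06-16 00:00:00")
one_day <- events %>%
  filter(begin >= start, begin < start + days(1))
```

The bounds are built with ymd_hms() so that both sides of each comparison are POSIXct values in the same time zone.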

Summary

  • Dates and times have special classes in R that allow for numerical and statistical calculations

  • Dates use the Date class

  • Times use the POSIXct and POSIXlt classes

  • Character strings can be coerced to Date/Time classes using the ymd() and ymd_hms() functions. For unusual formats, you can use the strptime() or the as.Date() functions.

  • The lubridate package is essential for manipulating date/time data

  • Both plot() and ggplot() “know” about dates and times and will handle axis labels appropriately