Data transformations

Lecture 11

2025-04-10

Introduction to dates and times

Prerequisites

  • This chapter will focus on the lubridate package, which makes it easier to work with dates and times in R.
  • As of tidyverse 2.0.0, lubridate is part of the core tidyverse, so library(tidyverse) loads it.
library(tidyverse)

Date & time

Date and time basics

There are three types of date/time data that refer to an instant in time:

  • A date. Tibbles print this as <date>.

  • A time within a day. Tibbles print this as <time>.

  • A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second). Tibbles print this as <dttm>.

Date and time basics

You should always use the simplest possible data type that works for your needs.

That means if you can use a date instead of a date-time, you should.

Date-times are substantially more complicated because of the need to handle time zones, which we’ll come back to at the end of the chapter.
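
As a minimal sketch of this advice, a date-time can be reduced to a plain date with as_date() once the time of day is no longer needed. The variable names stamp and day_only are illustrative and do not come from the lecture.

```r
library(lubridate)

# A date-time carries a time of day and a time zone (UTC by default here)
stamp <- ymd_hms("2026-07-08 12:34:56")

# as_date() drops the time component and returns a plain Date
day_only <- as_date(stamp)

class(stamp)     # a POSIXct date-time
class(day_only)  # a Date
```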

Current

To get the current date or date-time you can use today() or now():

today()
[1] "2025-04-10"
now()
[1] "2025-04-10 11:13:04 EDT"

Date-time importing

If your CSV contains an ISO8601 date or date-time, you don’t need to do anything; readr will automatically recognize it:

csv <- "
  date,datetime
  2022-01-02,2022-01-02 05:12
"
read_csv(csv)
Rows: 1 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dttm (1): datetime
date (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 1 × 2
  date       datetime           
  <date>     <dttm>             
1 2022-01-02 2022-01-02 05:12:00

ISO8601

If you haven’t heard of ISO8601 before, it’s an international standard for writing dates where the components of a date are organized from biggest to smallest separated by -.

For example, in ISO8601 May 3 2022 is 2022-05-03. ISO8601 dates can also include times, where hour, minute, and second are separated by :, and the date and time components are separated by either a T or a space.

For example, you could write 4:26pm on May 3 2022 as either 2022-05-03 16:26 or 2022-05-03T16:26.
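
The two spellings above should parse to the same instant. The sketch below assumes readr's default parse_datetime() behaviour, which reads ISO8601 input when no format is given; the variable names are illustrative.

```r
library(readr)

# The same instant written with a space and with a T separator
with_space <- parse_datetime("2022-05-03 16:26")
with_t     <- parse_datetime("2022-05-03T16:26")

# Both strings describe the same instant
with_space == with_t
```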

Other formats

For other date-time formats, you’ll need to use col_types plus col_date() or col_datetime() along with a date-time format. The date-time format used by readr is a standard used across many programming languages, describing a date component with a % followed by a single character. For example, %Y-%m-%d specifies a date that’s a year, -, month (as number), -, day.

Other formats

Type   Code  Meaning                         Example
Year   %Y    4 digit year                    2021
       %y    2 digit year                    21
Month  %m    Number                          2
       %b    Abbreviated name                Feb
       %B    Full name                       February
Day    %d    One or two digits               2
       %e    Two digits                      02

Other formats

Type   Code  Meaning                         Example
Time   %H    24-hour hour                    13
       %I    12-hour hour                    1
       %p    AM/PM                           pm
       %M    Minutes                         35
       %S    Seconds                         45
       %OS   Seconds with decimal component  45.35
       %Z    Time zone name                  America/Chicago
       %z    Offset from UTC                 +0800
Other  %.    Skip one non-digit              :
       %*    Skip any number of non-digits
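
As a rough illustration of how these codes combine, the sketch below uses readr's parse_date() and parse_datetime() on a few made-up strings. The input strings and variable names are assumptions for illustration only; month names are matched using readr's default English locale.

```r
library(readr)

# Day, abbreviated month name, 4-digit year
d1 <- parse_date("02 Feb 2021", format = "%d %b %Y")

# 24-hour clock with seconds
t1 <- parse_datetime("2021-02-02 13:35:45", format = "%Y-%m-%d %H:%M:%S")

# 12-hour clock with an AM/PM marker
t2 <- parse_datetime("02/02/2021 01:35 PM", format = "%m/%d/%Y %I:%M %p")

d1
t1
t2
```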

Datetime example

csv <- "
  date
  01/02/15
"

read_csv(csv, col_types = cols(date = col_date("%m/%d/%y")))
# A tibble: 1 × 1
  date      
  <date>    
1 2015-01-02
read_csv(csv, col_types = cols(date = col_date("%d/%m/%y")))
# A tibble: 1 × 1
  date      
  <date>    
1 2015-02-01
read_csv(csv, col_types = cols(date = col_date("%y/%m/%d")))
# A tibble: 1 × 1
  date      
  <date>    
1 2001-02-15

Date-time components

Getting components

You can pull out individual parts of the date with the accessor functions year(), month(), mday() (day of the month), yday() (day of the year), wday() (day of the week), hour(), minute(), and second(). These are effectively the opposites of make_datetime().

Getting components

datetime <- ymd_hms("2026-07-08 12:34:56")

year(datetime)
[1] 2026
month(datetime)
[1] 7
mday(datetime)
[1] 8
yday(datetime)
[1] 189
wday(datetime)
[1] 4

Getting components

For month() and wday() you can set label = TRUE to return the abbreviated name of the month or day of the week. Set abbr = FALSE to return the full name.

month(datetime, label = TRUE)
[1] Jul
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
wday(datetime, label = TRUE, abbr = FALSE)
[1] Wednesday
7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

Datetime rounding

An alternative approach to plotting individual components is to round the date to a nearby unit of time, with floor_date(), round_date(), and ceiling_date(). Each function takes a vector of dates to adjust and then the name of the unit to round down (floor), round up (ceiling), or round to.

# Used inside mutate() or count(); dep_time stands for a date-time column in your data
week = floor_date(dep_time, "week")
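
The three rounding functions can be compared on a single value. This is a small sketch using the same date-time as the earlier slides; the variable names are illustrative.

```r
library(lubridate)

x <- ymd_hms("2026-07-08 12:34:56")

month_start  <- floor_date(x, "month")    # round down to the start of the month
nearest_hour <- round_date(x, "hour")     # round to the nearest hour
next_month   <- ceiling_date(x, "month")  # round up to the start of the next month

month_start
nearest_hour
next_month

# Rounding down to the week uses lubridate's week-start setting
floor_date(x, "week")
```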

Datetime modifying

You can also use each accessor function to modify the components of a date/time. This doesn’t come up much in data analysis, but can be useful when cleaning data that has clearly incorrect dates.

(datetime <- ymd_hms("2026-07-08 12:34:56"))
[1] "2026-07-08 12:34:56 UTC"
year(datetime) <- 2030
datetime
[1] "2030-07-08 12:34:56 UTC"
month(datetime) <- 01
datetime
[1] "2030-01-08 12:34:56 UTC"
hour(datetime) <- hour(datetime) + 1
datetime
[1] "2030-01-08 13:34:56 UTC"

Datetime updating

Alternatively, rather than modifying an existing variable, you can create a new date-time with update(). This also allows you to set multiple values in one step.

update(datetime, year = 2030, month = 2, mday = 2, hour = 2)
[1] "2030-02-02 02:34:56 UTC"
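
One behaviour worth knowing when cleaning data: if a value passed to update() is too large for its component, it rolls over into the next unit. A short sketch, with illustrative variable names:

```r
library(lubridate)

# February 2023 has 28 days, so day 30 rolls over into March
rolled_day <- update(ymd("2023-02-01"), mday = 30)

# 400 hours is 16 days and 16 hours after midnight on February 1
rolled_hour <- update(ymd("2023-02-01"), hour = 400)

rolled_day
rolled_hour
```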

Time spans

Time span

Next you’ll learn about how arithmetic with dates works, including subtraction, addition, and division. Along the way, you’ll learn about three important classes that represent time spans:

  • Durations, which represent an exact number of seconds.
  • Periods, which represent human units like weeks and months.
  • Intervals, which represent a starting and ending point.

Time span

How do you pick between durations, periods, and intervals?

As always, pick the simplest data structure that solves your problem.

  • If you only care about physical time, use a duration;
  • If you need to add human times, use a period;
  • If you need to figure out how long a span is in human units, use an interval.
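
Intervals are not shown elsewhere in these slides, so here is a minimal sketch of the last case: measuring a span in human units. It uses lubridate's %--% operator to build an interval; the variable names are illustrative.

```r
library(lubridate)

# %--% builds an interval from a start point and an end point
y2023 <- ymd("2023-01-01") %--% ymd("2024-01-01")
y2024 <- ymd("2024-01-01") %--% ymd("2025-01-01")

# Dividing by a one-day period counts the days in each span
y2023 / days(1)  # 365
y2024 / days(1)  # 366, because 2024 is a leap year
```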

Durations

In R, when you subtract two dates, you get a difftime object:

# How old is Miel?
h_age <- today() - ymd("1982-04-27")
h_age
Time difference of 15689 days

Durations

A difftime class object records a time span of seconds, minutes, hours, days, or weeks.

This ambiguity can make difftimes a little painful to work with, so lubridate provides an alternative which always uses seconds: the duration.

as.duration(h_age)
[1] "1355529600s (~42.95 years)"
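
To make the ambiguity concrete, the sketch below subtracts two pairs of values and inspects the units R chose for each difftime. The dates are made up for illustration.

```r
library(lubridate)

# Date-times 30 seconds apart: the difftime is reported in seconds
d_secs <- ymd_hms("2026-01-01 12:00:30") - ymd_hms("2026-01-01 12:00:00")

# Dates one day apart: the difftime is reported in days
d_days <- ymd("2026-01-02") - ymd("2026-01-01")

units(d_secs)
units(d_days)

# as.duration() puts both on the same scale (seconds)
as.duration(d_secs)
as.duration(d_days)
```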

Durations

Durations come with a bunch of convenient constructors:

dseconds(15)
[1] "15s"
dminutes(10)
[1] "600s (~10 minutes)"
dhours(c(12, 24))
[1] "43200s (~12 hours)" "86400s (~1 days)"  
ddays(0:5)
[1] "0s"                "86400s (~1 days)"  "172800s (~2 days)"
[4] "259200s (~3 days)" "345600s (~4 days)" "432000s (~5 days)"
dweeks(3)
[1] "1814400s (~3 weeks)"
dyears(1)
[1] "31557600s (~1 years)"

Durations

  • Durations always record the time span in seconds.

  • Larger units are created by converting minutes, hours, days, weeks, and years to seconds: 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, and 7 days in a week.

  • Larger time units are more problematic.

  • A year uses the “average” number of days in a year, i.e. 365.25.

  • There’s no way to convert a month to a duration, because there’s just too much variation.
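
The conversion factors above can be checked directly, since as.numeric() on a duration returns its length in seconds. A quick sketch:

```r
library(lubridate)

as.numeric(dminutes(1))  # 60
as.numeric(dhours(1))    # 60 * 60
as.numeric(ddays(1))     # 24 * 60 * 60
as.numeric(dweeks(1))    # 7 * 24 * 60 * 60

# A duration year is 365.25 days' worth of seconds
dyears(1) == ddays(365.25)
```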

Add and multiply durations

You can add and multiply durations:

2 * dyears(1)
[1] "63115200s (~2 years)"
dyears(1) + dweeks(12) + dhours(15)
[1] "38869200s (~1.23 years)"

Add and subtract durations

You can add and subtract durations to and from dates:

tomorrow <- today() + ddays(1)
tomorrow
[1] "2025-04-11"
last_year <- today() - dyears(1)
last_year
[1] "2024-04-09 18:00:00 UTC"

Watch for unexpected results

However, because durations represent an exact number of seconds, sometimes you might get an unexpected result:

one_am <- ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")

one_am
[1] "2026-03-08 01:00:00 EST"
one_am + ddays(1)
[1] "2026-03-09 02:00:00 EDT"

Why is one day after 1am on March 8 equal to 2am on March 9? If you look carefully at the output, you might also notice that the time zone has changed from EST to EDT. March 8 has only 23 hours because that is when DST starts, so adding a full day’s worth of seconds (24 hours) lands on a different clock time.
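
One way to confirm the 23-hour day is to measure the time between the two midnights in that time zone. This sketch assumes the system's time zone database includes America/New_York; the variable names are illustrative.

```r
library(lubridate)

midnight_mar8 <- ymd_hms("2026-03-08 00:00:00", tz = "America/New_York")
midnight_mar9 <- ymd_hms("2026-03-09 00:00:00", tz = "America/New_York")

# Elapsed physical time between the two midnights, in hours
hours_in_day <- as.numeric(difftime(midnight_mar9, midnight_mar8, units = "hours"))
hours_in_day  # 23
```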

Periods

Periods

To solve this problem, lubridate provides periods.

Periods are time spans that don’t have a fixed length in seconds; instead, they work with “human” times, like days and months.

That allows them to work in a more intuitive way:

one_am
[1] "2026-03-08 01:00:00 EST"
one_am + days(1)
[1] "2026-03-09 01:00:00 EDT"

Periods

Like durations, periods can be created with a number of friendly constructor functions.

hours(c(12, 24))
[1] "12H 0M 0S" "24H 0M 0S"
days(7)
[1] "7d 0H 0M 0S"
months(1:6)
[1] "1m 0d 0H 0M 0S" "2m 0d 0H 0M 0S" "3m 0d 0H 0M 0S" "4m 0d 0H 0M 0S"
[5] "5m 0d 0H 0M 0S" "6m 0d 0H 0M 0S"

Periods

You can add and multiply periods:

10 * (months(6) + days(1))
[1] "60m 10d 0H 0M 0S"
days(50) + hours(25) + minutes(2)
[1] "50d 25H 2M 0S"

Periods

And of course, add them to dates. Compared to durations, periods are more likely to do what you expect:

# A leap year
ymd("2024-01-01") + dyears(1)
[1] "2024-12-31 06:00:00 UTC"
ymd("2024-01-01") + years(1)
[1] "2025-01-01"
# Daylight saving time
one_am + ddays(1)
[1] "2026-03-09 02:00:00 EDT"
one_am + days(1)
[1] "2026-03-09 01:00:00 EDT"

Time zones

Time zones

Time zones are an enormously complicated topic because of their interaction with geopolitical entities. Fortunately we don’t need to dig into all the details as they’re not all important for data analysis, but there are a few challenges we’ll need to tackle head on.
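
As a first taste of those challenges, the sketch below shows that the time zone changes how an instant is printed, not which instant it is. The two clock times are chosen for illustration and the variable names are hypothetical.

```r
library(lubridate)

# Noon in New York and 6pm in Copenhagen on the same summer day
ny  <- ymd_hms("2024-06-01 12:00:00", tz = "America/New_York")
cph <- ymd_hms("2024-06-01 18:00:00", tz = "Europe/Copenhagen")

# They are the same instant in time
ny == cph

# with_tz() changes the displayed time zone but keeps the instant
ny_as_utc <- with_tz(ny, "UTC")
ny_as_utc
```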