Lecture 11
2025-04-10
There are three types of date/time data that refer to an instant in time:
A date. Tibbles print this as <date>.
A time within a day. Tibbles print this as <time>.
A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second). Tibbles print this as <dttm>.
You should always use the simplest possible data type that works for your needs.
That means if you can use a date instead of a date-time, you should.
Date-times are substantially more complicated because of the need to handle time zones, which we’ll come back to at the end of the chapter.
To get the current date or date-time, use today() or now().
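A minimal sketch, assuming the lubridate package is installed. The printed values depend on when and where you run it, so no output is shown.

```r
library(lubridate)

# Current date as a Date object (shown by tibbles as <date>)
today()

# Current date-time as a POSIXct object (shown by tibbles as <dttm>),
# in the session's time zone
now()
```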
If your CSV contains an ISO8601 date or date-time, you don’t need to do anything; readr will automatically recognize it:
Rows: 1 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dttm (1): datetime
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 1 × 2
date datetime
<date> <dttm>
1 2022-01-02 2022-01-02 05:12:00
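The tibble above is the kind of result you get from reading a small inline CSV like the one below. This is a sketch that assumes readr 2.x, where wrapping a string in I() marks it as literal data rather than a file path.

```r
library(readr)

# Inline CSV holding one ISO8601 date and one ISO8601 date-time
csv <- I("date,datetime\n2022-01-02,2022-01-02 05:12\n")

# No col_types needed: readr guesses <date> and <dttm> from the ISO8601 text
df <- read_csv(csv, show_col_types = FALSE)
df
```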
If you haven’t heard of ISO8601 before, it’s an international standard for writing dates where the components of a date are organized from biggest to smallest, separated by "-". For example, in ISO8601 May 3 2022 is 2022-05-03.
ISO8601 dates can also include times, where hour, minute, and second are separated by ":", and the date and time components are separated by either a T or a space. For example, you could write 4:26pm on May 3 2022 as either 2022-05-03 16:26 or 2022-05-03T16:26.
For other date-time formats, you’ll need to use col_types plus col_date() or col_datetime() along with a date-time format. The date-time format used by readr is a standard used across many programming languages, describing a date component with a % followed by a single character. For example, %Y-%m-%d specifies a date that’s a year, -, month (as number), -, day.
| Type | Code | Meaning | Example |
|---|---|---|---|
| Year | %Y | 4 digit year | 2021 |
| | %y | 2 digit year | 21 |
| Month | %m | Number | 2 |
| | %b | Abbreviated name | Feb |
| | %B | Full name | February |
| Day | %d | One or two digits | 2 |
| | %e | Two digits | 02 |
| Time | %H | 24-hour hour | 13 |
| | %I | 12-hour hour | 1 |
| | %p | AM/PM | pm |
| | %M | Minutes | 35 |
| | %S | Seconds | 45 |
| | %OS | Seconds with decimal component | 45.35 |
| | %Z | Time zone name | America/Chicago |
| | %z | Offset from UTC | +0800 |
| Other | %. | Skip one non-digit | : |
| | %* | Skip any number of non-digits | |
# A tibble: 1 × 1
date
<date>
1 2015-01-02
# A tibble: 1 × 1
date
<date>
1 2015-02-01
# A tibble: 1 × 1
date
<date>
1 2001-02-15
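The three tibbles above match what you get when you read one ambiguous string, 01/02/15, with three different formats. Here is a sketch of that, again assuming readr 2.x and I() for literal data. Note that %y maps 15 to 2015 and 01 to 2001.

```r
library(readr)

# A one-column CSV holding a single ambiguous date
csv <- I("date\n01/02/15\n")

# The same text read with three different formats
mdy_fmt <- read_csv(csv, col_types = cols(date = col_date("%m/%d/%y")))
dmy_fmt <- read_csv(csv, col_types = cols(date = col_date("%d/%m/%y")))
ymd_fmt <- read_csv(csv, col_types = cols(date = col_date("%y/%m/%d")))

mdy_fmt$date  # 2015-01-02
dmy_fmt$date  # 2015-02-01
ymd_fmt$date  # 2001-02-15
```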
You can pull out individual parts of the date with the accessor functions year(), month(), mday() (day of the month), yday() (day of the year), wday() (day of the week), hour(), minute(), and second(). These are effectively the opposites of make_datetime().
For month() and wday() you can set label = TRUE to return the abbreviated name of the month or day of the week. Set abbr = FALSE to return the full name.
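Here is a short sketch of the accessors applied to one fixed, arbitrary date-time (July 8, 2026, parsed as UTC). The labels returned by month() and wday() follow your locale. The names in the comments assume an English one.

```r
library(lubridate)

# An arbitrary example date-time, parsed as UTC
datetime <- ymd_hms("2026-07-08 12:34:56")

year(datetime)   # 2026
month(datetime)  # 7
mday(datetime)   # 8   (day of the month)
yday(datetime)   # 189 (day of the year)
wday(datetime)   # 4   (Wednesday, counting Sunday as 1 by default)
hour(datetime)   # 12

# Names instead of numbers (English locale assumed)
month(datetime, label = TRUE)               # Jul
wday(datetime, label = TRUE, abbr = FALSE)  # Wednesday

# Rebuilding the value from its parts with make_datetime() round-trips
make_datetime(year(datetime), month(datetime), mday(datetime),
              hour(datetime), minute(datetime), second(datetime)) == datetime
```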
An alternative approach to plotting individual components is to round the date to a nearby unit of time with floor_date(), round_date(), and ceiling_date(). Each function takes a vector of dates to adjust and then the name of the unit to round down (floor), round up (ceiling), or round to.
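The following sketch rounds the same example date-time to a few units. The weekly result assumes lubridate's default week start (Sunday), which can be changed with the lubridate.week.start option.

```r
library(lubridate)

x <- ymd_hms("2026-07-08 12:34:56")

floor_date(x, "hour")    # 2026-07-08 12:00:00 UTC
round_date(x, "hour")    # 2026-07-08 13:00:00 UTC (34 minutes rounds up)
ceiling_date(x, "hour")  # 2026-07-08 13:00:00 UTC
floor_date(x, "month")   # 2026-07-01 UTC
floor_date(x, "week")    # 2026-07-05 UTC (the preceding Sunday)
```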
You can also use each accessor function to modify the components of a date/time. This doesn’t come up much in data analysis, but can be useful when cleaning data that has clearly incorrect dates.
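Here is a sketch of the assignment forms of the accessors. It also uses lubridate's update(), which sets several components in one call and returns a new value rather than modifying the original.

```r
library(lubridate)

datetime <- ymd_hms("2026-07-08 12:34:56")

# Assigning to an accessor replaces that component
year(datetime) <- 2030
month(datetime) <- 1
hour(datetime) <- hour(datetime) + 1
datetime  # 2030-01-08 13:34:56 UTC

# update() changes several components at once
update(datetime, year = 2031, mday = 2)  # 2031-01-02 13:34:56 UTC
```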
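Next you’ll learn about how arithmetic with dates works, including subtraction, addition, and division. Along the way, you’ll learn about three important classes that represent time spans:
Durations, which represent an exact number of seconds.
Periods, which represent human units like weeks and months.
Intervals, which represent a starting and ending point.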
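How do you pick between durations, periods, and intervals?
As always, pick the simplest data structure that solves your problem. If you only care about physical time, use a duration; if you need to add human times, use a period; if you need to figure out how long a span is in human units, use an interval.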
In R, when you subtract two dates, you get a difftime object.
A difftime records a time span in seconds, minutes, hours, days, or weeks, and which unit you get depends on what you subtracted and how far apart the values are.
This ambiguity can make difftimes a little painful to work with, so lubridate provides an alternative that always uses seconds: the duration.
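This sketch shows the unit ambiguity with two arbitrary example spans. It then converts one of them with lubridate's as.duration(), which always stores seconds.

```r
library(lubridate)

# Subtracting Dates gives a difftime in days...
d1 <- ymd("2025-04-10") - ymd("2025-03-31")
d1         # Time difference of 10 days
units(d1)  # "days"

# ...but subtracting date-times half an hour apart gives one in minutes
d2 <- ymd_hms("2025-04-10 10:30:00") - ymd_hms("2025-04-10 10:00:00")
units(d2)  # "mins"

# A duration always stores seconds, whatever the size of the span
as.duration(d1)  # 864000 seconds = 10 * 86400
```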
Durations come with a bunch of convenient constructors:
[1] "15s"
[1] "600s (~10 minutes)"
[1] "43200s (~12 hours)" "86400s (~1 days)"
[1] "0s" "86400s (~1 days)" "172800s (~2 days)"
[4] "259200s (~3 days)" "345600s (~4 days)" "432000s (~5 days)"
[1] "1814400s (~3 weeks)"
[1] "31557600s (~1 years)"
Durations always record the time span in seconds.
Larger units are created by converting minutes, hours, days, weeks, and years to seconds: 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, and 7 days in a week.
Larger time units are more problematic.
A year uses the “average” number of days in a year, i.e. 365.25.
There’s no way to convert a month to a duration, because there’s just too much variation.
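The constructor output printed above is consistent with calls like the following. This is a sketch, and each second count in the comments can be checked by hand from the conversions just described.

```r
library(lubridate)

dseconds(15)       # 15s
dminutes(10)       # 600s: 10 * 60
dhours(c(12, 24))  # 43200s and 86400s
ddays(0:5)         # 0s through 432000s in steps of 86400s
dweeks(3)          # 1814400s: 3 * 7 * 86400
dyears(1)          # 31557600s: 365.25 * 86400

# A duration year is the average year, so it is exactly 365.25 days
dyears(1) == ddays(365.25)
```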
You can add and multiply durations, and you can add and subtract durations to and from dates.
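Here is a sketch of that arithmetic, using an arbitrary fixed date instead of today() so the results are reproducible.

```r
library(lubridate)

# Arithmetic on durations is arithmetic on seconds
2 * dyears(1)                        # 63115200s
dyears(1) + dweeks(12) + dhours(15)  # 38869200s

# Adding or subtracting a whole number of days moves a date as expected
ymd("2025-04-10") + ddays(1)   # the next day, April 11
ymd("2025-04-10") - ddays(10)  # ten days earlier, March 31
```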
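However, because durations represent an exact number of seconds, sometimes you might get an unexpected result. Below, the first value is 1am on March 8, 2026 in US Eastern time, and the second is that same instant plus one day’s worth of seconds: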
[1] "2026-03-08 01:00:00 EST"
[1] "2026-03-09 02:00:00 EDT"
Why is one day after 1am March 8 shown as 2am March 9? If you look carefully at the date, you might also notice that the time zones have changed. March 8 has only 23 hours because it’s when DST starts, so if we add a full day’s worth of seconds, we end up with a different clock time.
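The surprising result above can be reproduced roughly as follows. The example assumes the America/New_York time zone, which switches to daylight saving time (EDT) on March 8, 2026, and a system time zone database that knows about that change.

```r
library(lubridate)

# 1am on the day US daylight saving time starts
one_am <- ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")
one_am             # "2026-03-08 01:00:00 EST"

# ddays(1) adds exactly 86400 seconds, but this calendar day is only 23 hours long
one_am + ddays(1)  # "2026-03-09 02:00:00 EDT"
```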
To solve this problem, lubridate provides periods.
Periods are time spans that don’t have a fixed length in seconds; instead, they work with “human” times, like days and months.
That allows them to work in a more intuitive way.
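This sketch uses the same DST example as before, again assuming the America/New_York time zone. It shows a one-day period keeping the clock time fixed.

```r
library(lubridate)

one_am <- ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")

# A period of one day moves forward one calendar day at the same clock time...
one_am + days(1)  # "2026-03-09 01:00:00 EDT"

# ...even though only 23 hours of physical time pass
as.numeric(difftime(one_am + days(1), one_am, units = "hours"))  # 23
```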
Like durations, periods can be created with a number of friendly constructor functions.
You can add and multiply periods, and of course add them to dates. Compared to durations, periods are more likely to do what you expect.
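Here is a sketch of period constructors, period arithmetic, and a leap-year comparison with durations. Inspecting the result through its month and day slots relies on Period being an S4 class with those slots.

```r
library(lubridate)

# Constructors, one per human unit
hours(c(12, 24))
days(7)
months(1:6)

# Periods add and multiply component by component
p <- 10 * (months(6) + days(1))
p  # 60 months and 10 days

# 2024 is a leap year: a period year lands on the same calendar date,
# while a duration year (365.25 days) stops short of it
ymd("2024-01-01") + years(1)   # "2025-01-01"
ymd("2024-01-01") + dyears(1)  # "2024-12-31 06:00:00 UTC"
```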
Time zones are an enormously complicated topic because of their interaction with geopolitical entities. Fortunately, we don’t need to dig into all the details, since not all of them matter for data analysis, but there are a few challenges we’ll need to tackle head on.