Lecture 10
2025-04-10
Imagine that you have a variable that records month:
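As a small illustration, such a variable might look like the vector below. The name x1 and the particular months are assumptions made for this sketch, not values taken from the lecture.

```r
# Hypothetical month variable stored as plain strings
x1 <- c("Dec", "Apr", "Jan", "Mar")
x1
```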
Using a string to record this variable has two problems:
There are only twelve possible months, and there’s nothing saving you from typos:
It doesn’t sort in a useful way:
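Both problems can be seen in a short sketch. The vectors x1 and x2 are hypothetical examples; x2 contains a deliberate typo.

```r
# Hypothetical month vectors; "Jam" is a deliberate typo for "Jan"
x1 <- c("Dec", "Apr", "Jan", "Mar")
x2 <- c("Dec", "Apr", "Jam", "Mar")

# Problem 1: nothing flags the typo, x2 is a perfectly valid character vector
x2

# Problem 2: sorting is alphabetical, not chronological
sorted_x1 <- sort(x1)
sorted_x1
```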
You can fix both of these problems with a factor.
To create a factor you must start by creating a list of the valid levels:
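A minimal sketch of such a set of levels, assuming English three-letter month abbreviations (the name month_levels is chosen here for illustration):

```r
# Valid levels, listed in calendar order
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
```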
Now you can create a factor:
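The sketch below builds the factor with base R's factor() and shows that sorting now follows the order of the levels. The names x1, month_levels and y1 are assumptions made for this example.

```r
# Hypothetical data and levels
x1 <- c("Dec", "Apr", "Jan", "Mar")
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)

# Create the factor, then sort it: the order follows month_levels
y1 <- factor(x1, levels = month_levels)
sorted_y1 <- sort(y1)
sorted_y1
```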
Any values not in the levels will be silently converted to NA:
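A short sketch of this behaviour, using a hypothetical vector x2 that contains the typo "Jam":

```r
# Hypothetical data with a typo, plus the valid levels
x2 <- c("Dec", "Apr", "Jam", "Mar")
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)

# "Jam" is not a valid level, so it becomes NA without any warning
y2 <- factor(x2, levels = month_levels)
y2
is.na(y2)
```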
This seems risky, so you might want to use forcats::fct() instead, which will throw an error:
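The sketch below wraps the call in tryCatch() so the script keeps running and the error message can be inspected. The vector x2 is a hypothetical example, and fct() assumes forcats 1.0.0 or later.

```r
library(forcats)

# Hypothetical data with a typo, plus the valid levels
x2 <- c("Dec", "Apr", "Jam", "Mar")
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)

# fct() refuses values that are not in `levels`; capture the error message
result <- tryCatch(
  fct(x2, levels = month_levels),
  error = function(e) conditionMessage(e)
)
result
```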
If you omit the levels, factor() takes them from the data in alphabetical order. Sorting alphabetically is slightly risky because not every computer will sort strings in the same way, so forcats::fct() orders by first appearance in the original vector instead:
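To compare the two defaults, here is a sketch with a hypothetical vector x1 and no levels supplied:

```r
library(forcats)

x1 <- c("Dec", "Apr", "Jan", "Mar")  # hypothetical data

# Base factor(): levels are the sorted unique values
base_levels <- levels(factor(x1))
base_levels

# forcats::fct(): levels follow first appearance in x1
fct_levels <- levels(fct(x1))
fct_levels
```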
If you ever need to see the set of valid levels directly, you can do so with levels():
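For example, with a hypothetical factor y1 built from the twelve month abbreviations:

```r
# Hypothetical factor of months
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
y1 <- factor(c("Dec", "Apr", "Jan", "Mar"), levels = month_levels)

# levels() returns all valid levels, including those not present in the data
levels(y1)
```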
You can also create a factor when reading your data with readr, using col_factor():
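A minimal sketch, assuming a small inline CSV with made-up columns month and value. Passing the text through I() marks it as literal data, which assumes readr 2.0.0 or later.

```r
library(readr)

month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)

# Hypothetical CSV content; in practice this would be a file path
csv <- "month,value\nJan,12\nFeb,56\nMar,12"

# Parse the month column directly as a factor with the given levels
df <- read_csv(
  I(csv),
  col_types = cols(month = col_factor(levels = month_levels))
)
df$month
```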
Imagine the following plot. What would you change to improve it?
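The plot itself is not reproduced in these notes. One way to build a plot of this kind is sketched below, assuming the gss_cat dataset that ships with forcats and its tvhours column; both the dataset and the name relig_summary are assumptions for this example.

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Average daily TV hours per religion (assumed example data: forcats::gss_cat)
relig_summary <- gss_cat |>
  group_by(relig) |>
  summarize(tvhours = mean(tvhours, na.rm = TRUE), n = n())

# The y-axis follows the factor's level order, which has no relation to tvhours
p <- ggplot(relig_summary, aes(x = tvhours, y = relig)) +
  geom_point()
p
```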
It is hard to read this plot because there’s no overall pattern. We can improve it by reordering the levels of relig using fct_reorder(). fct_reorder() takes three arguments:
.f, the factor whose levels you want to modify.
.x, a numeric vector that you want to use to reorder the levels.
.fun, a function that’s used if there are multiple values of .x for each value of .f. The default is median.
Now imagine another plot where you would rather not have “Not applicable” show up at the top of the graph.
You can use fct_relevel(). It takes a factor, .f, and then any number of levels that you want to move to the front of the line.
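Both functions can be sketched on the gss_cat dataset from forcats. The dataset, the tvhours column, and the use of rincome for the “Not applicable” case are assumptions for this example.

```r
library(dplyr)
library(forcats)

# Average TV hours per religion (assumed example data: forcats::gss_cat)
relig_summary <- gss_cat |>
  group_by(relig) |>
  summarize(tvhours = mean(tvhours, na.rm = TRUE))

# fct_reorder(): order the levels of relig by the value of tvhours
relig_reordered <- relig_summary |>
  mutate(relig = fct_reorder(relig, tvhours))
levels(relig_reordered$relig)

# fct_relevel(): move "Not applicable" to the front of the levels
rincome_front <- fct_relevel(gss_cat$rincome, "Not applicable")
levels(rincome_front)
```

Plotting relig_reordered with relig on the y-axis would then show the points in order of tvhours.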
fct_reorder2(.f, .x, .y) reorders the factor .f by the .y values associated with the largest .x values.
Use fct_infreq() to order levels in decreasing frequency.
Combine it with fct_rev() if you want them in increasing frequency.
fct_recode() allows you to recode, or change, the value of each level.
fct_collapse() is a useful variant of fct_recode(): for each new level, you provide a vector of old levels.
fct_lump_lowfreq() is a simple starting point that progressively lumps the smallest categories into “Other”, always keeping “Other” as the smallest category.
fct_lump_n() lets you specify the exact number of groups to keep.
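Several of these helpers can be tried on a toy factor. The factor f and the new level names below are made up for illustration.

```r
library(forcats)

# Toy factor: "a" appears once, "b" twice, "c" three times
f <- factor(c("a", "b", "b", "c", "c", "c"))

# Most frequent level first, then the reverse
by_freq <- fct_infreq(f)
by_freq_rev <- fct_rev(by_freq)

# Rename a single level: new name on the left, old level on the right
recoded <- fct_recode(f, alpha = "a")

# Collapse several old levels into one new level
collapsed <- fct_collapse(f, ab = c("a", "b"))

# Keep the single most frequent level and lump the rest into "Other"
lumped <- fct_lump_n(f, n = 1)

# Lump the rarest levels while keeping "Other" the smallest category
lumped_low <- fct_lump_lowfreq(f)

levels(by_freq)
levels(by_freq_rev)
levels(recoded)
levels(collapsed)
levels(lumped)
```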