tidyr
Overview
The goal of tidyr is to help you create tidy data. Tidy data is data
where:
- Every column is a variable.
- Every row is an observation.
- Every cell is a single value.
Tidy data describes a standard way of storing data that is used wherever
possible throughout the tidyverse. If you
ensure that your data is tidy, you’ll spend less time fighting with the
tools and more time working on your analysis. Learn more about tidy data
in vignette("tidy-data").
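As a rough illustration of the three rules above, the sketch below tidies a small, made-up table in which a variable (the year) is hidden in the column names. The data frame `untidy` and the column names `year` and `cases` are assumptions invented for this example; only `pivot_longer()` comes from tidyr.

```r
library(tidyr)

# Hypothetical untidy data: the year is stored in the column names
# instead of in a variable of its own.
untidy <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(11, 21),
  check.names = FALSE
)

# After pivoting, every column is a variable (country, year, cases),
# every row is one country-year observation, and every cell holds one value.
tidy <- pivot_longer(
  untidy,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "cases"
)

print(tidy)
```

The result has four rows, one per country and year, which is the shape most tidyverse tools expect.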
Installation
# The easiest way to get tidyr is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just tidyr:
install.packages("tidyr")
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/tidyr")
Getting started
library(tidyr)
tidyr functions fall into five main categories:
- “Pivoting”, which converts between long and wide forms. tidyr 1.0.0
  introduces pivot_longer() and pivot_wider(), replacing the older
  spread() and gather() functions. See vignette("pivot") for more
  details.
- “Rectangling”, which turns deeply nested lists (as from JSON) into
  tidy tibbles. See unnest_longer(), unnest_wider(), hoist(), and
  vignette("rectangle") for more details.
- Nesting converts grouped data to a form where each group becomes a
  single row containing a nested data frame, and unnesting does the
  opposite. See nest(), unnest(), and vignette("nest") for more details.
- Splitting and combining character columns. Use separate() and
  extract() to pull a single character column into multiple columns;
  use unite() to combine multiple columns into a single character
  column.
- Make implicit missing values explicit with complete(); make explicit
  missing values implicit with drop_na(); replace missing values with
  the next/previous value with fill(), or a known value with
  replace_na().
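The categories above can be sketched on toy data as follows. This is a minimal illustration, not a recommended workflow: the data frames `df` and `users` and all column names are invented for the example, and `tibble::tibble()` is used only to build a list-column (tibble is installed alongside tidyr).

```r
library(tidyr)

# Hypothetical data, invented for illustration.
df <- data.frame(
  id = c("a_1", "a_2", "b_1"),
  value = c(1, NA, 3)
)

# Splitting: pull one character column into two.
parts <- separate(df, id, into = c("group", "rep"), sep = "_")

# Combining: paste the two columns back into one.
joined <- unite(parts, "id", group, rep, sep = "_")

# Pivoting: one row per group, one column per replicate.
wide <- pivot_wider(parts, names_from = rep, values_from = value)

# Missing values: add a row for the absent (b, 2) combination ...
full <- complete(parts, group, rep)
# ... carry the previous value downwards ...
filled <- fill(full, value)
# ... or substitute a known value ...
zeroed <- replace_na(full, list(value = 0))
# ... or drop incomplete rows again.
dropped <- drop_na(full)

# Nesting: one row per group with a nested data frame, then back.
nested <- nest(parts, data = c(rep, value))
unnested <- unnest(nested, data)

# Rectangling: spread a list-column of named lists into columns.
users <- tibble::tibble(
  user = list(
    list(name = "Ann", age = 30),
    list(name = "Bob", age = 25)
  )
)
rect <- unnest_wider(users, user)
```

Each intermediate object can be printed to see the effect of a single verb in isolation.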
Related work
tidyr replaces reshape2 (2010-2014) and reshape (2005-2010). Somewhat
counterintuitively, each iteration of the package has done less. tidyr
is designed specifically for tidying data, not for general reshaping
(reshape2) or general aggregation (reshape).
data.table provides high-performance implementations of melt() and
dcast().
If you’d like to read more about data reshaping from a CS perspective,
I’d recommend the following three papers:
- Wrangler: Interactive visual specification of data transformation
  scripts
- An interactive framework for data cleaning (Potter’s wheel)
- On efficiently implementing SchemaSQL on a SQL database system
To guide your reading, here’s a translation between the terminology used
in different places:

| tidyr        | gather  | spread |
|--------------|---------|--------|
| reshape(2)   | melt    | cast   |
| spreadsheets | unpivot | pivot  |
| databases    | fold    | unfold |

Getting help
If you encounter a clear bug, please file a minimal reproducible example
on GitHub. For questions and other discussion, please use
community.rstudio.com.
Please note that the tidyr project is released with a Contributor Code
of Conduct. By
contributing to this project, you agree to abide by its terms.