dplyr

dplyr: A grammar of data manipulation

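Overview

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

  • mutate() adds new variables that are functions of existing variables.
  • select() picks variables based on their names.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values down to a single summary.
  • arrange() changes the ordering of the rows.

These all combine naturally with group_by(), which allows you to perform any operation "by group". You can learn more about them in vignette("dplyr"). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette("two-table").

If you are new to dplyr, the best place to start is the data transformation chapter in R for Data Science.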

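Backends

In addition to data frames/tibbles, dplyr makes working with other computational backends accessible and efficient. Below is a list of alternative backends:

  • dtplyr: for large, in-memory datasets. Translates your dplyr code to high performance data.table code.
  • dbplyr: for data stored in a relational database. Translates your dplyr code to SQL.
  • sparklyr: for very large datasets stored in Apache Spark.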

Installation

# The easiest way to get dplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dplyr:
install.packages("dplyr")

Development version

To get a bug fix, or to use a feature from the development version, you can install dplyr from GitHub.

# install.packages("devtools")
devtools::install_github("tidyverse/dplyr")

Cheatsheet and Usage

(Omitted here; please see the README.)

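Getting help

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub. For questions and other discussion, please use community.rstudio.com or the manipulatr mailing list.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.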

Main metrics

Overview
Name With Owner: tidyverse/dplyr
Primary Language: R
Program language: R (Language Count: 2)
Platform: Linux, Mac, Windows
License: Other

Owner activity
Created At: 2012-10-28 13:39:17
Pushed At: 2025-04-16 15:51:18
Release Count: 58
Last Release Name: v1.1.4
First Release Name: v0.1

User engagement
Stargazers Count: 4.9k
Watchers Count: 243
Fork Count: 2.1k
Commits Count: 7.8k
Issues Count: 5049
Issue Open Count: 97
Pull Requests Count: 1627
Pull Requests Open Count: 17
Pull Requests Close Count: 377

dplyr

Overview

dplyr is a grammar of data manipulation, providing a consistent set of
verbs that help you solve the most common data manipulation challenges:

  • mutate() adds new variables that are functions of existing
    variables.
  • select() picks variables based on their names.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values down to a single summary.
  • arrange() changes the ordering of the rows.

These all combine naturally with group_by(), which allows you to
perform any operation “by group”. You can learn more about them in
vignette("dplyr"). As well as these single-table verbs, dplyr also
provides a variety of two-table verbs, which you can learn about in
vignette("two-table").
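
To give a feel for the two-table verbs mentioned here, the following is a minimal sketch using left_join() and anti_join(). The two small tables and their values are made up for illustration and are not part of dplyr.

```r
library(dplyr)

# Hypothetical lookup tables, invented for this sketch.
characters <- tibble(
  name      = c("C-3PO", "R2-D2", "Leia Organa"),
  homeworld = c("Tatooine", "Naboo", "Alderaan")
)
planets <- tibble(
  homeworld = c("Tatooine", "Naboo"),
  climate   = c("arid", "temperate")
)

# Mutating join: keep every row of `characters` and add matching
# columns from `planets`; unmatched rows get NA for `climate`.
joined <- left_join(characters, planets, by = "homeworld")

# Filtering join: keep only the rows of `characters` that have
# no match in `planets`.
unmatched <- anti_join(characters, planets, by = "homeworld")

joined
unmatched
```

Passing `by` explicitly avoids relying on the default of joining on all shared column names.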

If you are new to dplyr, the best place to start is the data
transformation chapter in R for Data Science.

Backends

In addition to data frames/tibbles, dplyr makes working with other
computational backends accessible and efficient. Below is a list of
alternative backends:

  • dtplyr: for large, in-memory
    datasets. Translates your dplyr code to high performance
    data.table code.

  • dbplyr: for data stored in a
    relational database. Translates your dplyr code to SQL.

  • sparklyr: for very large datasets
    stored in Apache Spark.
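
As a rough sketch of what "translates your dplyr code" can look like in practice, the example below builds the same kind of pipeline against dtplyr and dbplyr and asks each backend to show the code it would generate. It assumes the dtplyr, data.table and dbplyr packages are installed. The `people` data is invented, and the simulated SQLite connection stands in for a real database.

```r
library(dplyr)

# Hypothetical in-memory data, invented for this sketch.
people <- data.frame(
  name = c("a", "b", "c"),
  mass = c(75, 32, 140)
)

# dtplyr: wrap the data in a lazy data.table, then write ordinary dplyr code.
dt_query <- dtplyr::lazy_dt(people) %>%
  filter(mass > 50) %>%
  arrange(desc(mass))

show_query(dt_query)               # prints the generated data.table call
dt_result <- as_tibble(dt_query)   # runs the translated code

# dbplyr: a lazy frame over a simulated SQLite connection, so no
# database is needed just to inspect the generated SQL.
db_query <- dbplyr::lazy_frame(
  name = character(),
  mass = numeric(),
  con  = dbplyr::simulate_sqlite()
) %>%
  filter(mass > 50) %>%
  select(name, mass)

show_query(db_query)               # prints the generated SQL
sql_text <- as.character(dbplyr::sql_render(db_query))
```

The exact text of the generated data.table call and SQL may differ between package versions, so no output is shown here.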

Installation

# The easiest way to get dplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dplyr:
install.packages("dplyr")
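
In scripts it can be convenient to install dplyr only when it is missing. The following is one possible sketch of that pattern, not an official recommendation.

```r
# Install dplyr only if it is not already available.
# In a non-interactive session you may need to set a CRAN mirror first.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

library(dplyr)

# Report which version ended up being loaded.
installed_version <- packageVersion("dplyr")
installed_version
```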

Development version

To get a bug fix, or use a feature from the development version, you can
install dplyr from GitHub.

# install.packages("devtools")
devtools::install_github("tidyverse/dplyr")

Cheatsheet

Usage

library(dplyr)

starwars %>% 
  filter(species == "Droid")
#> # A tibble: 5 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 C-3PO    167    75 <NA>       gold       yellow           112 <NA>   Tatooine 
#> 2 R2-D2     96    32 <NA>       white, bl… red               33 <NA>   Naboo    
#> 3 R5-D4     97    32 <NA>       white, red red               NA <NA>   Tatooine 
#> 4 IG-88    200   140 none       metal      red               15 none   <NA>     
#> 5 BB8       NA    NA none       none       black             NA none   <NA>     
#> # … with 4 more variables: species <chr>, films <list>, vehicles <list>,
#> #   starships <list>

starwars %>% 
  select(name, ends_with("color"))
#> # A tibble: 87 x 4
#>   name           hair_color skin_color  eye_color
#>   <chr>          <chr>      <chr>       <chr>    
#> 1 Luke Skywalker blond      fair        blue     
#> 2 C-3PO          <NA>       gold        yellow   
#> 3 R2-D2          <NA>       white, blue red      
#> 4 Darth Vader    none       white       yellow   
#> 5 Leia Organa    brown      light       brown    
#> # … with 82 more rows

starwars %>% 
  mutate(name, bmi = mass / ((height / 100)  ^ 2)) %>%
  select(name:mass, bmi)
#> # A tibble: 87 x 4
#>   name           height  mass   bmi
#>   <chr>           <int> <dbl> <dbl>
#> 1 Luke Skywalker    172    77  26.0
#> 2 C-3PO             167    75  26.9
#> 3 R2-D2              96    32  34.7
#> 4 Darth Vader       202   136  33.3
#> 5 Leia Organa       150    49  21.8
#> # … with 82 more rows

starwars %>% 
  arrange(desc(mass))
#> # A tibble: 87 x 13
#>   name  height  mass hair_color skin_color eye_color birth_year gender homeworld
#>   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>    
#> 1 Jabb…    175  1358 <NA>       green-tan… orange         600   herma… Nal Hutta
#> 2 Grie…    216   159 none       brown, wh… green, y…       NA   male   Kalee    
#> 3 IG-88    200   140 none       metal      red             15   none   <NA>     
#> 4 Dart…    202   136 none       white      yellow          41.9 male   Tatooine 
#> 5 Tarf…    234   136 brown      brown      blue            NA   male   Kashyyyk 
#> # … with 82 more rows, and 4 more variables: species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(n > 1,
         mass > 50)
#> # A tibble: 8 x 3
#>   species      n  mass
#>   <chr>    <int> <dbl>
#> 1 Droid        5  69.8
#> 2 Gungan       3  74  
#> 3 Human       35  82.8
#> 4 Kaminoan     2  88  
#> 5 Mirialan     2  53.1
#> # … with 3 more rows
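
The last example above groups before summarise(), but group_by() also changes how mutate() and filter() behave. As a minimal sketch, the pipeline below ranks characters by mass within each species and keeps the heaviest of each. The column name `mass_rank` is chosen here for illustration.

```r
library(dplyr)

heaviest <- starwars %>%
  filter(!is.na(mass)) %>%                        # drop rows with unknown mass
  group_by(species) %>%
  mutate(mass_rank = min_rank(desc(mass))) %>%    # rank is computed per species
  filter(mass_rank == 1) %>%                      # ties keep more than one row
  ungroup() %>%
  select(name, species, mass)

heaviest
```

Calling ungroup() at the end means later steps operate on the whole table again rather than per species.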

Getting help

If you encounter a clear bug, please file an issue with a minimal
reproducible example on GitHub. For questions and other discussion,
please use community.rstudio.com, or the manipulatr mailing list.

Please note that this project is released with a Contributor Code of
Conduct. By participating in this project you agree to abide by its
terms.
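
As a rough illustration of what a minimal reproducible example might contain: a tiny data set created inline, the fewest steps that show the behaviour, and the package version. The data frame `df` and its columns are invented for this sketch.

```r
library(dplyr)

# Small inline data instead of a private file, so anyone can run it.
df <- tibble(
  g = c("a", "a", "b"),
  x = c(1, NA, 3)
)

# The shortest pipeline that shows the behaviour being asked about:
# here, a group mean that comes back as NA because of a missing value.
out <- df %>%
  group_by(g) %>%
  summarise(mean_x = mean(x))

out

# Include the version so others can try to reproduce the result.
packageVersion("dplyr")
```

The separate reprex package can render a snippet like this together with its output for pasting into an issue.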