data.table

R's data.table package extends data.frame:

  • Owner: Rdatatable/data.table
  • License: Mozilla Public License 2.0


data.table provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.


30 January 2020
List-columns in data.table - Tyson Barrett, rstudio::conf(2020L)


Why data.table?

  • concise syntax: fast to type, fast to read
  • fast
  • memory efficient
  • careful API lifecycle management
  • community
  • feature rich

Features

  • fast and friendly delimited file reader: ?fread, see also convenience features for small data
  • fast and feature rich delimited file writer: ?fwrite
  • low-level parallelism: many common operations are internally parallelized to use multiple CPU threads
  • fast and scalable aggregations; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
  • fast and feature rich joins: ordered joins (e.g. rolling forwards, backwards, nearest and limited staleness), overlapping range joins (similar to IRanges::findOverlaps), non-equi joins (i.e. joins using operators >, >=, <, <=), aggregate on join (by=.EACHI), update on join
  • fast add/update/delete of columns by reference, by group, using no copies at all
  • fast and feature rich reshaping data: ?dcast (pivot/wider/spread) and ?melt (unpivot/longer/gather)
  • any R function from any R package can be used in queries, not just the subset of functions made available by a database backend; columns of type list are also supported
  • has no dependencies at all other than base R itself, for simpler production/maintenance
  • the R dependency is as old as possible for as long as possible, dated April 2014, and we continuously test against that version; e.g. v1.11.0 released on 5 May 2018 bumped the dependency up from 5 year old R 3.0.0 to 4 year old R 3.1.0
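
A few of the features listed above can be tried on toy data. The following is a minimal sketch rather than a benchmark or a complete tour: the tables sales, quotes and trades and all of their column names are hypothetical and chosen only for illustration. It uses only documented data.table functions (:=, fwrite, fread, dcast, melt, and the roll argument of a join).

```r
library(data.table)

# Hypothetical toy data; table and column names are illustrative only.
sales = data.table(
  store = c("a", "a", "b", "b"),
  day   = c(1L, 2L, 1L, 2L),
  units = c(10L, 12L, 7L, 9L)
)

# Add a column by reference, computed per group.
# `sales` is modified in place; no copy of the table is made.
sales[, store_total := sum(units), by = store]

# Round-trip through a delimited file with fwrite() and fread().
tmp = tempfile(fileext = ".csv")
fwrite(sales, tmp)
sales2 = fread(tmp)
unlink(tmp)

# Reshape long -> wide with dcast(), then wide -> long with melt().
wide = dcast(sales, store ~ day, value.var = "units")
long = melt(wide, id.vars = "store",
            variable.name = "day", value.name = "units")

# Rolling join: for each trade, pick the most recent quote
# at or before the trade time (roll = TRUE rolls values forward).
quotes = data.table(time = c(1L, 5L, 10L), price = c(100, 101, 103))
trades = data.table(time = c(3L, 7L, 12L))
rolled = quotes[trades, on = "time", roll = TRUE]
```

Because := works by reference, sales itself gains the store_total column; use copy(sales) first if the original table must stay unchanged.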

Installation

install.packages("data.table")

# latest development version:
data.table::update.dev.pkg()

See the Installation wiki for more details.

Usage

Use data.table's subset operator [ the same way you would use data.frame's, but...

  • there is no need to prefix each column with DT$ (like subset() and with(), but built in)
  • any R expression using any package is allowed in the j argument, not just a list of columns
  • an extra argument, by, computes the j expression by group

library(data.table)
DT = as.data.table(iris)

# FROM[WHERE, SELECT, GROUP BY]
# DT  [i,     j,      by]

DT[Petal.Width > 1.0, mean(Petal.Length), by = Species]
#      Species       V1
#1: versicolor 4.362791
#2:  virginica 5.552000
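
The same i, j, by pattern extends to richer j expressions. As a small sketch on the same iris data (the result names summary_dt and fit_coefs and the column names n and mean_length are made up for this illustration), j can return several named columns, and it can call any R function, here lm() from the stats package:

```r
library(data.table)
DT = as.data.table(iris)

# j can return several named columns; .N is the row count of each group.
summary_dt = DT[Petal.Width > 1.0,
                .(n = .N, mean_length = mean(Petal.Length)),
                by = Species]

# j can be any R expression: fit one linear model per species and
# return its coefficients as columns.
fit_coefs = DT[, as.list(coef(lm(Petal.Length ~ Petal.Width))), by = Species]
```

Wrapping the coefficients in as.list() is what turns each coefficient into its own column; without it, they would be stacked into a single column with one row per coefficient.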

Getting started

Cheatsheets

Community

data.table is widely used by the R community. It is used directly by hundreds of CRAN and Bioconductor packages, and indirectly by thousands. It is one of the most starred R packages on GitHub. If you need help, the data.table community is active on StackOverflow.

Stay up-to-date

Contributing

Guidelines for filing issues / pull requests: Contribution Guidelines.

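Key metrics

Overview

  • Name and owner: Rdatatable/data.table
  • Primary language: R
  • Languages: R (8 languages in total)
  • License: Mozilla Public License 2.0

Owner activity

  • Created: 2014-06-07 16:38:05
  • Last push: 2025-07-03 12:52:27
  • Last commit: 2025-07-03 12:03:10
  • Releases: 72
  • Latest release: 1.17.6
  • First release: 1.2

User engagement

  • Stars: 3.8k
  • Watchers: 173
  • Forks: 1k
  • Commits: 5.8k
  • Issues: 4652
  • Open issues: 953
  • Pull requests: 2076
  • Open pull requests: 100
  • Closed pull requests: 264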