data.table

R's data.table package extends data.frame:

  • Owner: Rdatatable/data.table
  • License: Mozilla Public License 2.0

data.table provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.


30 January 2020
List-columns in data.table - Tyson Barrett, rstudio::conf(2020L)


Why data.table?

  • concise syntax: fast to type, fast to read
  • fast speed
  • memory efficient
  • careful API lifecycle management
  • community
  • feature rich

Features

  • fast and friendly delimited file reader: ?fread, see also convenience features for small data
  • fast and feature rich delimited file writer: ?fwrite
  • low-level parallelism: many common operations are internally parallelized to use multiple CPU threads
  • fast and scalable aggregations; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
  • fast and feature rich joins: ordered joins (e.g. rolling forwards, backwards, nearest and limited staleness), overlapping range joins (similar to IRanges::findOverlaps), non-equi joins (i.e. joins using operators >, >=, <, <=), aggregate on join (by=.EACHI), update on join
  • fast add/update/delete columns by reference by group using no copies at all
  • fast and feature rich reshaping data: ?dcast (pivot/wider/spread) and ?melt (unpivot/longer/gather)
  • any R function from any R package can be used in queries, not just the subset of functions made available by a database backend; columns of type list are also supported
  • has no dependencies at all other than base R itself, for simpler production/maintenance
  • the R dependency is as old as possible for as long as possible, dated April 2014, and we continuously test against that version; e.g. v1.11.0 released on 5 May 2018 bumped the dependency up from 5 year old R 3.0.0 to 4 year old R 3.1.0
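
Several of the features listed above can be sketched on toy data. This is a minimal illustration rather than an excerpt from the package documentation: the tables (sales, quotes, trades, bands) and their columns are invented for the example, and it assumes a reasonably recent data.table release is installed.

```r
library(data.table)

# --- fwrite / fread round trip through a temporary CSV file ---
sales = data.table(region = c("north", "north", "south", "south"),
                   q1 = c(10L, 20L, 30L, 40L),
                   q2 = c(11L, 21L, 31L, 41L))
tmp = tempfile(fileext = ".csv")
fwrite(sales, tmp)
sales2 = fread(tmp)   # column types are detected automatically
unlink(tmp)

# --- add a column by reference, by group (sales itself is modified) ---
sales[, q1_share := q1 / sum(q1), by = region]

# --- reshape: melt to long format, dcast back to wide with an aggregate ---
long = melt(sales, id.vars = "region", measure.vars = c("q1", "q2"),
            variable.name = "quarter", value.name = "units")
wide = dcast(long, region ~ quarter, value.var = "units", fun.aggregate = sum)

# --- rolling join: for each trade, the latest quote at or before its time ---
quotes = data.table(time = c(1L, 5L, 10L), price = c(100, 101, 102))
trades = data.table(time = c(2L, 5L, 12L))
rolled = quotes[trades, on = "time", roll = TRUE]

# --- non-equi join: match each price to the band with lo <= price < hi ---
bands = data.table(band = c("low", "high"), lo = c(0, 101), hi = c(101, 200))
banded = bands[rolled, on = .(lo <= price, hi > price),
               .(time = i.time, price = i.price, band)]

# --- update on join: copy exact-match quote prices into trades by reference ---
trades[quotes, on = "time", quote_price := i.price]
```

Note that := and the update on join change sales and trades in place, so no assignment with = or <- is needed for those steps.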

Installation

install.packages("data.table")

# latest development version (this function was named update.dev.pkg() in older releases):
data.table::update_dev_pkg()

See the Installation wiki for more details.

Usage

Use data.table's subset operator [ the same way you would use data.frame's, but...

  • no need to prefix each column with DT$ (like subset() and with(), but built in)
  • any R expression using any package is allowed in the j argument, not just a list of columns
  • an extra argument, by, computes the j expression by group

library(data.table)
DT = as.data.table(iris)

# FROM[WHERE, SELECT, GROUP BY]
# DT  [i,     j,      by]

DT[Petal.Width > 1.0, mean(Petal.Length), by = Species]
#      Species       V1
#1: versicolor 4.362791
#2:  virginica 5.552000
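
The same i, j, by form extends to several aggregates, to column-wise functions, and to chained queries. The following is a small sketch on the same iris data; the result names (summary_dt, medians, top) are chosen only for illustration.

```r
library(data.table)
DT = as.data.table(iris)

# several aggregates at once: .() is an alias for list(), .N is the group row count
summary_dt = DT[Petal.Width > 1.0,
                .(n = .N,
                  mean_len = mean(Petal.Length),
                  max_len = max(Petal.Length)),
                by = Species]

# any R function can be used in j: apply median to each column selected via .SDcols
medians = DT[, lapply(.SD, median), by = Species,
             .SDcols = c("Sepal.Length", "Sepal.Width")]

# chaining: aggregate by group, then order the result by the computed column
top = DT[, .(mean_len = mean(Petal.Length)), by = Species][order(-mean_len)]
```

Each call returns a new data.table and leaves DT unchanged, because none of them uses the := operator.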

Getting started

Cheatsheets

Community

data.table is widely used by the R community. It is used directly by hundreds of CRAN and Bioconductor packages, and indirectly by thousands. It is one of the most-starred R packages on GitHub. If you need help, the data.table community is active on StackOverflow.

Stay up-to-date

Contributing

Guidelines for filing issues / pull requests: Contribution Guidelines.

Main metrics

Overview

  • Name with owner: Rdatatable/data.table
  • Primary language: R (language count: 8)
  • License: Mozilla Public License 2.0

Owner activity

  • Created at: 2014-06-07 16:38:05
  • Pushed at: 2025-07-03 12:52:27
  • Last commit at: 2025-07-03 12:03:10
  • Release count: 72
  • Last release name: 1.17.6
  • First release name: 1.2

User engagement

  • Stargazers count: 3.8k
  • Watchers count: 173
  • Fork count: 1k
  • Commits count: 5.8k
  • Issues count: 4652
  • Issue open count: 953
  • Pull requests count: 2076
  • Pull requests open count: 100
  • Pull requests close count: 264