stringr

A fresh approach to string manipulation in R

Github stars Tracking Chart

stringr

CRAN
status
Travis build
status
AppVeyor Build
Status
Codecov test
coverage
Lifecycle:
stable

Overview

Strings are not glamorous, high-profile components of R, but they do
play a big role in many data cleaning and preparation tasks. The stringr
package provide a cohesive set of functions designed to make working
with strings as easy as possible. If you’re not familiar with strings,
the best place to start is the chapter on
strings
in R for Data Science.

stringr is built on top of
stringi, which uses the
ICU C library to provide fast, correct
implementations of common string manipulations. stringr focusses on the
most important and commonly used string manipulation functions whereas
stringi provides a comprehensive set covering almost anything you can
imagine. If you find that stringr is missing a function that you need,
try looking in stringi. Both packages share similar conventions, so once
you’ve mastered stringr, you should find stringi similarly easy to use.

Installation

# Install the released version from CRAN:
install.packages("stringr")

# Install the cutting edge development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/stringr")

Cheatsheet

Usage

All functions in stringr start with str_ and take a vector of strings
as the first argument.

x <- c("why", "video", "cross", "extra", "deal", "authority")
str_length(x) 
#> [1] 3 5 5 5 4 9
str_c(x, collapse = ", ")
#> [1] "why, video, cross, extra, deal, authority"
str_sub(x, 1, 2)
#> [1] "wh" "vi" "cr" "ex" "de" "au"

Most string functions work with regular expressions, a concise language
for describing patterns of text. For example, the regular expression
"[aeiou]" matches any single character that is a vowel:

str_subset(x, "[aeiou]")
#> [1] "video"     "cross"     "extra"     "deal"      "authority"
str_count(x, "[aeiou]")
#> [1] 0 3 1 2 2 4

There are seven main verbs that work with patterns:

  • str_detect(x, pattern) tells you if there’s any match to the
    pattern.

    str_detect(x, "[aeiou]")
    #> [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
    
  • str_count(x, pattern) counts the number of patterns.

    str_count(x, "[aeiou]")
    #> [1] 0 3 1 2 2 4
    
  • str_subset(x, pattern) extracts the matching components.

    str_subset(x, "[aeiou]")
    #> [1] "video"     "cross"     "extra"     "deal"      "authority"
    
  • str_locate(x, pattern) gives the position of the match.

    str_locate(x, "[aeiou]")
    #>      start end
    #> [1,]    NA  NA
    #> [2,]     2   2
    #> [3,]     3   3
    #> [4,]     1   1
    #> [5,]     2   2
    #> [6,]     1   1
    
  • str_extract(x, pattern) extracts the text of the match.

    str_extract(x, "[aeiou]")
    #> [1] NA  "i" "o" "e" "e" "a"
    
  • str_match(x, pattern) extracts parts of the match defined by
    parentheses.

    # extract the characters on either side of the vowel
    str_match(x, "(.)[aeiou](.)")
    #>      [,1]  [,2] [,3]
    #> [1,] NA    NA   NA  
    #> [2,] "vid" "v"  "d" 
    #> [3,] "ros" "r"  "s" 
    #> [4,] NA    NA   NA  
    #> [5,] "dea" "d"  "a" 
    #> [6,] "aut" "a"  "t"
    
  • str_replace(x, pattern, replacement) replaces the matches with new
    text.

    str_replace(x, "[aeiou]", "?")
    #> [1] "why"       "v?deo"     "cr?ss"     "?xtra"     "d?al"      "?uthority"
    
  • str_split(x, pattern) splits up a string into multiple pieces.

    str_split(c("a,b", "c,d,e"), ",")
    #> 
    #> [1] "a" "b"
    #> 
    #> 
    #> [1] "c" "d" "e"
    

As well as regular expressions (the default), there are three other
pattern matching engines:

  • fixed(): match exact bytes
  • coll(): match human letters
  • boundary(): match boundaries

RStudio Addin

The RegExplain RStudio
addin
provides a
friendly interface for working with regular expressions and functions
from stringr. This addin allows you to interactively build your regexp,
check the output of common string matching functions, consult the
interactive help pages, or use the included resources to learn regular
expressions.

This addin can easily be installed with devtools:

# install.packages("devtools")
devtools::install_github("gadenbuie/regexplain")

Compared to base R

R provides a solid set of string operations, but because they have grown
organically over time, they can be inconsistent and a little hard to
learn. Additionally, they lag behind the string operations in other
programming languages, so that some things that are easy to do in
languages like Ruby or Python are rather hard to do in R.

  • Uses consistent function and argument names. The first argument is
    always the vector of strings to modify, which makes stringr work
    particularly well in conjunction with the pipe:

    letters %>%
      .[1:10] %>% 
      str_pad(3, "right") %>%
      str_c(letters[2:11])
    #>  [1] "a  b" "b  c" "c  d" "d  e" "e  f" "f  g" "g  h" "h  i" "i  j" "j  k"
    
  • Simplifies string operations by eliminating options that you don’t
    need 95% of the time.

  • Produces outputs than can easily be used as inputs. This includes
    ensuring that missing inputs result in missing outputs, and zero
    length inputs result in zero length outputs.

Main metrics

Overview
Name With Ownertidyverse/stringr
Primary LanguageR
Program languageR (Language Count: 3)
Platform
License:Other
所有者活动
Created At2009-11-08 22:20:08
Pushed At2024-08-20 20:15:14
Last Commit At
Release Count16
Last Release Namev1.5.1 (Posted on )
First Release Namestring-0.1 (Posted on 2009-11-10 21:55:45)
用户参与
Stargazers Count636
Watchers Count36
Fork Count193
Commits Count659
Has Issues Enabled
Issues Count398
Issue Open Count9
Pull Requests Count128
Pull Requests Open Count6
Pull Requests Close Count47
项目设置
Has Wiki Enabled
Is Archived
Is Fork
Is Locked
Is Mirror
Is Private