sru

Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)


About

SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, with no loss of accuracy across the tasks tested.

The paper has multiple versions; please check the latest one.

Reference:

Simple Recurrent Units for Highly Parallelizable Recurrence

@inproceedings{lei2018sru,
  title={Simple Recurrent Units for Highly Parallelizable Recurrence},
  author={Tao Lei and Yu Zhang and Sida I. Wang and Hui Dai and Yoav Artzi},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}

Requirements

Install requirements via pip install -r requirements.txt. CuPy and pynvrtc are needed to support training / testing on GPU.

Installation

From source:

SRU can be installed as a regular package via python setup.py install or pip install . (run from the repository root).

From PyPi:

pip install sru

pip install sru[cuda] additionally installs CuPy and pynvrtc.

pip install sru[cpu] additionally installs ninja.

Directly use the source without installation:

Make sure this repo and the CUDA library can be found by the system, e.g.

export PYTHONPATH=path_to_repo/sru
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
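
A quick sanity check can confirm the setup. The following is a minimal sketch (not part of the repo) that only assumes the sru package is importable and PyTorch is installed:

# Minimal sanity check: confirms that the sru package resolves via PYTHONPATH
# and that PyTorch can see a CUDA device.
import torch
import sru

print("sru loaded from:", sru.__file__)
print("CUDA available:", torch.cuda.is_available())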

Examples

The usage of SRU is similar to nn.LSTM. SRU typically requires more stacked layers than LSTM. We recommend starting with 2 layers and using more if necessary (see our report for more experimental details).

import torch
from sru import SRU, SRUCell

# input has length 20, batch size 32 and dimension 128
x = torch.randn(20, 32, 128).cuda()

input_size, hidden_size = 128, 128

rnn = SRU(input_size, hidden_size,
    num_layers = 2,          # number of stacking RNN layers
    dropout = 0.0,           # dropout applied between RNN layers
    bidirectional = False,   # bidirectional RNN
    layer_norm = False,      # apply layer normalization on the output of each layer
    highway_bias = 0,        # initial bias of highway gate (<= 0)
    rescale = True,          # whether to use scaling correction
)
rnn.cuda()

output_states, c_states = rnn(x)      # forward pass

# output_states is (length, batch size, number of directions * hidden size)
# c_states is (layers, batch size, number of directions * hidden size)
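
As a further illustration, below is a minimal sketch (not from the repository) that uses SRU as a drop-in sequence encoder inside a small classifier. The Classifier module, its sizes, and the choice of the last time step as the sequence representation are illustrative assumptions; only the constructor and forward signatures shown above are relied on.

import torch
import torch.nn as nn
from sru import SRU

# Hypothetical classifier using SRU as a drop-in replacement for nn.LSTM.
class Classifier(nn.Module):
    def __init__(self, input_size=128, hidden_size=128, num_classes=5):
        super().__init__()
        self.encoder = SRU(input_size, hidden_size, num_layers=2, dropout=0.1)
        self.out = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (length, batch size, input size), same layout as the example above
        output_states, c_states = self.encoder(x)
        # use the last time step of the top layer as the sequence representation
        return self.out(output_states[-1])

model = Classifier().cuda()
x = torch.randn(20, 32, 128).cuda()
logits = model(x)    # shape: (32, num_classes)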

Contributors

https://github.com/taolei87/sru/graphs/contributors

Other Implementations

@musyoku has a very nice SRU implementation in Chainer.

@adrianbg implemented the first CPU version.

Main metrics

Overview
Name With Owner: asappresearch/sru
Primary Language: Python
Program language: Python (Language Count: 5)
Platform:
License: MIT License

Owner Activity
Created At: 2017-08-28 20:37:41
Pushed At: 2022-01-04 21:17:53
Last Commit At: 2021-05-19 11:52:48
Release Count: 36
Last Release Name: v2.7.0-rc1
First Release Name: 2.0.0

User Engagement
Stargazers Count: 2.1k
Watchers Count: 63
Fork Count: 303
Commits Count: 400
Has Issues: Enabled
Issues Count: 134
Issue Open Count: 65
Pull Requests Count: 63
Pull Requests Open Count: 3
Pull Requests Close Count: 12

Project Settings
Has Wiki: Enabled
Is Archived:
Is Fork:
Is Locked:
Is Mirror:
Is Private: