sru

Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)

About

SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, with no loss of accuracy on the many tasks it has been tested on.

The paper has multiple versions; please check the latest one.

Reference:

Simple Recurrent Units for Highly Parallelizable Recurrence

@inproceedings{lei2018sru,
  title={Simple Recurrent Units for Highly Parallelizable Recurrence},
  author={Tao Lei and Yu Zhang and Sida I. Wang and Hui Dai and Yoav Artzi},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}

Requirements

Install requirements via pip install -r requirements.txt. CuPy and pynvrtc are needed for training / testing on GPU.
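
A quick way to confirm the GPU extras are importable before training — a minimal sketch, assuming the packages were installed from requirements.txt:

import torch

print("CUDA available:", torch.cuda.is_available())
try:
    import cupy       # GPU array backend used by the CUDA kernels
    import pynvrtc    # runtime compilation of the SRU CUDA kernel
    print("CuPy version:", cupy.__version__)
except ImportError as err:
    print("GPU extras missing:", err)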

Installation

From source:

SRU can be installed as a regular package, either via python setup.py install or via pip install . from the repository root.

From PyPi:

pip install sru

pip install sru[cuda] additionally installs CuPy and pynvrtc.

pip install sru[cpu] additionally installs ninja.

Directly use the source without installation:

Make sure this repo and the CUDA library can be found by the system, e.g.

export PYTHONPATH=path_to_repo/sru
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
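
Whichever route you take, a small CPU-only forward pass is a quick sanity check that the package resolves correctly — a minimal sketch, assuming a CPU-capable build (e.g. the sru[cpu] extra above):

import torch
from sru import SRU

# tiny forward pass: sequence length 4, batch size 2, feature size 8
rnn = SRU(input_size=8, hidden_size=8, num_layers=2)
x = torch.randn(4, 2, 8)
output, state = rnn(x)
print(output.shape)   # expected: torch.Size([4, 2, 8])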

Examples

The usage of SRU is similar to nn.LSTM. SRU likely requires more stacked layers than LSTM. We recommend starting with 2 layers and using more if necessary (see our report for more experimental details).

import torch
from sru import SRU, SRUCell

# input has length 20, batch size 32 and dimension 128
x = torch.randn(20, 32, 128).cuda()

input_size, hidden_size = 128, 128

rnn = SRU(input_size, hidden_size,
    num_layers = 2,          # number of stacked RNN layers
    dropout = 0.0,           # dropout applied between RNN layers
    bidirectional = False,   # bidirectional RNN
    layer_norm = False,      # apply layer normalization on the output of each layer
    highway_bias = 0,        # initial bias of highway gate (<= 0)
    rescale = True,          # whether to use scaling correction
)
rnn.cuda()

output_states, c_states = rnn(x)      # forward pass

# output_states is (length, batch size, number of directions * hidden size)
# c_states is (layers, batch size, number of directions * hidden size)
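
For reference, the same constructor arguments cover the bidirectional case; a sketch reusing the names above (the shape comments follow the same convention):

rnn_bi = SRU(input_size, hidden_size,
    num_layers = 3,
    bidirectional = True,
)
rnn_bi.cuda()

output_bi, c_bi = rnn_bi(x)

# output_bi is (length, batch size, 2 * hidden size)
# c_bi is (3, batch size, 2 * hidden size)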

Contributors

https://github.com/taolei87/sru/graphs/contributors

Other Implementations

@musyoku has a very nice SRU implementation in Chainer.

@adrianbg implemented the first CPU version.

Key Metrics

Overview
Name and owner: asappresearch/sru
Primary language: Python
Languages: Python (number of languages: 5)
Platform:
License: MIT License
Owner activity
Created at: 2017-08-28 20:37:41
Pushed at: 2022-01-04 21:17:53
Last commit: 2021-05-19 11:52:48
Releases: 36
Latest release: v2.7.0-rc1 (released )
First release: 2.0.0 (released )
User engagement
Stars: 2.1k
Watchers: 63
Forks: 303
Commits: 400
Issues enabled?
Issues: 134
Open issues: 65
Pull requests: 63
Open pull requests: 3
Closed pull requests: 12
Project settings
Wiki enabled?
Archived?
Is a fork?
Locked?
Is a mirror?
Is private?