fast-wavenet

Speedy Wavenet generation using dynamic programming :zap:


Fast Wavenet: An efficient Wavenet generation implementation

Our implementation speeds up Wavenet generation by eliminating redundant convolution operations. A naive implementation of Wavenet generation is O(2^L), while ours is O(L), where L is the number of layers.
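To make the gap concrete, here is a small sketch (layer counts chosen for illustration) counting the convolution operations needed per generated sample under each scheme:

```python
# Per-sample cost of generating from a stack of L dilated layers
# (filter size 2, dilations 1, 2, 4, ..., 2^(L-1)).

def naive_ops(L):
    # The naive approach recomputes the full binary tree of
    # intermediate values feeding one output: 2^L - 1 nodes.
    return 2 ** L - 1

def cached_ops(L):
    # With cached recurrent states, each layer applies its
    # filter once per sample: L operations.
    return L

for L in (4, 8, 12):
    print(L, naive_ops(L), cached_ops(L))
```

At L = 12 the naive scheme already does 4095 operations per sample versus 12 with caching.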

While other Wavenet repos focus on training on a large corpus of data and generating samples, ours mostly describes an efficient generation algorithm (which is super simple), and provides an example implementation.

We note that, while not explicitly stated in the Wavenet paper, we talked to the authors, and they are doing something similar.

For more about Wavenet, see DeepMind's blog post.

If you find this work useful and use it in your own research, please cite our paper.

@article{paine2016fast,
  title={Fast Wavenet Generation Algorithm},
  author={Paine, Tom Le and Khorrami, Pooya and Chang, Shiyu and Zhang, Yang and Ramachandran, Prajit and Hasegawa-Johnson, Mark A and Huang, Thomas S},
  journal={arXiv preprint arXiv:1611.09482},
  year={2016}
}

Timing experiments

We generated from a model with 2 blocks of L layers each, using a naive implementation and ours. Results are averaged over 100 repeats. When L is small the naive implementation performs better than expected due to GPU parallelization of the convolution operations. But when L is large, the efficient implementation really shines.

General purpose

While this algorithm works well for auto-regressive models like Wavenet, it would be useful anytime you need to run a causal dilated convolutional neural network fast. For instance: classification/regression on streaming data.

Authors (helped via pair coding)

Acknowledgements (helped via conversation)


Algorithm motivation

Here we focus on generation for dilated causal convolution networks (e.g. Wavenet) with filter size 2, though these ideas generalize to larger filter sizes.

During generation, the computational graph used to compute a single output value can be seen as a binary tree.

However, since this model is being applied repeatedly over time, there is a lot of redundant computation, which we can cache to increase the speed of generating a single sample.

The key insight is this: given certain nodes in the graph, we have all the information we need to compute the current output. We call these nodes the recurrent states in analogy to RNNs. These nodes have already been computed, all we need to do is cache them.

Note that at the next time point, we will need a different subset of recurrent states. As a result we need to cache several recurrent states per layer. The number we need to keep is equal to the dilation of that layer.
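A minimal sketch of the caching requirement (assuming the standard Wavenet schedule where dilations double per layer): each layer keeps a fixed-length queue whose capacity equals its dilation, so older states are evicted automatically.

```python
from collections import deque

# Assumed dilation schedule: doubling per layer, as in Wavenet.
dilations = [2 ** i for i in range(4)]  # [1, 2, 4, 8]

# One fixed-length queue per layer; maxlen equals the layer's
# dilation, so pushing a new state evicts the oldest one.
queues = [deque([0.0] * d, maxlen=d) for d in dilations]

for d, q in zip(dilations, queues):
    print("dilation", d, "-> cache size", len(q))
```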

Algorithm components

The algorithm has two components:

  • Generation Model
  • Convolution Queues

The generation model can be viewed as one step of a recurrent neural network. It takes as input the current observation and several recurrent states, and computes the output prediction and new recurrent states.
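As a hedged illustration of a single layer's step (the weight names and the plain tanh nonlinearity are placeholders, not the paper's exact gated activation): with filter size 2, the dilated convolution at one time step reduces to a weighted sum of the popped recurrent state and the current input.

```python
import numpy as np

def layer_step(x_curr, state_prev, w_curr, w_prev, b=0.0):
    """One dilated-conv layer applied at a single time step.

    With filter size 2, the convolution has exactly two taps:
    the cached recurrent state (the 'old' tap, from `dilation`
    steps ago) and the current input (the 'new' tap).
    """
    return np.tanh(w_prev * state_prev + w_curr * x_curr + b)

# Toy scalar example with made-up weights.
h = layer_step(x_curr=0.5, state_prev=-0.2, w_curr=1.0, w_prev=0.5)
```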

The convolution queues store the new recurrent states which have been computed by the layer below.

Algorithm

Initialize the generation model with the weights from the pre-trained convolutional network.
For each layer, initialize the convolution queue. The max length of the queue equals the dilation of that layer. Fill it with zero recurrent states. (See above).

Repeat:

  • Pop phase
  • Push phase

Pop phase: For each convolution queue, pop the recurrent state and feed it to the corresponding state input of the generation model, then compute the new hidden states and output.

Push phase: For each new hidden state, push it into the convolution queue of the layer above.
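Putting the pieces together, here is a minimal sketch of the full loop (scalar states and random placeholder weights; the real model adds gated activations, skip connections, and a softmax output):

```python
import numpy as np
from collections import deque

dilations = [1, 2, 4, 8]                 # assumed doubling schedule
rng = np.random.default_rng(0)
# One (w_prev, w_curr) filter pair per layer; made-up weights.
weights = [(rng.normal(), rng.normal()) for _ in dilations]

# Initialization: one queue per layer, max length = dilation,
# filled with zero recurrent states.
queues = [deque([0.0] * d, maxlen=d) for d in dilations]

def generate_step(x):
    h = x
    for layer, (w_prev, w_curr) in enumerate(weights):
        state = queues[layer].popleft()           # pop phase
        h_new = np.tanh(w_prev * state + w_curr * h)
        queues[layer].append(h)                   # push phase: cache this
        h = h_new                                 # layer's input for reuse
    return h

# Generate a few samples auto-regressively from a seed input.
x = 0.1
for _ in range(5):
    x = generate_step(x)
```

Note the queues never grow: each step pops one state and pushes one, so every layer holds exactly `dilation` cached states at all times.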
