waveglow

A Flow-based Generative Network for Speech Synthesis

  • Owner: NVIDIA/waveglow
  • License: BSD 3-Clause "New" or "Revised" License

WaveGlow

WaveGlow: a Flow-based Generative Network for Speech Synthesis

Ryan Prenger, Rafael Valle, and Bryan Catanzaro

In our recent paper, we propose WaveGlow: a flow-based network capable of
generating high quality speech from mel-spectrograms. WaveGlow combines insights
from Glow and WaveNet in order to provide fast, efficient and high-quality
audio synthesis, without the need for auto-regression. WaveGlow is implemented
using only a single network, trained using only a single cost function:
maximizing the likelihood of the training data, which makes the training
procedure simple and stable.
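
As a reading aid, the likelihood objective of a flow-based model can be written with the standard change-of-variables formula. The symbols below (f_θ for the invertible network, z for its latent output, σ for the standard deviation of the latent Gaussian) are notation assumed here for illustration, and the conditioning on the mel-spectrogram is left out for brevity; this is the generic form, not necessarily the exact expression used in the paper.

```latex
% z = f_\theta(x): invertible mapping from audio x to a latent z,
% with z assumed to follow a zero-mean spherical Gaussian N(0, \sigma^2 I).
\log p_\theta(x)
  = \log p_z\bigl(f_\theta(x)\bigr)
  + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|,
\qquad
\log p_z(z) = -\frac{z^\top z}{2\sigma^2} + \text{const}.
```

Training maximizes this quantity over the training data, which is why a single cost function is enough.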

Our PyTorch implementation produces audio samples at a rate of 4850
kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio
quality as good as the best publicly available WaveNet implementation.
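
To put the synthesis rate in perspective, the short calculation below converts it into a real-time factor. It assumes a playback sampling rate of 22,050 Hz (the rate of the LJ Speech recordings); the function and constant names are made up for this sketch.

```python
# Real-time factor: generated samples per second divided by
# the number of samples needed per second of playback.
SYNTHESIS_RATE_HZ = 4850 * 1000  # 4850 kHz, the figure quoted above
PLAYBACK_RATE_HZ = 22050         # assumption: LJ Speech sampling rate


def real_time_factor(synthesis_rate_hz, playback_rate_hz):
    """Return how many seconds of audio are produced per second of compute."""
    return synthesis_rate_hz / playback_rate_hz


rtf = real_time_factor(SYNTHESIS_RATE_HZ, PLAYBACK_RATE_HZ)
print(f"about {rtf:.0f}x faster than real time")  # prints "about 220x faster than real time"
```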

Visit our website for audio samples.

Setup

  1. Clone our repo and initialize the submodule

    git clone https://github.com/NVIDIA/waveglow.git
    cd waveglow
    git submodule init
    git submodule update
    
  2. Install requirements

    pip3 install -r requirements.txt

  3. Install Apex

Generate audio with our pre-existing model

  1. Download our published model
  2. Download mel-spectrograms
  3. Generate audio

    python3 inference.py -f <(ls mel_spectrograms/*.pt) -w waveglow_256channels.pt -o . --is_fp16 -s 0.6

N.b. use convert_model.py to convert your older models to the current model
with fused residual and skip connections.
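
The -s option in the command above sets sigma. As a rough illustration, and assuming sigma is the standard deviation of the zero-mean Gaussian noise that the network transforms into audio, the sketch below draws such noise with the Python standard library. The helper name sample_latent is hypothetical and is not part of inference.py.

```python
import random
import statistics


def sample_latent(n_samples, sigma, seed=0):
    """Draw n_samples values of z ~ N(0, sigma^2) by scaling unit Gaussian noise."""
    rng = random.Random(seed)
    return [sigma * rng.gauss(0.0, 1.0) for _ in range(n_samples)]


# Same sigma as the inference command above.
z = sample_latent(n_samples=20000, sigma=0.6)
print(f"empirical standard deviation: {statistics.pstdev(z):.2f}")
```

A smaller sigma keeps the latent samples closer to zero, so the flow starts from less extreme noise values.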

Train your own model

  1. Download LJ Speech Data. In this example it's in data/

  2. Make a list of the file names to use for training/testing

    ls data/*.wav | tail -n+11 > train_files.txt
    ls data/*.wav | head -n10 > test_files.txt
    
  3. Train your WaveGlow networks

    mkdir checkpoints
    python train.py -c config.json
    

    For multi-GPU training, replace train.py with distributed.py. This has only been tested on a single node with NCCL.

    For mixed precision training, set "fp16_run": true in config.json.

  4. Make test set mel-spectrograms

    python mel2samp.py -f test_files.txt -o . -c config.json

  5. Do inference with your network

    ls *.pt > mel_files.txt
    python3 inference.py -f mel_files.txt -w checkpoints/waveglow_10000 -o . --is_fp16 -s 0.6
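
The file lists from step 2 can also be built in Python. The sketch below holds out the first ten files (in sorted order) for testing and keeps the two lists disjoint; the function names are hypothetical, and sorted() may order names slightly differently from ls under some locales. The demo runs on a throwaway directory of empty placeholder files rather than real audio.

```python
import glob
import os
import tempfile


def split_file_list(wav_dir, n_test=10):
    """Return (train, test) lists of .wav paths; the first n_test sorted files are held out."""
    files = sorted(glob.glob(os.path.join(wav_dir, "*.wav")))
    return files[n_test:], files[:n_test]


def write_list(paths, out_path):
    """Write one path per line, the format produced by `ls > file.txt`."""
    with open(out_path, "w") as f:
        f.write("\n".join(paths) + "\n")


# Demo with 25 empty placeholder files standing in for data/*.wav.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(25):
        open(os.path.join(tmp, f"clip_{i:03d}.wav"), "w").close()
    train, test = split_file_list(tmp)
    write_list(train, os.path.join(tmp, "train_files.txt"))
    write_list(test, os.path.join(tmp, "test_files.txt"))

print(len(train), len(test))  # prints "15 10"
```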
    
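The mixed precision switch from step 3 can be flipped programmatically. This sketch assumes that "fp16_run" sits inside a "train_config" section of config.json; check your copy of the file, since that layout is an assumption here. The helper set_fp16 is hypothetical, and the demo config contains placeholder values only.

```python
import json
import os
import tempfile


def set_fp16(config_path, enabled=True, section="train_config"):
    """Set "fp16_run" inside the given section of a JSON config file and save it."""
    with open(config_path) as f:
        config = json.load(f)
    # Assumption: the flag lives under `section`; create the section if missing.
    config.setdefault(section, {})["fp16_run"] = enabled
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config


# Demo on a temporary file with placeholder values, not the real config.json.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w") as f:
        json.dump({"train_config": {"fp16_run": False, "epochs": 10}}, f)
    updated = set_fp16(path)

print(updated["train_config"]["fp16_run"])  # prints "True"
```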

Main metrics

Overview
Name With Owner: NVIDIA/waveglow
Primary Language: Python
Program language: Python (Language Count: 1)
License: BSD 3-Clause "New" or "Revised" License
Owner activity
Created At: 2018-11-08 00:41:44
Pushed At: 2023-10-19 15:19:59
Last Commit At: 2020-09-02 10:20:31
Release Count: 0
User engagement
Stargazers Count: 2.3k
Watchers Count: 76
Fork Count: 538
Commits Count: 67
Issues Count: 257
Issue Open Count: 72
Pull Requests Count: 11
Pull Requests Open Count: 7
Pull Requests Close Count: 6