textvae

Theano code for experiments in the paper "A Hybrid Convolutional Variational Autoencoder for Text Generation."

  • Owner: ssemeniuta/textvae

Preparation

First, run makedata.sh. This downloads the PTB (Penn Treebank) dataset, splits it, and preprocesses it.

PTB Experiments

Files prefixed with "lm_" contain the experiments on the PTB dataset. We provide scripts for training the non-VAE, baseline LSTM VAE, and our models, plus a script to greedily sample from a trained model. The "defs" subfolder contains the definitions of the grid searches we used to generate the data for the figures and tables in the paper. To run one search:

python -u nn/scripts/grid_search.py -grid defs/gridname.json
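
To illustrate what a grid definition amounts to, here is a minimal sketch of expanding a JSON grid into individual run configurations. The JSON layout (parameter name mapped to a list of candidate values) and the helper name `expand_grid` are assumptions made for illustration; the actual files in "defs" and the logic in nn/scripts/grid_search.py may be organized differently.

```python
import itertools
import json


def expand_grid(grid):
    """Yield one dict per combination of the listed hyperparameter values."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical grid definition; the real files in defs/ may look different.
grid_json = '{"alpha": [0.0, 0.2], "sample_size": [20, 60]}'

configs = list(expand_grid(json.loads(grid_json)))
for config in configs:
    print(config)
```

Each resulting dictionary corresponds to one training run, for example one with alpha=0.2 and sample_size=60.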

To train our model on samples 60 characters long with alpha=0.2, run:

python -u lm_vae_lstm.py -alpha 0.2 -sample_size 60

Twitter Experiments

Code for these experiments is in files starting with "twitter_". We do not release the dataset we used to train our model, but we provide both a training script and a pretrained model. To use the script on custom data, create a file "data/tweets.txt" containing one data sample per line. By default, the first 10k samples are used for validation and everything else for training, up to a maximum of ~1M samples. In addition, the script only uses tweets of up to 128 characters. This limit exists only for convenience when down- and upsampling. Training on tweets of up to 140 characters requires some care when handling the spatial dimension.
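
The default split described above can be sketched as follows. This is not the repository's loader: the function name `split_tweets`, the choice to filter by length before splitting, and applying the ~1M cap as an exact limit on the training set are all assumptions made for illustration.

```python
def split_tweets(lines, max_len=128, n_valid=10_000, max_train=1_000_000):
    """Return (train, valid) lists following the defaults described above.

    Assumptions: length is counted in Python characters, over-long and empty
    lines are dropped before splitting, and the cap applies to training data.
    """
    samples = [line.rstrip("\n") for line in lines]
    samples = [s for s in samples if 0 < len(s) <= max_len]
    valid = samples[:n_valid]
    train = samples[n_valid:n_valid + max_train]
    return train, valid


# Tiny in-memory demo with scaled-down limits.
demo = ["a" * n for n in (5, 200, 10, 0, 128, 129, 7)]
train, valid = split_tweets(demo, n_valid=2, max_train=2)
print([len(s) for s in valid], [len(s) for s in train])
```

On real data, the same function could be called with the lines of "data/tweets.txt" opened in text mode, leaving the default limits in place.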

License

MIT

Main metrics

Overview
  • Name with owner: ssemeniuta/textvae
  • Primary language: Python
  • Program language: Python (language count: 2)

Owner activity
  • Created at: 2017-02-07 22:19:35
  • Pushed at: 2018-10-05 18:54:16
  • Last commit at: 2017-05-29 09:29:54
  • Release count: 0

User engagement
  • Stargazers count: 205
  • Watchers count: 11
  • Fork count: 44
  • Commits count: 5
  • Issues count: 4
  • Issue open count: 2
  • Pull requests count: 1
  • Pull requests open count: 1
  • Pull requests close count: 1