kaggle-web-traffic

1st place solution

Kaggle Web Traffic Time Series Forecasting

[Figure: predictions]

Main files:

  • make_features.py - builds features from source data
  • input_pipe.py - TF data preprocessing pipeline (assembles features
    into training/evaluation tensors, performs some sampling and normalisation)
  • model.py - the model
  • trainer.py - trains the model(s)
  • hparams.py - hyperparameter sets
  • submission-final.ipynb - generates predictions for submission

How to reproduce competition results:

  1. Download input files from https://www.kaggle.com/c/web-traffic-time-series-forecasting/data :
    key_2.csv.zip and train_2.csv.zip. Put them into the data directory.
  2. Run python make_features.py data/vars --add_days=63. It will
    extract data and features from the input files and put them into
    data/vars as a TensorFlow checkpoint.
  3. Run the trainer:
    python trainer.py --name s32 --hparam_set=s32 --n_models=3 --no_eval --no_forward_split --asgd_decay=0.99 --max_steps=11500 --save_from_step=10500
    This command simultaneously trains 3 models with different seeds (in a single TF graph)
    and saves 10 checkpoints, from step 10500 to step 11500, to data/cpt.
    Note: training requires a GPU because it relies on cuDNN. CPU training will not work.
    If you have 3 or more GPUs, add the --multi_gpu flag to speed up training. You can also try different
    hyperparameter sets (described in hparams.py): --hparam_set=definc,
    --hparam_set=inst81, etc.
    Don't be alarmed by NaN losses displayed during training. This is normal,
    because training runs in blind mode, without any evaluation of model performance.
  4. Run submission-final.ipynb in a standard Jupyter notebook environment and
    execute all cells. Prediction will take some time, because it has to
    load and evaluate 30 different sets of model weights (3 models × 10 checkpoints). At the end,
    you'll get a submission.csv.gz file in the data directory.
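
The figure of 30 weight sets in step 4 follows from step 3: 3 models trained on different seeds, each saved at 10 checkpoints. The sketch below only illustrates that bookkeeping. It assumes the 30 forecasts are combined with a plain element-wise mean, which this README does not state; `weight_sets`, `average_predictions` and the toy forecasts are hypothetical names and data, not code from the repository.

```python
from itertools import product
from statistics import fmean

N_MODELS = 3        # --n_models=3 in the trainer command
N_CHECKPOINTS = 10  # "10 checkpoints" saved per model, per step 3


def weight_sets(n_models=N_MODELS, n_checkpoints=N_CHECKPOINTS):
    """Enumerate the (model index, checkpoint index) pairs to evaluate."""
    return list(product(range(n_models), range(n_checkpoints)))


def average_predictions(predictions):
    """Element-wise mean of equally long forecasts (assumed combining rule)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return [fmean(values) for values in zip(*predictions)]


pairs = weight_sets()
print(len(pairs))  # 30

# Hypothetical toy forecasts: one two-day forecast per weight set.
toy_forecasts = [[float(m + c), 2.0 * (m + c)] for m, c in pairs]
print(average_predictions(toy_forecasts))  # [5.5, 11.0]
```

The real notebook loads TensorFlow checkpoints from data/cpt; the toy lists here stand in for model outputs so that the example runs without a GPU.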

See also the detailed model description.

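Key metrics

Overview
Name and owner: Arturus/kaggle-web-traffic
Primary language: Jupyter Notebook
Languages: Python (2 languages in total)
License: MIT License
Owner activity
Created: 2017-11-17 21:15:59
Last push: 2022-10-09 07:10:28
Last commit: 2018-10-15 18:04:59
Releases: 0
User engagement
Stars: 1.8k
Watchers: 74
Forks: 666
Commits: 13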
Issues: 38
Open issues: 11
Pull requests: 1
Open pull requests: 2
Closed pull requests: 1