TalkingData

TalkingData AdTracking Fraud Detection Challenge

  • Owner: CuteChibiko/TalkingData
  • License: Apache License 2.0

models and scores

Model definitions can be found in scripts/model_lib.py.

  • model1: LGBM with 83 features (76 numerical, 7 categorical).

  • model2: Keras with 27 features (18 numerical, 9 categorical). You can see the network structure in model.png.

| model  | private score | public score |
| ------ | ------------- | ------------ |
| model1 | 0.9836325     | 0.9828896    |
| model2 | 0.9830595     | 0.9822785    |

feature engineering and scripts

Most of these features have already been discussed on the Kaggle forum.

  • counting features

    • mk_feat_count.py
    • mk_feat_count_time.py
    • mk_feat_countRatio.py
  • cumulative count

    • mk_feat_cumcount.py
    • mk_feat_recumcount.py
    • mk_feat_cumratio.py
  • time to next click

    • mk_feat_nextClick_leak_day.py
    • mk_feat_nextClick_filter.py
  • time bucket count (split time into multiple intervals and count the number of buckets in which the IP appears)

    • mk_feat_rangecount.py
    • mk_feat_rangecount_minute.py
  • variance

    • mk_feat_var.py
  • common IP

    • mk_feat_common_ip.py
  • unique count

    • mk_feat_uniq_count2.py
  • target encoding: WoE (weight of evidence)

    • mk_feat_woe_all_prev.py
    • mk_feat_woe_bound.py
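
To make a few of the feature families above concrete, here is a minimal pandas sketch of counting, cumulative count, unique count, time to next click, and a smoothed weight-of-evidence encoding. It is not taken from the scripts listed above. The column names `ip`, `channel`, `click_time`, and `is_attributed` follow the competition data, while the function names, the output column names, and the smoothing constant `alpha` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_click_features(df: pd.DataFrame, keys: list) -> pd.DataFrame:
    """Add count, cumulative count, unique count and next-click features for `keys`."""
    # Sort by time so "previous" and "next" clicks are well defined within a group.
    out = df.sort_values("click_time", kind="mergesort")
    grp = out.groupby(keys)
    name = "_".join(keys)
    feats = {
        # counting feature: total number of clicks in the group
        f"count_{name}": grp["click_time"].transform("count"),
        # cumulative count: number of earlier clicks in the same group
        f"cumcount_{name}": grp.cumcount(),
        # unique count: number of distinct channels seen in the group
        f"uniq_channel_{name}": grp["channel"].transform("nunique"),
        # time to next click in seconds; NaN for the last click of a group
        f"next_click_{name}": (
            grp["click_time"].shift(-1) - out["click_time"]
        ).dt.total_seconds(),
    }
    # Restore the original row order before returning.
    return out.assign(**feats).sort_index()


def weight_of_evidence(df, key, target="is_attributed", alpha=0.5):
    """Smoothed WoE per value of `key`: log(share of positives / share of negatives)."""
    stats = df.groupby(key)[target].agg(["sum", "count"])
    pos = stats["sum"]
    neg = stats["count"] - stats["sum"]
    n_groups = len(stats)
    # alpha is an assumed additive smoothing term that avoids log(0) for rare groups.
    pos_share = (pos + alpha) / (pos.sum() + alpha * n_groups)
    neg_share = (neg + alpha) / (neg.sum() + alpha * n_groups)
    return np.log(pos_share / neg_share)
```

A typical call would be `add_click_features(train, ["ip"])`, followed by `train["ip"].map(weight_of_evidence(train, "ip"))` to attach the encoding to each row. Because WoE uses the label, the statistic is usually computed on rows other than the one being encoded (for example, earlier data only) to limit target leakage.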

Features will be calculated once and saved to disk.
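
The compute-once-and-save pattern can be sketched as follows. This is an illustrative helper, not code from the repository: the name `load_or_compute`, the pickle format, and the example path are all assumptions.

```python
import pickle
from pathlib import Path


def load_or_compute(path, compute):
    """Return the object cached at `path`, computing and saving it on first use."""
    path = Path(path)
    if path.exists():
        # Cache hit: skip the expensive computation entirely.
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename it into place, so an
    # interrupted run does not leave a truncated cache file behind.
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("wb") as f:
        pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
    tmp.replace(path)
    return result
```

For example, `load_or_compute("feats/count_ip.pkl", lambda: expensive_feature())` would run the hypothetical `expensive_feature` only on the first call and read from disk afterwards. With feature extraction taking about a day, this kind of caching lets model scripts be rerun without recomputing features.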

Feature importance from LGBM can be found in importance.txt.

Requirements

I used the following environment:

Hardware:

  • Memory: 256GB RAM, 256GB SWAP
  • CPU: 20 core, 2.10GHz
  • GPU: 1080Ti

Python3 packages:

  • numpy==1.14.2
  • pandas==0.22.0
  • lightgbm==2.1.0
  • keras==2.1.5

How to run

First, put sample_submission.csv, test.csv, test_supplement.csv, and train.csv into the input directory.

Then run the shell scripts as follows:

$ cd scripts/

$ ./run_mk_feats.sh

$ ./run_mk_model1.sh

$ ./run_mk_model2.sh

Output prediction files will be written to the csv directory.

Feature extraction (run_mk_feats.sh) took about one day.

Building model1 (run_mk_model1.sh) needs a large amount of memory (~256GB), sorry.

A GPU is required to build model2 (run_mk_model2.sh).

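Key metrics

Overview
Name and owner: CuteChibiko/TalkingData
Primary language: Python
Languages: Python (2 languages in total)
License: Apache License 2.0

Owner activity
Created: 2018-05-09 15:17:45
Pushed: 2018-05-11 01:32:27
Last commit: 2018-05-11 10:32:26
Releases: 0

User engagement
Stars: 104
Watchers: 1
Forks: 41
Commits: 6
Issues: 1
Open issues: 1
Pull requests: 0
Open pull requests: 0
Closed pull requests: 0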