TalkingData

TalkingData AdTracking Fraud Detection Challenge

  • Owner: CuteChibiko/TalkingData
  • License: Apache License 2.0


models and scores

The model definitions can be found in scripts/model_lib.py.

  • model1 LGBM with 83 (76 numerical, 7 categorical) features.

  • model2 keras with 27 (18 numerical, 9 categorical) features. You can see the network structure in model.png.

| model | private score | public score |
| --- | --- | --- |
| model1 | 0.9836325 | 0.9828896 |
| model2 | 0.9830595 | 0.9822785 |

feature engineering and scripts

Most of these features have already been discussed on the Kaggle forum.

  • counting features

    • mk_feat_count.py
    • mk_feat_count_time.py
    • mk_feat_countRatio.py
  • cumulative count

    • mk_feat_cumcount.py
    • mk_feat_recumcount.py
    • mk_feat_cumratio.py
  • time to next click

    • mk_feat_nextClick_leak_day.py
    • mk_feat_nextClick_filter.py
  • time bucket count (split time into multiple intervals and count the number of buckets in which the IP appears)

    • mk_feat_rangecount.py
    • mk_feat_rangecount_minute.py
  • variance

    • mk_feat_var.py
  • common IP

    • mk_feat_common_ip.py
  • unique count

    • mk_feat_uniq_count2.py
  • target encoding: WOE (weight of evidence)

    • mk_feat_woe_all_prev.py
    • mk_feat_woe_bound.py
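
The feature families listed above can be illustrated with a small pandas sketch. This is not the code from the mk_feat_*.py scripts: the function names `add_group_features` and `woe_table`, the one-minute bucket size, and the additive smoothing constant are all assumptions made for illustration. Only the column names (ip, app, click_time, is_attributed) follow the competition's train.csv.

```python
import numpy as np
import pandas as pd


def add_group_features(df, keys, uniq_col, bucket_size="1min"):
    """Hypothetical helper: add count-style features for groups defined by `keys`."""
    # Stable sort so that "next click" follows time order within each group.
    out = df.sort_values("click_time", kind="mergesort").reset_index(drop=True)
    grouped = out.groupby(keys)
    # time to next click of the same group, in seconds (NaN if none follows)
    next_time = grouped["click_time"].shift(-1)
    # time bucket count: number of distinct time buckets the group appears in
    bucket = out["click_time"].dt.floor(bucket_size)
    range_count = bucket.groupby([out[k] for k in keys]).transform("nunique")
    return out.assign(
        count=grouped["click_time"].transform("count"),   # counting feature
        cumcount=grouped.cumcount(),                      # earlier clicks in group
        next_click=(next_time - out["click_time"]).dt.total_seconds(),
        uniq_count=grouped[uniq_col].transform("nunique"),  # unique count
        range_count=range_count,
    )


def woe_table(df, key, target, smooth=0.5):
    """Smoothed weight of evidence per value of `key`:
    log(P(value | target=1) / P(value | target=0)).
    The additive smoothing is an assumption, not taken from the repository."""
    stats = df.groupby(key)[target].agg(["sum", "count"])
    pos = stats["sum"] + smooth
    neg = stats["count"] - stats["sum"] + smooth
    return np.log((pos / pos.sum()) / (neg / neg.sum()))


# Tiny made-up click log for demonstration only.
clicks = pd.DataFrame({
    "ip": [1, 1, 2, 2, 1],
    "app": [3, 3, 3, 3, 9],
    "click_time": pd.to_datetime([
        "2017-11-07 00:00:00",
        "2017-11-07 00:00:05",
        "2017-11-07 00:00:07",
        "2017-11-07 00:00:30",
        "2017-11-07 00:01:00",
    ]),
    "is_attributed": [0, 1, 0, 0, 0],
})

feats = add_group_features(clicks, keys=["ip"], uniq_col="app")
woe = woe_table(clicks, key="app", target="is_attributed")
print(feats[["ip", "count", "cumcount", "next_click", "uniq_count", "range_count"]])
print(woe)
```

Computing the WOE table on the same rows it is applied to leaks the target, so a real pipeline would fit it on earlier data only. How the repository handles this is not shown here.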

Features will be calculated once and saved to disk.
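
A compute-once pattern like the one described can be sketched as follows. The helper name `load_or_build`, the pickle format, and the file name are assumptions for illustration; the repository's actual on-disk format is not stated here.

```python
import os
import pickle
import tempfile
from collections import Counter


def load_or_build(path, build_fn):
    """Hypothetical helper: return the feature cached at `path`,
    building and saving it the first time it is requested."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    feature = build_fn()
    with open(path, "wb") as fh:
        pickle.dump(feature, fh)
    return feature


build_calls = []  # records how often the expensive step actually runs


def build_ip_count():
    """Toy feature: for each click, the number of clicks from the same IP."""
    build_calls.append(1)
    ips = [1, 1, 2]
    totals = Counter(ips)
    return [totals[ip] for ip in ips]


cache_path = os.path.join(tempfile.mkdtemp(), "ip_count.pkl")
first = load_or_build(cache_path, build_ip_count)   # computed and saved
second = load_or_build(cache_path, build_ip_count)  # read back from disk
print(first, second, len(build_calls))
```

With this layout, rerunning a model script after a crash does not repeat the feature extraction that already finished.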

Importance from LGBM is found in importance.txt.

Requirements

I used the following environment:

Hardware:

  • Memory: 256GB RAM, 256GB SWAP
  • CPU: 20 cores, 2.10GHz
  • GPU: 1080Ti

Python3 packages:

  • numpy==1.14.2
  • pandas==0.22.0
  • lightgbm==2.1.0
  • keras==2.1.5

How to run

First, put sample_submission.csv, test.csv, test_supplement.csv, and train.csv in the input directory.
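
Since feature extraction runs for a long time, it may be worth confirming that all four files are in place before starting. The snippet below is only a sketch: the helper `missing_inputs` is hypothetical and not part of the repository, and the directory is passed in explicitly because the exact location of the input directory depends on your checkout.

```python
import os
import tempfile

# The four files named in the instructions above.
REQUIRED_FILES = [
    "sample_submission.csv",
    "test.csv",
    "test_supplement.csv",
    "train.csv",
]


def missing_inputs(input_dir):
    """Return the required file names that are not present in `input_dir`."""
    return [
        name for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(input_dir, name))
    ]


# Demo on a throwaway directory that only holds an empty train.csv.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "train.csv"), "w").close()
missing = missing_inputs(demo_dir)
print(missing)
```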

Then run the shell scripts as follows:

$ cd scripts/

$ ./run_mk_feats.sh

$ ./run_mk_model1.sh

$ ./run_mk_model2.sh

Output prediction files will be written to the csv directory.

Feature extraction (run_mk_feats.sh) took about one day.

Building model1 (run_mk_model1.sh) needs a large amount of memory (~256GB), sorry.

A GPU is required to build model2 (run_mk_model2.sh).

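Key metrics

Overview

  • Name and owner: CuteChibiko/TalkingData
  • Primary language: Python
  • Languages: Python (2 languages in total)
  • License: Apache License 2.0

Owner activity

  • Created: 2018-05-09 15:17:45
  • Pushed: 2018-05-11 01:32:27
  • Last commit: 2018-05-11 10:32:26
  • Releases: 0

User engagement

  • Stars: 104
  • Watchers: 1
  • Forks: 41
  • Commits: 6
  • Issues: 1
  • Open issues: 1
  • Pull requests: 0
  • Open pull requests: 0
  • Closed pull requests: 0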