Kaggle_CrowdFlower

1st Place Solution for Search Results Relevance Competition on Kaggle (https://www.kaggle.com/c/crowdflower-search-relevance)

The best single model we obtained during the competition was an XGBoost model with a linear booster, scoring 0.69322 on the Public LB and 0.70768 on the Private LB. Our final winning submission was a median ensemble of our 35 best Public LB submissions; it scored 0.70807 on the Public LB and 0.72189 on the Private LB.
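The median ensemble described above can be sketched in a few lines. This is a minimal illustration, not the code used for the winning submission: it assumes each submission has already been loaded as an array of predicted relevance ratings (integers 1 to 4) aligned on the same test ids, and the function name median_ensemble is hypothetical.

```python
import numpy as np


def median_ensemble(submissions):
    """Combine submissions by taking the per-row median prediction.

    Hypothetical helper: `submissions` is a list of 1-D arrays, one per
    submission, all aligned on the same test ids.
    """
    # Shape: (n_submissions, n_rows)
    stacked = np.vstack([np.asarray(s, dtype=float) for s in submissions])
    med = np.median(stacked, axis=0)
    # With an odd number of integer submissions (e.g. 35) the median is
    # already one of the submitted ratings. With an even number it can fall
    # halfway between two ratings; np.rint rounds such ties to the even integer.
    return np.rint(med).astype(int)


subs = [np.array([1, 2, 3, 4]), np.array([2, 2, 4, 4]), np.array([1, 3, 3, 3])]
print(median_ensemble(subs))
```

Using the median rather than the mean keeps the combined prediction on the original rating scale and makes it robust to a few outlying submissions.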

What's New

FlowChart

Documentation

See ./Doc/Kaggle_CrowdFlower_ChenglongChen.pdf for documentation.

Instruction

  • Download the data from the competition website and put all files into the folder ./Data.
  • Run python ./Code/Feat/run_all.py to generate features. This takes a few hours.
  • Run python ./Code/Model/generate_best_single_model.py to generate the best single model submission. In our experience, only a few trials are needed to obtain a model with the best or near-best performance. See the training log in ./Output/Log/[Pre@solution]_[Feat@svd100_and_bow_Jun27]_[Model@reg_xgb_linear]_hyperopt.log for an example.
  • Run python ./Code/Model/generate_model_library.py to generate the model library. This is quite time-consuming, but you don't have to wait for the script to finish: you can run the next step once some models have been trained.
  • Run python ./Code/Model/generate_ensemble_submission.py to generate a submission via ensemble selection.
  • If you don't want to run the code, just submit the file in ./Output/Subm.
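The ensemble-selection step can be illustrated with a generic Caruana-style greedy forward selection with replacement, scored by quadratic weighted kappa (the competition's evaluation metric). This is a sketch under stated assumptions, not the repository's implementation: the function names, the dict-of-validation-predictions layout of the model library, and the simple round-and-clip decoding of continuous scores into ratings 1 to 4 are all hypothetical.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, min_rating=1, max_rating=4):
    """Quadratic weighted kappa between two integer rating vectors."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    n = max_rating - min_rating + 1
    # Observed confusion matrix O[i, j]
    O = np.zeros((n, n))
    for t, p in zip(y_true, y_pred):
        O[t - min_rating, p - min_rating] += 1
    # Quadratic disagreement weights
    idx = np.arange(n)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
    # Expected matrix: outer product of the marginals, scaled to the same total
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (W * O).sum() / (W * E).sum()


def to_ratings(scores, min_rating=1, max_rating=4):
    # Hypothetical decoding: round continuous scores, clip to the valid range.
    return np.clip(np.rint(scores), min_rating, max_rating).astype(int)


def ensemble_selection(library, y_valid, n_iter=10):
    """Greedy forward selection with replacement over a model library.

    `library` maps a model name to its 1-D array of validation predictions.
    Returns the best-scoring list of selected names (may repeat) and its kappa.
    """
    names = list(library)
    selected = []
    running_sum = np.zeros(len(y_valid))
    best_score, best_selected = -np.inf, []
    for _ in range(n_iter):
        # Try adding each model to the current average; keep the best one.
        scores = {}
        for name in names:
            candidate = (running_sum + library[name]) / (len(selected) + 1)
            scores[name] = quadratic_weighted_kappa(y_valid, to_ratings(candidate))
        best_name = max(scores, key=scores.get)
        selected.append(best_name)
        running_sum += library[best_name]
        # Remember the best ensemble seen so far, not just the last one.
        if scores[best_name] > best_score:
            best_score, best_selected = scores[best_name], list(selected)
    return best_selected, best_score


y_valid = np.array([1, 2, 3, 4, 1, 2, 3, 4])
library = {
    "good": y_valid + 0.1,
    "noisy": np.array([4, 3, 2, 1, 4, 3, 2, 1], dtype=float),
}
print(ensemble_selection(library, y_valid))
```

Selecting with replacement lets a strong model receive a larger effective weight in the average, and tracking the best prefix guards against the validation score degrading in later iterations.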

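Key Metrics

Overview
Name and owner: ChenglongChen/kaggle-CrowdFlower
Primary language: C++
Programming languages: Python (number of languages: 8)
Owner activity
Created: 2015-07-12 06:41:27
Last pushed: 2021-09-25 02:32:49
Last commit: 2021-09-25 10:32:49
Releases: 0
User engagement
Stars: 1.8k
Watchers: 101
Forks: 657
Commits: 18
Issues: 4
Open issues: 1
Pull requests: 1
Open pull requests: 0
Closed pull requests: 0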