webspider

A website for IT job data and analysis that helps you better understand the requirements and trends of the IT job market.


Version: 1.0.1
Website: http://119.23.223.90:8000
Source: https://github.com/JustForFunnnn/webspider
Keywords: Python3, Tornado, Celery, Requests

Introduction

This project crawls job and company data from job-seeking websites, then cleans, models, converts, and stores the data in a database. It uses ECharts and Bootstrap to build a front-end page that displays IT job statistics, showing the latest requirements and trends of the IT job market.
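The clean, model, and convert steps can be sketched as below. This is a minimal illustration, not the project's actual code: the `Job` fields, the raw dictionary keys (`title`, `city`, `salary`), and the `11k-20k` salary string format are all assumptions, and the database storage step is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Job:
    """Hypothetical cleaned job record; the real project's model may differ."""
    title: str
    city: str
    salary_min_k: int
    salary_max_k: int


def parse_salary(text: str) -> Optional[Tuple[int, int]]:
    """Convert a string such as '11k-20k' into (11, 20); return None if malformed."""
    parts = text.lower().replace(" ", "").split("-")
    if len(parts) != 2:
        return None
    try:
        low, high = (int(part.rstrip("k")) for part in parts)
    except ValueError:
        return None
    return (low, high) if low <= high else None


def to_job(raw: Dict[str, str]) -> Optional[Job]:
    """Clean one raw crawled record; drop it (return None) when it is unusable."""
    salary = parse_salary(raw.get("salary", ""))
    title = raw.get("title", "").strip()
    if salary is None or not title:
        return None
    return Job(
        title=title,
        city=raw.get("city", "").strip(),
        salary_min_k=salary[0],
        salary_max_k=salary[1],
    )
```

Records that fail cleaning are dropped rather than stored with guessed values, which keeps the downstream statistics from being skewed by malformed input.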

Demo

Enter a keyword you are interested in, such as "Python", into the search box and click the search button; the statistics for that keyword will be displayed.

  • The first chart, Years of Working (工作年限要求), shows the experience required for Python positions. According to the data, "3 ~ 5 years" is the most common requirement, followed by "1 ~ 3 years" (Chart Source Code)

  • The second chart, Salary Range (薪水分布), shows the salaries offered for Python positions. According to the data, "11k ~ 20k" is the most common salary range, followed by "21k ~ 35k" (Chart Source Code)

Other charts are also available.

Python charts example (image)
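The kind of aggregation behind a chart such as Salary Range can be sketched as follows. The "11k ~ 20k" and "21k ~ 35k" ranges come from the description above; the other two buckets, the function names, and the use of a single monthly salary figure (in thousands) per job are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# (label, lower bound, upper bound) in thousands; None means no upper bound.
# Only the two middle buckets appear in the text above; the rest are assumed.
SALARY_BUCKETS: List[Tuple[str, int, Optional[int]]] = [
    ("10k and below", 0, 10),
    ("11k ~ 20k", 11, 20),
    ("21k ~ 35k", 21, 35),
    ("36k and above", 36, None),
]


def bucket_for(salary_k: int) -> str:
    """Return the label of the bucket that contains the given salary."""
    for label, low, high in SALARY_BUCKETS:
        if salary_k >= low and (high is None or salary_k <= high):
            return label
    raise ValueError(f"salary out of range: {salary_k}")


def salary_distribution(salaries_k: Iterable[int]) -> List[Tuple[str, int]]:
    """Count jobs per salary bucket, most frequent bucket first."""
    counts = Counter(bucket_for(salary) for salary in salaries_k)
    return counts.most_common()
```

The resulting list of `(label, count)` pairs can be split into category labels and values, which is the general shape a bar or pie chart series expects.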

Quick Start

This tutorial is based on Ubuntu Linux; for other systems, use the corresponding commands.

  • Clone the project
git clone git@github.com:JustForFunnnn/webspider.git
  • Install MySQL, Redis, Python3
# install Redis
sudo apt-get install redis-server

# run Redis in the background
nohup redis-server &

# install Python3
sudo apt-get install python3

# install MySQL
sudo apt-get install mysql-server

# start MySQL
sudo service mysql start
  • Configure the database and tables
# create database
CREATE DATABASE `spider` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

The tables still need to be created: copy the table definition SQL from tests/schema.sql and run it in MySQL.

  • Build project
# after a successful build, executable scripts are generated under env/bin
make
  • Run unit tests
make test
  • Run code style check
make flake8
  • Start web service
env/bin/web
  • Start crawler
# run task scheduler/dispatcher
env/bin/celery_beat
# run celery worker for job data
env/bin/celery_lg_jobs_data_worker
# run celery worker for job count
env/bin/celery_lg_jobs_count_worker
  • Other jobs
# crawl job counts immediately
env/bin/crawl_lg_jobs_count
# crawl job data immediately
env/bin/crawl_lg_data
# start Celery monitoring
env/bin/celery_flower
  • Clean
# clean the existing build result
make clean
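The crawler commands above start one scheduler (beat) and two workers, one for job data and one for job counts. A Celery settings module for that kind of layout might look like the sketch below. It is not the project's actual configuration: the task names, queue names, schedule intervals, and the use of a local Redis instance as the broker are all assumptions.

```python
from datetime import timedelta

# Assumed broker: the Redis server installed in the steps above.
broker_url = "redis://localhost:6379/0"

# Route each (hypothetical) task to its own queue so that a dedicated
# worker can consume it independently of the other.
task_routes = {
    "tasks.crawl_lg_jobs_count": {"queue": "lg_jobs_count"},
    "tasks.crawl_lg_jobs_data": {"queue": "lg_jobs_data"},
}

# Periodic tasks dispatched by the beat scheduler; the intervals are
# illustrative only.
beat_schedule = {
    "crawl-job-count-daily": {
        "task": "tasks.crawl_lg_jobs_count",
        "schedule": timedelta(days=1),
    },
    "crawl-job-data-weekly": {
        "task": "tasks.crawl_lg_jobs_data",
        "schedule": timedelta(weeks=1),
    },
}
```

A module like this is typically loaded with `app.config_from_object(...)`, and each worker is bound to one queue with the `-Q` option of `celery worker`. Separate queues let a slow data crawl run without delaying the lighter count crawl.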

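Key metrics

Overview
Name and owner: JustForFunnnn/webspider
Primary language: Python
Languages: Python, Makefile, TSQL (language count: 2)
Platform:
License: MIT License
Owner activity
Created at: 2017-03-21 11:05:58
Pushed at: 2023-08-31 14:13:02
Last commit: 2023-08-31 22:13:02
Releases: 0
User engagement
Stars: 370
Watchers: 16
Forks: 126
Commits: 2
Issues enabled?
Issues: 12
Open issues: 4
Pull requests: 8
Open pull requests: 5
Closed pull requests: 6
Project settings
Wiki enabled?
Archived?
Is fork?
Locked?
Is mirror?
Is private?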