hayabusa

Hayabusa: A Simple and Fast Full-Text Search Engine for Massive System Log Data

  • Owner: hirolovesbeer/hayabusa
  • License: MIT License



Concept

  • Pure Python implementation
  • Parallel SQLite processing engine
  • SQLite3 FTS (Full-Text Search)
  • Core-scale architecture

Architecture

  • Design of the directory structure

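    • Log data is stored in one SQLite database per minute, at the path "target directory + yyyy + mm + dd + hh + min.db". Because the time is encoded in the directory path, the search program can systematically select the databases that fall within a given search time range.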
    /targetdir/yyyy/mm/dd/hh/min.db
    
  • StoreEngine

    • sample code
    import os.path
    import sqlite3

    db_file = 'test.db'
    log_file = '1m.log'

    # Create the FTS3 virtual table only when the database file does not exist yet
    if not os.path.exists(db_file):
        conn = sqlite3.connect(db_file)
        conn.execute("CREATE VIRTUAL TABLE SYSLOG USING FTS3(LOGS)")
        conn.close()

    conn = sqlite3.connect(db_file)

    # Store every log line as one row of the LOGS column, then commit once
    with open(log_file) as fh:
        lines = [[line] for line in fh]
        conn.executemany('INSERT INTO SYSLOG VALUES ( ? )', lines)
        conn.commit()
    conn.close()
    
  • SearchEngine

    • sample command
    $ python search_engine.py -h
    usage: search_engine.py [-h] [--time TIME] [--match MATCH] [-c] [-s] [-v]
    
    optional arguments:
      -h, --help     show this help message and exit
      --time TIME    time explain regexp(YYYY/MM/DD/HH/MIN). eg: 2017/04/27/10/*
      --match MATCH  matching keyword. eg: noc or 'noc Login'
      -e             exact match
      -c             count
      -s             sum
      -v             verbose
    
    $ python search_engine.py --time '2017/05/11/13/*' --match 'keyword' -c
    
  • Architecture image
    [Image: Hayabusa Architecture]
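The directory layout and the parallel search flow above can be sketched in Python. This is a minimal sketch under several assumptions, not the code of search_engine.py: minute files are assumed to use zero-padded names (for example 13/07.db), each file is assumed to contain the SYSLOG FTS3 table from the StoreEngine sample, and a thread pool from concurrent.futures stands in for GNU Parallel. The helpers db_path, select_db_files, count_matches and parallel_count are hypothetical names.

```python
import glob
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def db_path(target_dir, ts):
    """Map a timestamp to <target_dir>/yyyy/mm/dd/hh/min.db.

    Assumption: every path component is zero-padded (e.g. 05, 07).
    """
    return os.path.join(
        target_dir,
        ts.strftime("%Y"),
        ts.strftime("%m"),
        ts.strftime("%d"),
        ts.strftime("%H"),
        ts.strftime("%M") + ".db",
    )


def select_db_files(target_dir, time_pattern):
    """Expand a --time style pattern such as '2017/05/11/13/*' into .db files."""
    pattern = os.path.join(glob.escape(target_dir), *time_pattern.split("/")) + ".db"
    return sorted(glob.glob(pattern))


def count_matches(db_file, query):
    """Count rows of one minute database whose LOGS column matches the FTS query."""
    conn = sqlite3.connect(db_file)
    try:
        (n,) = conn.execute(
            "SELECT count(*) FROM SYSLOG WHERE LOGS MATCH ?", (query,)
        ).fetchone()
        return n
    finally:
        conn.close()


def parallel_count(target_dir, time_pattern, query, workers=4):
    """Sum match counts over every minute database selected by time_pattern.

    Assumption: a thread pool stands in for GNU Parallel; each worker opens
    its own SQLite connection.
    """
    files = select_db_files(target_dir, time_pattern)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda f: count_matches(f, query), files))


if __name__ == "__main__":
    # Where the 13:07 minute of 2017-05-11 would be stored under /targetdir
    print(db_path("/targetdir", datetime(2017, 5, 11, 13, 7)))
```

Under these assumptions, parallel_count('/targetdir', '2017/05/11/13/*', 'keyword') plays the role of the -c invocation shown above. Threads keep the sketch short; a process pool, or GNU Parallel as in the original, is the closer match to the core-scale design.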

Search condition

  • Case-insensitive

    • Uppercase and lowercase are not distinguished.
  • Exact match

    -e --match '192.168.0.1'
    
  • AND

    --match 'Hello World'
    
  • OR

    --match 'Hello OR World'
    
  • NOT

    --match 'Hello World -Wide'
    
  • PHRASE

    --match '"Hello World"'
    --match '\"192.168.0.1\"' <- IP address case (same as the -e flag)
    --match '\"192.168.0.1\" src sent' <- PHRASE + AND search
    
  • Asterisk (*)

    --match 'H* World'
    
  • HAT (^)

    --match '^Hello World'
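The search conditions listed above correspond to SQLite FTS3 MATCH syntax. The sketch below assumes, without confirmation from the source, that the --match string reaches MATCH unchanged; it builds an in-memory SYSLOG table with the StoreEngine schema and runs the AND, PHRASE, OR, asterisk and case-insensitive examples. build_syslog and search are hypothetical helpers. NOT and HAT are omitted to keep the example portable: the unary '-' form belongs to FTS3's standard query syntax and may behave differently on SQLite builds that enable the enhanced query syntax.

```python
import sqlite3


def build_syslog(lines):
    """Create an in-memory SYSLOG FTS3 table with the StoreEngine schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE SYSLOG USING FTS3(LOGS)")
    conn.executemany("INSERT INTO SYSLOG VALUES ( ? )", [[line] for line in lines])
    return conn


def search(conn, query):
    """Return matching log lines in insertion order.

    Assumption: Hayabusa passes the --match string to MATCH unchanged.
    """
    rows = conn.execute(
        "SELECT LOGS FROM SYSLOG WHERE LOGS MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]


conn = build_syslog([
    "Hello World",
    "hello wide world",
    "World Hello",
    "Goodbye World",
    "Hi there",
])

print(search(conn, "Hello World"))    # AND: both terms, in any order
print(search(conn, '"Hello World"'))  # PHRASE: adjacent terms, in this order
print(search(conn, "Goodbye OR Hi"))  # OR: either term
print(search(conn, "H* World"))       # asterisk: any token starting with "h", AND "world"
print(search(conn, "HELLO"))          # case-insensitive: same rows as "hello"
```

The default "simple" tokenizer folds ASCII letters to lowercase at both index and query time, which is why the case-insensitive condition holds without extra work.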
    

Development environment

  • CentOS 7.3
  • Python 3.5.1 (using Anaconda packages)
  • SQLite3 (version 3.9.2)

Software dependencies

  • Python 3
  • SQLite3
  • GNU Parallel

Benchmark

Comparison with Apache Spark

  • Hayabusa and Spark time comparison

  • Comparison of a distributed Spark environment and stand-alone Hayabusa

Key metrics

Overview
Primary language: CSS
Languages: Python (6 languages in total)
Owner activity
Created: 2017-04-26 04:58:09
Last pushed: 2018-10-15 06:18:43
Last commit: 2018-10-15 15:18:35
Releases: 0
User engagement
Stars: 43
Watchers: 7
Forks: 3
Commits: 42
Issues: 1
Open issues: 1
Pull requests: 0
Open pull requests: 0
Closed pull requests: 0