data_miner

Download, unpack from a ZIP/TAR/GZ/BZ2 archive, parse, correct, convert units and import Google Spreadsheets, XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses RemoteTable gem internally.

  • 所有者: seamusabshere/data_miner
  • 平台:
  • 許可證: MIT License
  • 分類:
  • 主題:
  • 喜歡:
    0
      比較:

Github星跟蹤圖

data_miner

Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models.

Tested in MRI 1.8.7+, MRI 1.9.2+, and JRuby 1.6.7+. Thread safe.

Real-world usage

We use data_miner for data science at Brighter Planet and in production at

The killer combination for us is:

  1. active_record_inline_schema - define table structure
  2. remote_table - download data and parse it
  3. errata - apply corrections in a transparent way
  4. data_miner (this library!) - import data idempotently

Documentation

Check out the extensive documentation.

Quick start

You define data_miner blocks in your ActiveRecord models. For example, in app/models/country.rb:

class Country < ActiveRecord::Base
  self.primary_key = 'iso_3166_code'

  # the "col" class method is provided by a different library - active_record_inline_schema
  col :iso_3166_code                            # alpha-2 2-letter like GB
  col :iso_3166_numeric_code, :type => :integer # numeric like 826; aka UN M49 code
  col :iso_3166_alpha_3_code                    # 3-letter like GBR
  col :name

  data_miner do
    # auto_upgrade! is provided by active_record_inline_schema
    process :auto_upgrade!

    import("OpenGeoCode.org's Country Codes to Country Names list",
           :url => 'http://opengeocode.org/download/countrynames.txt',
           :format => :delimited,
           :delimiter => '; ',
           :headers => false,
           :skip => 22) do
      key   :iso_3166_code, :field_number => 0
      store :iso_3166_alpha_3_code, :field_number => 1
      store :iso_3166_numeric_code, :field_number => 2
      store :name, :field_number => 5
    end
  end
end

Now you can run:

>> Country.run_data_miner!
=> nil

More advanced usage

The earth library has dozens of real-life examples showing how to download, pull out of a ZIP/TAR/BZ2 archive, parse, correct, and import CSVs, fixed-width files, ODS, XLS, XLSX, even HTML and XML:

And many more - look for the data_miner.rb file that corresponds to each model. Note that you would normally put the data_miner declaration right inside the ActiveRecord model file... it's kept separate in earth so that loading it is optional.

Authors

Wishlist

  • Make the tests real unit tests
  • sql steps shouldn't shell out if binaries are missing

Copyright (c) 2013 Seamus Abshere

主要指標

概覽
名稱與所有者seamusabshere/data_miner
主編程語言Ruby
編程語言Ruby (語言數: 1)
平台
許可證MIT License
所有者活动
創建於2009-08-19 12:46:06
推送於2014-02-27 14:23:08
最后一次提交2014-02-27 12:23:01
發布數124
最新版本名稱v3.0.0 (發布於 )
第一版名稱v0.1.0 (發布於 )
用户参与
星數306
關注者數14
派生數18
提交數519
已啟用問題?
問題數22
打開的問題數8
拉請求數9
打開的拉請求數0
關閉的拉請求數0
项目设置
已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?