anon

A UNIX Command To Anonymise Data

Anon is a tool that takes delimited files and anonymises or transforms their columns, so that the output can be used in applications where sensitive information must not be exposed.

Installation

Releases of Anon are available as pre-compiled static binaries on the corresponding GitHub release. Simply download the appropriate build for your machine and make sure it's in your PATH (or use it directly).

Usage

anon [--config <path to config file, default is ./config.json>]
     [--output <path to output to, default is STDOUT>]

Anon is designed to take input from STDIN and by default will output the anonymised file to STDOUT:

anon < some_file.csv > some_file_anonymised.csv

Configuration

In order to be useful, Anon needs to be told what you want to do to each column of the CSV. The config is defined as a JSON file (defaults to a file called config.json in the current directory):

{
  "csv": {
    "delimiter": ","
  },
  // Optionally sample the rows down at random.
  // Anon hashes the column that holds the ID (using 32-bit FNV-1) and takes
  // the result modulo the configured value to decide whether the row is
  // included: include = hash(idColumn) % mod == 0
  "sampling": {
    // Number used to mod the hash of the ID and determine whether the row
    // is included in the sample.
    "mod": 30000,
    // Specify the column that holds a unique ID on which the sampling can
    // be performed. Indices are 0 based, so this would sample on the first
    // column.
    "idColumn": 0
  },
  // An array of actions to take on each column - indices are 0 based, so index
  // 0 in this array corresponds to column 1, and so on.
  //
  // There must be an action for every column in the CSV.
  "actions": [
    {
      // The no-op, leaves the input unchanged.
      "name": "nothing"
    },
    {
      // Takes a UK-format postcode (e.g. W1W 8BE) and keeps only the outcode
      // (e.g. W1W).
      "name": "outcode"
    },
    {
      // Hash (SHA1) the input.
      "name": "hash",
      // Optional salt that will be appended to the input.
      // If not defined, a random salt will be generated.
      "salt": "salt"
    },
    {
      // Given a date, just keep the year.
      "name": "year",
      "dateConfig": {
        // Define the format of the input date here.
        "format": "YYYYmmmdd"
      }
    },
    {
      // Summarise a range of values.
      "name": "range",
      "rangeConfig": {
        "ranges": [
          // For example, this will take values between 0 and 100, and convert
          // them to the string "0-100".
          // Use at most one lower bound (gt or gte) and at most one upper
          // bound (lt or lte); gt cannot be combined with gte, nor lt with
          // lte. At least one of (gt, gte, lt, lte) must be defined.
          {
            "gte": 0,
            "lt": 100,
            "output": "0-100"
          }
        ]
      }
    }
  ]
}
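
The sampling rule described in the config comments (include = hash(idColumn) % mod == 0, with 32-bit FNV-1) can be illustrated with a short Go sketch. This is not Anon's own source: the function name includeRow is hypothetical, and it assumes the ID column is hashed as the raw bytes of its string value.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// includeRow is a hypothetical helper that applies the documented rule:
// include = hash(idColumn) % mod == 0.
// Assumption: the ID is hashed as the raw bytes of the column value.
// mod must be non-zero, otherwise the modulo operation panics.
func includeRow(id string, mod uint32) bool {
	h := fnv.New32() // New32 is 32-bit FNV-1 (New32a would be FNV-1a)
	h.Write([]byte(id))
	return h.Sum32()%mod == 0
}

func main() {
	// Made-up IDs, only to show the call; real IDs come from "idColumn".
	ids := []string{"1001", "1002", "1003", "1004", "1005", "1006"}
	const mod = 3
	for _, id := range ids {
		fmt.Printf("id=%s included=%v\n", id, includeRow(id, mod))
	}
}
```

Because the decision depends only on the ID value, the same ID is always kept or always dropped across runs, and a larger "mod" keeps roughly one row in every mod rows.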

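The bound semantics of the "range" action (gt, gte, lt, lte) can be sketched in Go as follows. The type rangeRule and the functions matches, bucket and ptr are hypothetical names for illustration, not Anon's API. What Anon does with a value that matches no range is not stated above, so this sketch simply reports that nothing matched.

```go
package main

import "fmt"

// rangeRule is a hypothetical in-memory form of one entry in "ranges".
// A nil pointer means the bound is not set.
type rangeRule struct {
	Gt, Gte, Lt, Lte *float64
	Output           string
}

// matches reports whether v satisfies every bound that is set.
// It does not validate that at least one bound is defined.
func (r rangeRule) matches(v float64) bool {
	if r.Gt != nil && v <= *r.Gt {
		return false
	}
	if r.Gte != nil && v < *r.Gte {
		return false
	}
	if r.Lt != nil && v >= *r.Lt {
		return false
	}
	if r.Lte != nil && v > *r.Lte {
		return false
	}
	return true
}

// bucket returns the output of the first matching rule.
// Assumption: an unmatched value yields ("", false).
func bucket(v float64, rules []rangeRule) (string, bool) {
	for _, r := range rules {
		if r.matches(v) {
			return r.Output, true
		}
	}
	return "", false
}

// ptr returns a pointer to x, so bounds can be written inline.
func ptr(x float64) *float64 { return &x }

func main() {
	// The same range as in the example config: 0 <= v < 100.
	rules := []rangeRule{{Gte: ptr(0), Lt: ptr(100), Output: "0-100"}}
	for _, v := range []float64{0, 42, 100} {
		out, ok := bucket(v, rules)
		fmt.Printf("value=%v output=%q matched=%v\n", v, out, ok)
	}
}
```

With "gte": 0 and "lt": 100, the value 0 falls inside the range and 100 falls outside it, which is the difference between the inclusive and exclusive bounds.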
Contributing

Contributions are welcome; please refer to our contributing guidelines for more information.

License

This project is licensed under the MIT license.

The icon is by Pixel Perfect from Flaticon, and is licensed under a Creative Commons 3.0 BY license.
