licenseclassifier

A License Classifier

Github stars Tracking Chart

License Classifier

Build status

Introduction

The license classifier is a library and set of tools that can analyze text to
determine what type of license it contains. It searches for license texts in a
file and compares them to an archive of known licenses. These files could be,
e.g., LICENSE files with a single or multiple licenses in it, or source code
files with the license text in a comment.

A "confidence level" is associated with each result indicating how close the
match was. A confidence level of 1.0 indicates an exact match, while a
confidence level of 0.0 indicates that no license was able to match the text.

Adding a new license

Adding a new license is straight-forward:

  1. Create a file in licenses/.

    • The filename should be the name of the license or its abbreviation. If
      the license is an Open Source license, use the appropriate identifier
      specified at https://spdx.org/licenses/.
    • If the license is the "header" version of the license, append the suffix
      ".header" to it. See licenses/README.md for more details.
  2. Add the license name to the list in license_type.go.

  3. Regenerate the licenses.db file by running the license serializer:

    $ license_serializer -output licenseclassifier/licenses
    
  4. Create and run appropriate tests to verify that the license is indeed
    present.

Tools

Identify license

identify_license is a command line tool that can identify the license(s)
within a file.

$ identify_license LICENSE
LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)
LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)
LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)

License serializer

The license_serializer tool regenerates the licenses.db archive. The archive
contains preprocessed license texts for quicker comparisons against unknown
texts.

$ license_serializer -output licenseclassifier/licenses

This is not an official Google product (experimental or otherwise), it is just
code that happens to be owned by Google.

Main metrics

Overview
Name With Ownergoogle/licenseclassifier
Primary LanguageGo
Program languageGo (Language Count: 1)
Platform
License:Apache License 2.0
所有者活动
Created At2017-04-10 03:45:47
Pushed At2025-02-13 17:59:39
Last Commit At
Release Count11
Last Release Namev2.0.0 (Posted on )
First Release Namev2.0.0-alpha.1 (Posted on )
用户参与
Stargazers Count325
Watchers Count12
Fork Count79
Commits Count246
Has Issues Enabled
Issues Count20
Issue Open Count12
Pull Requests Count34
Pull Requests Open Count6
Pull Requests Close Count8
项目设置
Has Wiki Enabled
Is Archived
Is Fork
Is Locked
Is Mirror
Is Private