non-distributional

Non-distributional linguistic word vector representations.

  • Owner: mfaruqui/non-distributional
  • License: MIT License

Manaal Faruqui, manaalfar@gmail.com

This repository contains the data released with the paper on non-distributional
word vector representations (Faruqui & Dyer, 2015). It provides word vectors
constructed from non-distributional information, namely lexical information
collected from linguistic lexicons constructed over time in NLP research.
For more details, please refer to the paper.

###Data and Tools

####binary-vectors.txt.gz

This is a very high-dimensional word vector file that is 99.9% sparse.
It contains binary vectors, i.e., every element of a word vector is either 1 or 0.
It is best to keep this file compressed, as it expands to around 41 GB of text.

Example vector:

the 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 ...
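Because the vectors are almost entirely zeros, one reasonable way to work with the file is to stream it while it is still compressed and keep only the positions of the 1s. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes each line is a word followed by whitespace-separated 0/1 values, as in the example above.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple


def parse_vector_line(line: str) -> Tuple[str, List[int]]:
    """Split one 'word 0 1 0 ...' line into the word and the indices of its 1s."""
    word, *values = line.split()
    # Keep only the positions that hold a 1; all other positions are implicitly 0.
    active = [i for i, value in enumerate(values) if value == "1"]
    return word, active


def iter_sparse_vectors(lines: Iterable[str]) -> Iterator[Tuple[str, List[int]]]:
    """Yield (word, active_indices) pairs one line at a time, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield parse_vector_line(line)


def iter_gzipped_vectors(path: str) -> Iterator[Tuple[str, List[int]]]:
    """Stream vectors from a gzip file without writing the expanded text to disk."""
    # Text mode ("rt") decompresses on the fly, line by line.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_sparse_vectors(handle)
```

Calling `iter_gzipped_vectors("binary-vectors.txt.gz")` in a `for` loop would then visit one word at a time, so memory use depends on the number of active features per word rather than on the full dimensionality.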

####word-feat.txt

Every line of this file contains a word followed by all the features that the
word possesses, as collected from the lexicons in the lexicons/ folder.
This is the unexpanded form of the word vectors in binary-vectors.txt.gz.

Example line:

untrustworthiness wn_noun.attribute noun,negative
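A file in this format can be loaded into a dictionary that maps each word to its set of features. The sketch below is illustrative only: the function name is hypothetical, and it assumes that features are separated by whitespace, so a token such as `noun,negative` is treated as a single feature name.

```python
from typing import Dict, Iterable, Set


def parse_word_features(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each word to the set of feature names listed after it."""
    features: Dict[str, Set[str]] = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        word, feats = tokens[0], tokens[1:]
        # If a word appears on more than one line, merge its features.
        features.setdefault(word, set()).update(feats)
    return features
```

Since any iterable of lines is accepted, an open file handle for word-feat.txt can be passed in directly.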

####create-vector.py

This script takes a lexicon and converts it into binary vectors. We created
binary-vectors.txt.gz with this script from all the files in the lexicons/ folder.
To create vectors from FrameNet only, use the following command:

python create-vector.py < lexicons/framenet.txt > binary-fn-vectors.txt

We created binary-vectors.txt using the following command (the <(...) process
substitution requires bash or a similar shell):

python create-vector.py < <(cat lexicons/*) > binary-vectors.txt
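To illustrate the kind of expansion involved, here is a minimal sketch of turning word/feature lines into dense binary vectors. It is not the code of create-vector.py: the function names are hypothetical, the input is assumed to be whitespace-separated, and the column order (sorted feature names) is an assumption that may differ from the released vectors.

```python
from typing import Dict, Iterable, Iterator, List, Set


def build_binary_vectors(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Expand 'word feat1 feat2 ...' lines into one 0/1 vector per word."""
    word_feats: Dict[str, Set[str]] = {}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # Merge features when a word occurs in several lexicons.
        word_feats.setdefault(tokens[0], set()).update(tokens[1:])

    # Assumed column order: all feature names, sorted alphabetically.
    columns = sorted(set().union(*word_feats.values()))
    index = {feat: i for i, feat in enumerate(columns)}

    vectors: Dict[str, List[int]] = {}
    for word, feats in word_feats.items():
        row = [0] * len(columns)
        for feat in feats:
            row[index[feat]] = 1
        vectors[word] = row
    return vectors


def format_vectors(vectors: Dict[str, List[int]]) -> Iterator[str]:
    """Render each vector as a 'word 0 1 0 ...' text line."""
    for word, row in vectors.items():
        yield word + " " + " ".join(str(v) for v in row)
```

Passing `sys.stdin` as `lines` and printing each formatted row would mirror the stdin-to-stdout usage of the commands above. Dense rows like these are only practical for small lexicons; at the scale of the full lexicon set, the output is the roughly 41 GB file described earlier.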

####lexicons/

Every file in this directory is a lexicon that lists words together with the
features each word possesses.

###Reference

@InProceedings{faruqui:2015:non-dist,
  author    = {Faruqui, Manaal and Dyer, Chris},
  title     = {Non-distributional Word Vector Representations},
  booktitle = {Proceedings of ACL},
  year      = {2015},
}

Main metrics

Overview

  • Primary language: Python
  • Program languages: Python (language count: 1)

Owner activity

  • Created at: 2015-06-16 22:37:02
  • Pushed at: 2017-09-15 22:05:15
  • Last commit at: 2017-09-15 18:05:14
  • Release count: 0

User engagement

  • Stargazers: 62
  • Watchers: 6
  • Forks: 9
  • Commits: 10
  • Issues: 1 (open: 1)
  • Pull requests: 0 (open: 0, closed: 0)