word2vec

Tools for computing distributed representations of words

We provide implementations of the Continuous Bag-of-Words (CBOW) and Skip-gram (SG) models, as well as several demo scripts.

Given a text corpus, the word2vec tool learns a vector for every word in the vocabulary using the Continuous
Bag-of-Words or the Skip-gram neural network architecture. The user should specify the following:

desired vector dimensionality
the size of the context window for either the Skip-gram or the Continuous Bag-of-Words model
training algorithm: hierarchical softmax and/or negative sampling
threshold for downsampling the frequent words
number of threads to use
the format of the output word vector file (text or binary)

Usually, the other hyper-parameters, such as the learning rate, do not need to be tuned for different training sets.

The script demo-word.sh downloads a small (100 MB) text corpus from the web and trains a small word vector model. After training
finishes, the user can interactively explore the similarity of the words.

More information about the scripts is provided at https://code.google.com/p/word2vec/

Name and owner	svn2github/word2vec
Primary language	C
Languages	C (3 languages)
License	Apache License 2.0

Created	2014-03-18 13:52:58
Last push	2015-01-30 20:48:45
Last commit	2015-01-30 19:30:30
Releases	0

Stars	336
Watchers	24
Forks	221
Commits	42
Issues	3
Open issues	3
Pull requests	0
Open pull requests	0
Closed pull requests	0
