word2vec

This is a clone of an SVN repository at http://word2vec.googlecode.com/svn/trunk. It had been cloned by http://svn2github.com/ , but the service was since closed. Please read a closing note on my blog post: http://piotr.gabryjeluk.pl/blog:closing-svn2github . If you want to continue synchronizing this repo, look at https://github.com/gabrys/svn2github

  • 所有者: svn2github/word2vec
  • 平台:
  • 許可證: Apache License 2.0
  • 分類:
  • 主題:
  • 喜歡:
    0
      比較:

Github星跟蹤圖

Tools for computing distributed representtion of words

We provide an implementation of the Continuous Bag-of-Words (CBOW) and the Skip-gram model (SG), as well as several demo scripts.

Given a text corpus, the word2vec tool learns a vector for every word in the vocabulary using the Continuous
Bag-of-Words or the Skip-Gram neural network architectures. The user should to specify the following:

  • desired vector dimensionality
  • the size of the context window for either the Skip-Gram or the Continuous Bag-of-Words model
  • training algorithm: hierarchical softmax and / or negative sampling
  • threshold for downsampling the frequent words
  • number of threads to use
  • the format of the output word vector file (text or binary)

Usually, the other hyper-parameters such as the learning rate do not need to be tuned for different training sets.

The script demo-word.sh downloads a small (100MB) text corpus from the web, and trains a small word vector model. After the training
is finished, the user can interactively explore the similarity of the words.

More information about the scripts is provided at https://code.google.com/p/word2vec/

主要指標

概覽
名稱與所有者svn2github/word2vec
主編程語言C
編程語言C (語言數: 3)
平台
許可證Apache License 2.0
所有者活动
創建於2014-03-18 13:52:58
推送於2015-01-30 20:48:45
最后一次提交2015-01-30 19:30:30
發布數0
用户参与
星數336
關注者數24
派生數221
提交數42
已啟用問題?
問題數3
打開的問題數3
拉請求數0
打開的拉請求數0
關閉的拉請求數0
项目设置
已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?