node-word2vec

Word2vec Model Reader for Node.js Client


Welcome

npm install node-word2vec-reader
var Word2vec = require("node-word2vec-reader");
var word2vec = new Word2vec();

For performance, the vocabulary and model loading are handled by a Node.js C++ addon.

APIs

All methods return a Promise.

word2vec#init(model_file_path)

word2vec.init(model_file_path);

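After creating a word2vec instance, initialize it before calling any other method. model_file_path is the path to a model produced by training with word2vec.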

word2vec#getVocabSize()

Returns the size of the vocabulary.

word2vec.getVocabSize()
    .then(function(num){
        // do your magic
        })

word2vec#getEmbeddingDim()

Returns the dimension of the word vectors.

word2vec.getEmbeddingDim()
    .then(function(num){
        // do your magic
        })
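The calls above can be chained. The following is a minimal sketch, not part of the library: `describeModel` is a hypothetical helper name, and it assumes that the Promise returned by `init` resolves once the model is loaded and that the two getters can then be called together.

```javascript
// Sketch: initialize a model, then read its basic dimensions.
// `word2vec` is any object exposing init(), getVocabSize() and
// getEmbeddingDim() as documented above. `describeModel` is a
// hypothetical helper, not an API of node-word2vec-reader.
function describeModel(word2vec, modelFilePath) {
  return word2vec
    .init(modelFilePath)
    .then(function () {
      // Assumption: both getters are safe to call once init() has resolved.
      return Promise.all([
        word2vec.getVocabSize(),
        word2vec.getEmbeddingDim()
      ]);
    })
    .then(function (results) {
      return { vocabSize: results[0], embeddingDim: results[1] };
    });
}
```

Taking the instance as a parameter keeps the helper independent of how the model object is created, which also makes it easy to exercise with a stub.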

word2vec#v(word)

Returns the vector for a word.

word2vec.v("飞机")
    .then(function(vector){
        // do your magic
        })
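One common use of word vectors is comparing two words by cosine similarity. The sketch below is illustrative only: `cosineSimilarity` is a hypothetical helper, and it assumes that `v(word)` resolves to an array (or typed array) of numbers, which the text above does not state explicitly.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Hypothetical helper, not part of node-word2vec-reader.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  var dot = 0;
  var normA = 0;
  var normB = 0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Cosine similarity is undefined for a zero vector; return 0 by convention.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Possible usage, assuming `word2vec` has already been initialized:
// Promise.all([word2vec.v("飞机"), word2vec.v("航母")])
//     .then(function (vectors) {
//         return cosineSimilarity(vectors[0], vectors[1]);
//     });
```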

word2vec#nearby(word, [topK])

Returns the k words nearest to a given word, together with their scores.

word2vec.nearby("飞机", 10)
    .then(function(data){
        // do your magic
        })
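  • Return value: JSON array

[[words], [scores]]: two lists of equal length. The first holds the words; the second holds the similarity score of the word at the same position. Scores lie in the range [0, 1], and a value closer to 1 means the word is more similar.

For example: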

[
    ["股市","股价","股票市场","股灾","楼市","股票","香港股市","行情","恒指","金融市场"],
    [1,0.786284,0.784575,0.751607,0.712255,0.712179,0.710806,0.694434,0.67501,0.666439]
]
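The two parallel lists are often easier to work with as a single list of word/score pairs. A minimal sketch follows, assuming the return shape shown above; `toPairs` is a hypothetical helper name, not a library function.

```javascript
// Convert the [[words], [scores]] result of nearby() into
// an array of { word, score } objects, preserving order.
// Hypothetical helper, not part of node-word2vec-reader.
function toPairs(nearbyResult) {
  var words = nearbyResult[0];
  var scores = nearbyResult[1];
  return words.map(function (word, i) {
    return { word: word, score: scores[i] };
  });
}

// Sample data taken from the first three entries of the example above.
var sample = [["股市", "股价", "股票市场"], [1, 0.786284, 0.784575]];
var pairs = toPairs(sample);
// pairs[1] is { word: "股价", score: 0.786284 }
```

With real results, this would typically be applied inside the `.then` callback of `word2vec.nearby(...)`.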

word2vec#bow(words)

Returns the BoW (bag-of-words) vector for a list of words.

word2vec.bow(["飞机", "航母"])
    .then(function(vector){
        // do your magic
        })

More details

Contribute

admin/rebuilt.sh # rebuild the C++ addon
admin/test.sh # run the unit tests

Word2vec

word2vec is the tool used to train word-vector models. For convenience, a copy of word2vec is included in this repository. To build and run it:

cd app/google
make clean
make
./word2vec

For more information about Word2vec, see word2vec/google/README.

Give credits to

ofxMSAWord2Vec

LICENSE

Word2vec model reader for Node.js Client.
Copyright (C) 2018 Hai Liang Wang <hailiang.hl.wang@gmail.com>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.

