[PYTHON-TSNE] Visualizing Word Vectors
Files you need:
1. wordList.txt, the list of words you want to convert to vectors:
spring
maven
junit
ant
swing
xml
jre
jdk
jbutton
jpanel
swt
japplet
jdialog
jcheckbox
jlabel
jmenu
slf4j
test
unit
2. label.txt, the labels shown in the figure; these may differ from the words in wordList.txt.
spring
maven
junit
ant
swing
xml
jre
jdk
jbutton
jpanel
swt
japplet
jdialog
jcheckbox
jlabel
jmenu
slf4j
test
unit
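Because the labels may differ from the words, the two files are matched by line position only, so they need the same number of entries. A minimal sketch of that check, assuming one entry per line and that blank lines can be ignored; the helper name `pair_words_with_labels` is hypothetical and not part of the original scripts:

```python
def pair_words_with_labels(word_lines, label_lines):
    """Pair each word with the label found at the same position.

    Both arguments are iterables of text lines (for example open file
    objects). Blank lines are skipped, which is an assumption of this sketch.
    """
    words = [line.strip() for line in word_lines if line.strip()]
    labels = [line.strip() for line in label_lines if line.strip()]
    if len(words) != len(labels):
        raise ValueError(
            "word list has %d entries but label list has %d"
            % (len(words), len(labels))
        )
    return list(zip(words, labels))


# Usage with the two files described above:
# with open('wordList.txt') as wf, open('label.txt') as lf:
#     pairs = pair_words_with_labels(wf, lf)
```

Running this before plotting catches a label file that has drifted out of step with the word list.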
3. model, a word2vec model generated with gensim;
4. Run buildWordVectorFromW2V.py to generate the word vector list:
from gensim.models.word2vec import Word2Vec

modelpath = 'XXX/model'
model = Word2Vec.load(modelpath)

sentenceFilePath = 'wordList.txt'
vectorFilePath = 'word2vec.txt'

outLines = []
with open(sentenceFilePath, 'r') as f:
    for line in f:
        values = []
        for word in line.strip().split(' '):
            # Vectors are looked up through model.wv, as current gensim releases require.
            # Skip words the model does not know instead of crashing on the lookup.
            if word not in model.wv:
                print('error! not in model: ' + word)
                continue
            values.extend(str(v) for v in model.wv[word])
        # One output line per input line, values separated by spaces.
        outLines.append(' '.join(values))

with open(vectorFilePath, 'w') as f:
    f.write('\n'.join(outLines).strip())
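The vector file is plain text: space-separated floats, one line per line of the word list. A small sketch of reading it back into an array with NumPy, assuming one word per input line so that every row has the same length; the helper name `load_vectors` and the sample values are made up for illustration:

```python
import io

import numpy as np


def load_vectors(source):
    """Load whitespace-separated vectors, one per line, as a 2-D float array.

    `source` may be a file name or an open text file. ndmin=2 keeps the
    result two-dimensional even when the file holds a single vector.
    """
    return np.loadtxt(source, dtype=np.float64, ndmin=2)


# Stand-in for word2vec.txt: two 3-dimensional vectors (illustrative values).
sample = io.StringIO("0.1 0.2 0.3 \n0.4 0.5 0.6")
vectors = load_vectors(sample)
print(vectors.shape)  # (2, 3)
```

With the real file, `load_vectors('word2vec.txt')` should give one row per word, in the same order as wordList.txt.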
5. Run visualization.py to generate the figure:
import numpy as np
from gensim.models.word2vec import Word2Vec
import matplotlib.pyplot as plt

modelpath = 'XXX/model'
model = Word2Vec.load(modelpath)

sentenceFilePath = 'wordList.txt'
labelFilePath = 'label.txt'

# One vector per line of the word list.
visualizeVecs = []
with open(sentenceFilePath, 'r') as f:
    for line in f:
        word = line.strip()
        vec = model.wv[word.lower()]
        visualizeVecs.append(vec)

# One label per line, matched to the vectors by position.
visualizeWords = []
with open(labelFilePath, 'r') as f:
    for line in f:
        word = line.strip()
        visualizeWords.append(word.lower())

visualizeVecs = np.array(visualizeVecs).astype(np.float64)

# t-SNE variant, left disabled as in the original script:
# Y = tsne(visualizeVecs, 2, 200, 20.0)
# ChineseFont1 = FontProperties('SimHei')
# for i in range(len(visualizeWords)):
#     color = 'red'
#     plt.text(Y[i, 0], Y[i, 1], visualizeWords[i], bbox=dict(facecolor=color, alpha=0.1))
# plt.xlim((np.min(Y[:, 0]), np.max(Y[:, 0])))
# plt.ylim((np.min(Y[:, 1]), np.max(Y[:, 1])))
# plt.show()

# The active code uses PCA, not t-SNE: center the vectors, take the SVD of
# the covariance matrix, and project onto the top two principal directions.
temp = visualizeVecs - np.mean(visualizeVecs, axis=0)
# Optional row normalization, disabled in the original:
# vis_norm = np.sqrt(np.sum(temp**2, axis=1, keepdims=True))
# temp = temp / vis_norm
covariance = 1.0 / visualizeVecs.shape[0] * temp.T.dot(temp)
U, S, V = np.linalg.svd(covariance)
coord = temp.dot(U[:, 0:2])

for i in range(len(visualizeWords)):
    print(i, coord[i, 0], coord[i, 1])
    color = 'red'
    plt.text(coord[i, 0], coord[i, 1], visualizeWords[i],
             bbox=dict(facecolor=color, alpha=0.1), fontsize=22)
    # fontproperties = ChineseFont1

plt.xlim((np.min(coord[:, 0]), np.max(coord[:, 0])))
plt.ylim((np.min(coord[:, 1]), np.max(coord[:, 1])))
plt.show()
Result of running visualization.py: a plot with each label drawn at its 2-D PCA coordinates.
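The title mentions t-SNE, but visualization.py as posted projects the vectors with PCA and keeps its `tsne(...)` call commented out; the post does not show where that function comes from. As a rough sketch of the t-SNE variant, the example below swaps in scikit-learn's `TSNE` class, which is a different implementation from the one the author referenced. The helper name `tsne_2d`, the perplexity of 5, and the random stand-in vectors are all assumptions for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_2d(vectors, perplexity=5.0, seed=0):
    """Project row vectors to 2-D with scikit-learn's t-SNE.

    perplexity must be smaller than the number of rows; 5.0 is an
    illustrative choice for a list of about 20 words, not a tuned value.
    """
    vectors = np.asarray(vectors, dtype=np.float64)
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=seed)
    return tsne.fit_transform(vectors)


# Stand-in data: 19 random 50-dimensional vectors in place of real
# word2vec output, so the sketch runs without a trained model.
rng = np.random.RandomState(0)
fake_vecs = rng.randn(19, 50)
coord = tsne_2d(fake_vecs)
print(coord.shape)
```

With real data, passing `visualizeVecs` to `tsne_2d` gives an array that can stand in for `coord` in the plotting loop of visualization.py. t-SNE layouts vary with the seed and perplexity, so the picture will not match the PCA plot.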