Semantic matching in ws4j at sentence level
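Posted: 2016-07-02 04:08:14

Question: I am currently trying to semantically match two sentences in ws4j. I have implemented the concept at the word level, but I am having trouble doing the same at the sentence level and getting the output as a matrix, the way the online demo shows it. How can I write code to do this?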
import java.util.List;

import edu.cmu.lti.jawjaw.pobj.POS;
import edu.cmu.lti.lexical_db.ILexicalDatabase;
import edu.cmu.lti.lexical_db.NictWordNet;
import edu.cmu.lti.lexical_db.data.Concept;
import edu.cmu.lti.ws4j.Relatedness;
import edu.cmu.lti.ws4j.RelatednessCalculator;
import edu.cmu.lti.ws4j.impl.Lesk;

public class WordMatcher1 {
    public static void main(String[] args) {
        String word1 = "rifle";
        String word2 = "gun";
        ILexicalDatabase db = new NictWordNet();
        RelatednessCalculator lesk = new Lesk(db);
        // Part-of-speech pairs that this measure can compare
        List<POS[]> posPairs = lesk.getPOSPairs();
        double maxScore = -1D; // -1 means "no synset pair scored yet"
        for (POS[] posPair : posPairs) {
            String p1 = null, p2 = null;
            List<Concept> synsets1 = (List<Concept>) db.getAllConcepts(word1, posPair[0].toString());
            List<Concept> synsets2 = (List<Concept>) db.getAllConcepts(word2, posPair[1].toString());
            // Keep the best Lesk score over every synset combination
            for (Concept ss1 : synsets1) {
                for (Concept ss2 : synsets2) {
                    p1 = ss1.getPos().toString();
                    p2 = ss2.getPos().toString();
                    Relatedness relatedness = lesk.calcRelatednessOfSynset(ss1, ss2);
                    double score = relatedness.getScore();
                    if (score > maxScore) {
                        maxScore = score;
                    }
                }
            }
            if (maxScore == -1D) {
                maxScore = 0.0;
            }
            System.out.println("sim('" + word1 + " " + p1 + "', '" + p2 + " " + word2 + "') = " + maxScore);
        }
    }
}
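The loop above scores a single pair of words. A whole sentence is presumably not a WordNet lemma, so it has no synsets, which may explain the meaningless values for sentences. A minimal sketch of the matrix view the online demo shows: tokenize both sentences and fill a words1 x words2 grid with a word-pair score. It is self-contained (Java 8+); the class name `SentenceMatrix`, the naive letters-only tokenizer, and the exact-match scorer in `main` are illustrative assumptions, not ws4j code. With ws4j on the classpath you could pass `lesk::calcRelatednessOfWords` as the scorer instead; some ws4j versions also expose a `getSimilarityMatrix(String[], String[])` helper on `RelatednessCalculator`, so check your version before rolling your own.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.function.ToDoubleBiFunction;

public class SentenceMatrix {

    // Naive tokenizer (assumption): lower-case, split on runs of non-letters, drop empties.
    static String[] tokenize(String sentence) {
        return Arrays.stream(sentence.toLowerCase(Locale.ROOT).split("[^a-z]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    // Fill a words1.length x words2.length grid with the score of every word pair.
    static double[][] similarityMatrix(String[] words1, String[] words2,
                                       ToDoubleBiFunction<String, String> scorer) {
        double[][] matrix = new double[words1.length][words2.length];
        for (int i = 0; i < words1.length; i++) {
            for (int j = 0; j < words2.length; j++) {
                matrix[i][j] = scorer.applyAsDouble(words1[i], words2[j]);
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        String[] words1 = tokenize("The rifle fired.");
        String[] words2 = tokenize("A gun fired!");
        // Hypothetical stand-in scorer: 1.0 for identical tokens, otherwise 0.0.
        // With ws4j you might pass lesk::calcRelatednessOfWords here instead.
        ToDoubleBiFunction<String, String> exact = (a, b) -> a.equals(b) ? 1.0 : 0.0;
        double[][] m = similarityMatrix(words1, words2, exact);
        // Header row holds the words of the second sentence
        System.out.println("\t" + String.join("\t", words2));
        // One row per word of the first sentence
        for (int i = 0; i < words1.length; i++) {
            StringBuilder row = new StringBuilder(words1[i]);
            for (double v : m[i]) {
                row.append('\t').append(String.format(Locale.ROOT, "%.2f", v));
            }
            System.out.println(row);
        }
    }
}
```

You may also want to drop stop words such as "the" in `tokenize` before scoring, since they only add noise to the grid.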
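Comments:
Please provide more detail and show what you have tried, so that others can help. – user3639557
I have attached the API code I implemented. It produces correct values for words but garbage values for sentences.

Answer 1: I ran into a similar problem, and this example works: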
import java.util.List;

import edu.cmu.lti.jawjaw.pobj.POS;
import edu.cmu.lti.lexical_db.ILexicalDatabase;
import edu.cmu.lti.lexical_db.NictWordNet;
import edu.cmu.lti.lexical_db.data.Concept;
import edu.cmu.lti.ws4j.Relatedness;
import edu.cmu.lti.ws4j.RelatednessCalculator;
import edu.cmu.lti.ws4j.impl.Lesk;
import edu.cmu.lti.ws4j.util.WS4JConfiguration;

public class LeskSimilarity {
    public static void main(String[] args) {
        ILexicalDatabase db = new NictWordNet();
        RelatednessCalculator lesk = new Lesk(db);
        String word1 = "rifle";
        POS posWord1 = POS.n; // noun
        String word2 = "gun";
        POS posWord2 = POS.n; // noun
        double maxScore = -1D; // -1 means "no synset pair scored"
        // Enable ws4j's most-frequent-sense (MFS) option
        WS4JConfiguration.getInstance().setMFS(true);
        List<Concept> synsets1 = (List<Concept>) db.getAllConcepts(word1, posWord1.name());
        List<Concept> synsets2 = (List<Concept>) db.getAllConcepts(word2, posWord2.name());
        // Keep the best Lesk score over every synset combination
        for (Concept synset1 : synsets1) {
            for (Concept synset2 : synsets2) {
                Relatedness relatedness = lesk.calcRelatednessOfSynset(synset1, synset2);
                double score = relatedness.getScore();
                if (score > maxScore) {
                    maxScore = score;
                }
            }
        }
        if (maxScore == -1D) {
            maxScore = 0.0;
        }
        System.out.println("Similarity score of " + word1 + " & " + word2 + " : " + maxScore);
    }
}
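This answer still scores one word pair. If you already have a sentence matrix (rows are the words of sentence 1, columns the words of sentence 2) and want a single number, one simple heuristic, which is not something ws4j itself prescribes, is to take each word's best match in the other sentence, average those maxima, and do the same in the other direction. The class `SentenceScore` and its methods are hypothetical names for this sketch, and it assumes a rectangular matrix. Raw Lesk scores are overlap-based and not confined to [0, 1], so you may want to normalize the matrix before aggregating.

```java
public class SentenceScore {

    // Mean over rows of each row's maximum: best match for every word of sentence 1.
    static double rowMaxMean(double[][] m) {
        if (m.length == 0 || m[0].length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double[] row : m) {
            double best = Double.NEGATIVE_INFINITY;
            for (double v : row) {
                best = Math.max(best, v);
            }
            sum += best;
        }
        return sum / m.length;
    }

    // Mean over columns of each column's maximum: best match for every word of sentence 2.
    static double colMaxMean(double[][] m) {
        if (m.length == 0 || m[0].length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int j = 0; j < m[0].length; j++) {
            double best = Double.NEGATIVE_INFINITY;
            for (double[] row : m) {
                best = Math.max(best, row[j]);
            }
            sum += best;
        }
        return sum / m[0].length;
    }

    // Symmetric sentence score: average of the two directional scores.
    static double sentenceScore(double[][] m) {
        return (rowMaxMean(m) + colMaxMean(m)) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical 2x3 matrix of already-normalized word-pair scores
        double[][] m = {
            {1.0, 0.2, 0.0},
            {0.1, 0.5, 0.3}
        };
        System.out.println(sentenceScore(m));
    }
}
```

Averaging both directions keeps the score symmetric, so swapping the two sentences gives the same result.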