A Lucene Analyzer Class Optimized for Tweets
Posted by ljbguanli
<strong><span style="font-size:18px;">/**
 * @author YangXin
 * @info Analyzer optimized for tweets using DoubleMetaphone.
 * DoubleMetaphone generates the same key for words that sound alike.
 */
package unitTwelve;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.commons.codec.language.DoubleMetaphone;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class TwitterAnalyzer extends Analyzer {
    private DoubleMetaphone filter = new DoubleMetaphone();

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Tokenize, drop English stop words, then apply Porter stemming.
        TokenStream result = new PorterStemFilter(new StopFilter(true, new StandardTokenizer(Version.LUCENE_CURRENT, reader), StandardAnalyzer.STOP_WORDS_SET));
        TermAttribute termAtt = (TermAttribute) result.addAttribute(TermAttribute.class);
        StringBuilder buf = new StringBuilder();
        try {
            while (result.incrementToken()) {
                // termBuffer() may be longer than the term, so copy only termLength() chars.
                String word = new String(termAtt.termBuffer(), 0, termAtt.termLength());
                // Replace each stemmed token with its phonetic key.
                buf.append(filter.encode(word)).append(" ");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Re-tokenize the space-separated phonetic keys.
        return new WhitespaceTokenizer(new StringReader(buf.toString()));
    }
}</span></strong>
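The class above needs Lucene 3.x and Commons Codec on the classpath. As a rough, dependency-free illustration of the same idea (tokenize, drop stop words, replace each token with a phonetic key), the sketch below uses classic Soundex as a stand-in for DoubleMetaphone and leaves out Porter stemming. The class name `PhoneticKeySketch`, the small stop-word list, and the sample sentence are assumptions made for illustration and do not come from the original post. The Soundex keys will generally differ from the keys DoubleMetaphone produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, dependency-free sketch; not the Lucene-based class from the post.
class PhoneticKeySketch {
    // Soundex digit for each letter A..Z; '0' marks letters that carry no digit.
    private static final String CODES = "01230120022455012623010202";

    // Tiny illustrative stop-word list (an assumption, not Lucene's STOP_WORDS_SET).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    /** Classic Soundex: first letter plus up to three digits, zero-padded. */
    static String soundex(String word) {
        String letters = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (letters.isEmpty()) {
            return "";
        }
        StringBuilder key = new StringBuilder().append(letters.charAt(0));
        char prev = CODES.charAt(letters.charAt(0) - 'A');
        for (int i = 1; i < letters.length() && key.length() < 4; i++) {
            char c = letters.charAt(i);
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters with the same digit
            }
            char digit = CODES.charAt(c - 'A');
            if (digit != '0' && digit != prev) {
                key.append(digit);
            }
            prev = digit; // a vowel resets prev, so a repeated digit is kept
        }
        while (key.length() < 4) {
            key.append('0');
        }
        return key.toString();
    }

    /** Lowercase, split on non-letters, drop stop words, emit phonetic keys. */
    static String analyze(String text) {
        List<String> keys = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            keys.add(soundex(token));
        }
        return String.join(" ", keys);
    }

    public static void main(String[] args) {
        System.out.println(soundex("their"));  // T600
        System.out.println(soundex("there"));  // T600
        System.out.println(analyze("Their cat is over there"));  // T600 C300 O160 T600
    }
}
```

Because "their" and "there" collapse to the same key, a search index built on these keys would match either spelling. That is the effect TwitterAnalyzer aims for on informally spelled tweets, with DoubleMetaphone providing the keys.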