Can Lucene.Net (3.0.3 or 4.8.0) QueryParser search for numbers?

Posted: 2018-01-19 11:30:50

Question:

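I started from the Lucene.Net 4.8 demo project (https://github.com/synhershko/LuceneNetDemo). My goal is to use a query parser (QueryParser or MultiFieldQueryParser) to search for both text and numbers. Is that possible? All I have found are examples that use ranges (NumericRangeQuery) or suggestions to build my own query parser. I also cannot tell whether a range can be created through the existing query parsers.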

using System;
using Lucene.Net.Store;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Util;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Analysis.Standard;

/*
Package Manager:
Install-Package Lucene.Net -Version 4.8.0-beta00004 -Pre
Install-Package Lucene.Net.Analysis.Common -Version 4.8.0-beta00004 -Pre
Install-Package Lucene.Net.QueryParser -Version 4.8.0-beta00004 -Pre
*/

namespace LuceneNetNumbers
{
    class Program
    {
        static void Main(string[] args)
        {
            LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

            using (var oDirectory = new RAMDirectory())
            {
                var oAnalyzer = new StandardAnalyzer(MatchVersion);
                var oQueryParser = new MultiFieldQueryParser(MatchVersion, new[] { "name", "height", "age" }, oAnalyzer);
                var oIndexWriterConfig = new IndexWriterConfig(MatchVersion, oAnalyzer);
                var oIndexWriter = new IndexWriter(oDirectory, oIndexWriterConfig);
                var oSearcherManager = new SearcherManager(oIndexWriter, true, null);

                // Index one document per person: a text field plus two numeric fields.
                var oAdd = new Action<string, double, int>((sName, nAge, nHeight) =>
                {
                    var oDocument = new Document
                    {
                        new TextField("name", sName, Field.Store.YES),
                        new Int32Field("height", nHeight, Field.Store.YES),
                        new DoubleField("age", nAge, Field.Store.YES),
                    };

                    oIndexWriter.UpdateDocument(new Term("name", sName), oDocument);
                });

                oAdd("John Doe", 24.45, 56);
                oAdd("John Smith", 44.44, 64);
                oAdd("Mike Smith", 56.65, 70);

                oIndexWriter.Flush(true, true);
                oIndexWriter.Commit();

                // Parse the query string, run the search, and print score, name, age, and height.
                var oSearch = new Action<string>((sQueryString) =>
                {
                    var oQuery = oQueryParser.Parse(sQueryString);
                    oSearcherManager.MaybeRefreshBlocking();
                    var oSearcher = oSearcherManager.Acquire();

                    try
                    {
                        var oTopDocs = oSearcher.Search(oQuery, 10);
                        var nTotalHits = oTopDocs.TotalHits;
                        Console.WriteLine("Total Hits: {0}", nTotalHits);

                        foreach (var oResult in oTopDocs.ScoreDocs)
                        {
                            var oDocument = oSearcher.Doc(oResult.Doc);

                            var nScore = oResult.Score;
                            var sName = oDocument.GetField("name")?.GetStringValue();
                            var nAge = oDocument.GetField("age")?.GetNumericValue();
                            var nHeight = oDocument.GetField("height")?.GetNumericValue();

                            Console.WriteLine("{0:0.00}, {1,15}, {2,8}, {3,8}", nScore, sName, nAge, nHeight);
                        }
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e.ToString());
                    }
                    finally
                    {
                        oSearcherManager.Release(oSearcher);
                        oSearcher = null;
                    }
                });

                oSearch("john");
                oSearch("height:64");

                /*
                Output:
                Total Hits: 2
                0.20,        John Doe,    24.45,       56
                0.20,      John Smith,    44.44,       64
                Total Hits: 0
                */
            }
        }
    }
}

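Comments:

If you search for just "64", I would expect it to return a hit.

Unfortunately, it returns zero hits. I tried many variations (oSearch("64"), oSearch("'64'"), oSearch("\"64\"")), but the query parser does not seem to work with numbers.

Answer 1: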

This has to do with the way Lucene.Net stores numeric values (in an encoded form):

new Int32Field("height", nHeight, Field.Store.YES)
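To make "encoded form" concrete, here is a rough, dependency-free sketch of a prefix-coded 32-bit integer. It is a simplified re-implementation written for illustration, assumed to mirror the layout used by NumericUtils in Lucene 4.x (a header byte, then the sign-flipped value in 7-bit groups); the class name `PrefixCodedDemo`, the method `EncodeInt32`, and the constant `ShiftStartInt32` are hypothetical names, not Lucene.Net API.

```csharp
using System;
using System.Text;

static class PrefixCodedDemo
{
    // Assumption: header base value used for 32-bit terms in Lucene 4.x.
    const byte ShiftStartInt32 = 0x60;

    // Encodes value as: [header + shift] followed by 7-bit groups, most significant first.
    public static byte[] EncodeInt32(int value, int shift = 0)
    {
        // Number of 7-bit groups needed for the bits that remain after the shift.
        int dataBytes = (((31 - shift) * 37) >> 8) + 1;
        var result = new byte[dataBytes + 1];
        result[0] = (byte)(ShiftStartInt32 + shift);

        // Flip the sign bit so that byte order matches numeric order.
        uint sortable = unchecked((uint)value) ^ 0x80000000u;
        sortable >>= shift;

        // Fill from the last byte backwards, 7 bits at a time.
        for (int i = dataBytes; i > 0; i--)
        {
            result[i] = (byte)(sortable & 0x7F);
            sortable >>= 7;
        }
        return result;
    }

    static void Main()
    {
        // Bytes of the numeric term as sketched above.
        Console.WriteLine(BitConverter.ToString(EncodeInt32(64)));
        // Bytes of the plain text term "64" that an analyzer would produce.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes("64")));
    }
}
```

This sketch prints `60-08-00-00-00-40` for the encoded integer and `36-34` for the text "64". The two byte sequences have nothing in common, which is consistent with a text term produced by the query parser finding no match in a numeric field.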

You can use a NumericRangeQuery:

var oQuery = NumericRangeQuery.NewInt32Range("height", 64, 64, true, true);

Use the number you are searching for as both the min and the max value.

Another option is to use a TermQuery and convert your number to a BytesRef:

BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT32);
NumericUtils.Int32ToPrefixCoded(64, 0, bytes);
Term term = new Term("height", bytes);
var oQuery = new TermQuery(term);

Of course, you cannot parse such a query from a string, but you can always create your own parser that combines the terms:

BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT32);
NumericUtils.Int32ToPrefixCoded(64, 0, bytes);
Term term = new Term("height", bytes);
// var oQuery = new TermQuery(term);

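// A TermQuery is not analyzed, so the text term must be lowercase to match
// what StandardAnalyzer put into the index.
var oQuery = new BooleanQuery
{
    { new TermQuery(new Term("name", "john")), Occur.SHOULD },
    { new TermQuery(term), Occur.SHOULD }
};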

You can see how the query is translated to a string.
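One way to read the "create your own parser" suggestion is to subclass the classic QueryParser and redirect a known numeric field to NumericRangeQuery. The following is only a sketch under assumptions: `HeightAwareQueryParser` is a hypothetical class, the field name "height" is hard-coded to match the question, and the overridden method signatures are assumed to match QueryParserBase in Lucene.Net 4.8 (they may differ between beta releases).

```csharp
using System.Globalization;
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Util;

// Hypothetical subclass (not part of Lucene.Net): sends the "height" field to
// numeric queries and leaves every other field to the stock parser.
class HeightAwareQueryParser : QueryParser
{
    public HeightAwareQueryParser(LuceneVersion matchVersion, string defaultField, Analyzer analyzer)
        : base(matchVersion, defaultField, analyzer)
    {
    }

    // "height:64" becomes the exact-match numeric range [64, 64].
    protected override Query GetFieldQuery(string field, string queryText, bool quoted)
    {
        if (field == "height" && TryParseBound(queryText, out int? value) && value.HasValue)
        {
            return NumericRangeQuery.NewInt32Range(field, value, value, true, true);
        }
        return base.GetFieldQuery(field, queryText, quoted);
    }

    // "height:[60 TO 70]" becomes a numeric range; an open bound ("*") stays open.
    protected override Query GetRangeQuery(string field, string part1, string part2,
        bool startInclusive, bool endInclusive)
    {
        if (field == "height"
            && TryParseBound(part1, out int? min)
            && TryParseBound(part2, out int? max))
        {
            return NumericRangeQuery.NewInt32Range(field, min, max, startInclusive, endInclusive);
        }
        return base.GetRangeQuery(field, part1, part2, startInclusive, endInclusive);
    }

    // Returns true for an open bound (null or "*") or a valid integer; false for anything else.
    private static bool TryParseBound(string text, out int? bound)
    {
        bound = null;
        if (text == null || text == "*")
        {
            return true;
        }
        if (int.TryParse(text, NumberStyles.Integer, CultureInfo.InvariantCulture, out int parsed))
        {
            bound = parsed;
            return true;
        }
        return false;
    }
}
```

With this in place, `new HeightAwareQueryParser(MatchVersion, "name", oAnalyzer).Parse("john height:64")` should yield a BooleanQuery that combines a text term with a numeric range. Unqualified terms still go only to the default field, so a bare "64" would not be tried against "height" in this sketch.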

Comments:

Thanks for your answer. While it is a good example of how to search for numbers, it does not use a query parser.

Answer 2:

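Although I am not entirely happy with what I came up with, it does satisfy my requirement of using a query parser to search for numbers, and for text as the original example implies. I will keep looking into how to use StandardQueryParser.SetMultiFields and StandardQueryParser.NumericConfigMap with both strings and numbers, and will edit or post any findings here.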

using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Flexible.Standard;
using Lucene.Net.QueryParsers.Flexible.Standard.Config;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Support;
using Lucene.Net.Util;
using System;
using System.Globalization;

/*
Package Manager:
Install-Package Lucene.Net -Version 4.8.0-beta00004 -Pre
Install-Package Lucene.Net.Analysis.Common -Version 4.8.0-beta00004 -Pre
Install-Package Lucene.Net.QueryParser -Version 4.8.0-beta00004 -Pre
*/

namespace LuceneNetNumbers
{
    class Program
    {
        static void Main(string[] args)
        {
            LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

            using (var oDirectory = new RAMDirectory())
            {
                var oAnalyzer = new StandardAnalyzer(MatchVersion);

                //##########
                //##########

                //List of changes...

                //1. Remove this.
                //var oQueryParser = new MultiFieldQueryParser(MatchVersion, new[] { "name", "height", "age" }, oAnalyzer);

                //2. Add the following 6 lines of code.
                var oQueryParser = new StandardQueryParser(oAnalyzer);

                var oNumericConfigMap = new HashMap<string, NumericConfig>();
                oNumericConfigMap.Put("height", new NumericConfig(8, new NumberFormatIgnoreExceptions(CultureInfo.CurrentCulture), NumericType.INT32));
                oNumericConfigMap.Put("age", new NumericConfig(8, new NumberFormatIgnoreExceptions(CultureInfo.CurrentCulture), NumericType.DOUBLE));
                oQueryParser.NumericConfigMap = oNumericConfigMap;

                oQueryParser.SetMultiFields(new[] { "name", "height", "age" });

                //3. Add null as second parameter to StandardQueryParser.Parse below to utilize StandardQueryParser.SetMultiFields

                //4. Create NumberFormatIgnoreExceptions. I was not able to find another way (yet) to get
                //StandardQueryParser.SetMultiFields and StandardQueryParser.NumericConfigMap to work with
                //both text and number fields.  I feel like this is a bit of a hack, but it does satisfy my
                //requirement of using a query parser to search for numbers (and text... implied by example).

                //##########
                //##########

                var oIndexWriterConfig = new IndexWriterConfig(MatchVersion, oAnalyzer);
                var oIndexWriter = new IndexWriter(oDirectory, oIndexWriterConfig);
                var oSearcherManager = new SearcherManager(oIndexWriter, true, null);

                // Index one document per person: a text field plus two numeric fields.
                var oAdd = new Action<string, double, int>((sName, nAge, nHeight) =>
                {
                    var oDocument = new Document
                    {
                        new TextField("name", sName, Field.Store.YES),
                        new Int32Field("height", nHeight, Field.Store.YES),
                        new DoubleField("age", nAge, Field.Store.YES),
                    };

                    oIndexWriter.UpdateDocument(new Term("name", sName), oDocument);
                });

                oAdd("John Doe", 24.45, 56);
                oAdd("John Smith", 44.44, 64);
                oAdd("Mike Smith", 56.65, 70);

                oIndexWriter.Flush(true, true);
                oIndexWriter.Commit();

                // Parse the query string, run the search, and print score, name, age, and height.
                var oSearch = new Action<string>((sQueryString) =>
                {
                    var oQuery = oQueryParser.Parse(sQueryString, null);
                    oSearcherManager.MaybeRefreshBlocking();
                    var oSearcher = oSearcherManager.Acquire();

                    try
                    {
                        var oTopDocs = oSearcher.Search(oQuery, 10);
                        var nTotalHits = oTopDocs.TotalHits;
                        Console.WriteLine("Total Hits: {0}", nTotalHits);

                        foreach (var oResult in oTopDocs.ScoreDocs)
                        {
                            var oDocument = oSearcher.Doc(oResult.Doc);

                            var nScore = oResult.Score;
                            var sName = oDocument.GetField("name")?.GetStringValue();
                            var nAge = oDocument.GetField("age")?.GetNumericValue();
                            var nHeight = oDocument.GetField("height")?.GetNumericValue();

                            Console.WriteLine("{0:0.00}, {1,15}, {2,8}, {3,8}", nScore, sName, nAge, nHeight);
                        }
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e.ToString());
                    }
                    finally
                    {
                        oSearcherManager.Release(oSearcher);
                        oSearcher = null;
                    }
                });

                oSearch("john");
                oSearch("height:64");
                oSearch("age:[44.45 TO 56.66]");
                oSearch("height:[70 TO *]");

                /*
                Output:
                Total Hits: 2
                0.12,        John Doe,    24.45,       56
                0.12,      John Smith,    44.44,       64
                Total Hits: 1
                1.00,      John Smith,    44.44,       64
                Total Hits: 1
                1.00,      Mike Smith,    56.65,       70
                Total Hits: 1
                1.00,      Mike Smith,    56.65,       70
                */
            }
        }
    }

    // Swallows parse failures and returns null instead, so that text terms such as
    // "john" do not throw when the parser tries them against the numeric fields.
    class NumberFormatIgnoreExceptions : NumberFormat
    {
        public NumberFormatIgnoreExceptions(CultureInfo locale) : base(locale)
        {
        }

        public override object Parse(string source)
        {
            var oValue = default(object);

            try { oValue = base.Parse(source); } catch { }

            return oValue;
        }
    }
}
