Querying multiple collections with different fields in Solr

Posted 2021-05-04


Given the following (single-core) queries:

http://localhost/solr/a/select?indent=true&q=*:*&rows=100&start=0&wt=json
http://localhost/solr/b/select?indent=true&q=*:*&rows=100&start=0&wt=json

The first query returns "numFound":40000 and the second query returns "numFound":10000.

I tried putting these together:

   http://localhost/solr/a/select?indent=true&shards=localhost/solr/a,localhost/solr/b&q=*:*&rows=100&start=0&wt=json

Now I get "numFound":50000. The only problem is that "a" has more columns than "b", so the multi-collection request only returns the values of "a".

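Is it possible to query multiple collections with different fields? Or do they have to be the same? How should I change my third URL to get this result?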

Answer

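What you need is something I would call a unification core. That core holds no content itself; it only serves as a wrapper that unifies the fields you want to display from both cores. In it you need

a schema.xml that contains all the fields you want to have in the unified result
a query handler that combines the two different cores for you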

An important limitation up front, taken from the Solr Wiki page about DistributedSearch

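Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml). The unique key field must be unique across all shards. If documents with duplicate unique keys are encountered, Solr will attempt to return valid results, but the behavior may be non-deterministic.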
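Because of this limitation, it can be worth checking for key collisions before relying on a unified query. The following is only a minimal sketch: the function name `find_duplicate_keys` and the sample documents are hypothetical, and it assumes the documents of each shard have already been fetched into plain Python dictionaries.

```python
from collections import defaultdict

def find_duplicate_keys(docs_by_shard, key="id"):
    """Return {key value: sorted shard names} for keys found in more than one shard."""
    seen = defaultdict(set)
    for shard, docs in docs_by_shard.items():
        for doc in docs:
            seen[doc[key]].add(shard)
    # Only keys that occur in at least two different shards are a problem.
    return {k: sorted(shards) for k, shards in seen.items() if len(shards) > 1}

# Hypothetical sample data: id 3 exists in both shards.
sample = {
    "shard-1": [{"id": 1, "title": "title 1"}, {"id": 3, "title": "title 3"}],
    "shard-2": [{"id": 2, "title": "title 2"}, {"id": 3, "title": "other 3"}],
}
duplicates = find_duplicate_keys(sample)
print(duplicates)
```

An empty result means the unique keys do not collide across the shards that were checked.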

For example, I have shard-1 with the fields id, title, description and shard-2 with the fields id, title, abstractText. So I have these schemas

Schema of shard-1

<schema name="shard-1" version="1.5">

  <fields>
    <field name="id"
          type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
          type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description"
          type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>

Schema of shard-2

<schema name="shard-2" version="1.5">

  <fields>
    <field name="id" 
      type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" 
      type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>

To unify these schemas, I created a third schema, which I call shard-unification, that contains all four fields.

<schema name="shard-unification" version="1.5">

  <fields>
    <field name="id" 
      type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description" 
      type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>
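The unified schema is simply the union of the fields of both shards. As a rough illustration, the field list can be derived from the two shard schemas. This is a sketch only: the helper names are hypothetical, and the schema strings are trimmed copies of the ones above with the type definitions left out.

```python
import xml.etree.ElementTree as ET

SHARD_1 = """<schema name="shard-1" version="1.5">
  <fields>
    <field name="id" type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description" type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
</schema>"""

SHARD_2 = """<schema name="shard-2" version="1.5">
  <fields>
    <field name="id" type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
</schema>"""

def field_names(schema_xml):
    """Return the field names of a schema in document order."""
    root = ET.fromstring(schema_xml)
    return [field.get("name") for field in root.iter("field")]

def unified_field_names(*schemas):
    """Return the union of all field names, keeping first-seen order."""
    unified = []
    for schema in schemas:
        for name in field_names(schema):
            if name not in unified:
                unified.append(name)
    return unified

unified = unified_field_names(SHARD_1, SHARD_2)
print(unified)
```

The resulting list contains the same four fields as the shard-unification schema.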

Now I need to make use of this combined schema, so I created a query handler in the solrconfig.xml of the solr-unification core

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="qf">id title description abstractText</str>
    <str name="fl">*,score</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />

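That's it. Now some indexed data is needed in shard-1 and shard-2. To query the unified result, just query shard-unification with the appropriate shards parameter.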

http://localhost/solr/shard-unification/select?q=*:*&rows=100&start=0&wt=json&shards=localhost/solr/shard-1,localhost/solr/shard-2
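A URL of this shape can also be assembled programmatically. The sketch below only builds the string and sends no request; the function name `build_distributed_query_url` and its defaults are hypothetical and mirror the parameters used in this thread.

```python
from urllib.parse import urlencode

def build_distributed_query_url(base_url, core, shards, query="*:*", rows=100, start=0):
    """Build a select URL for `core` that fans out to the given shards."""
    params = {
        "q": query,
        "rows": rows,
        "start": start,
        "wt": "json",
        # The shards parameter is a comma-separated list of host/path entries.
        "shards": ",".join(shards),
    }
    # Keep '*', ':', ',' and '/' readable instead of percent-encoding them.
    return f"{base_url}/{core}/select?{urlencode(params, safe='*:,/')}"

url = build_distributed_query_url(
    "http://localhost/solr",
    "shard-unification",
    ["localhost/solr/shard-1", "localhost/solr/shard-2"],
)
print(url)
```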

This will give you a result like this

{
  "responseHeader":{
    "status":0,
    "QTime":10},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":1,
        "title":"title 1",
        "description":"description 1",
        "score":1.0},
      {
        "id":2,
        "title":"title 2",
        "abstractText":"abstract 2",
        "score":1.0}]
  }}
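Note that each document carries only the fields that exist in its own shard, so client code has to tolerate missing keys. A minimal sketch of reading such a response follows; the `summarize` helper and the idea of folding description and abstractText into one `text` value are assumptions made for illustration.

```python
import json

# Same payload as the sample response above.
RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 10},
  "response": {"numFound": 2, "start": 0, "maxScore": 1.0, "docs": [
      {"id": 1, "title": "title 1", "description": "description 1", "score": 1.0},
      {"id": 2, "title": "title 2", "abstractText": "abstract 2", "score": 1.0}]
  }
}
"""

def summarize(doc):
    """Map a document from either shard onto one common shape."""
    # shard-1 documents carry "description", shard-2 documents "abstractText".
    text = doc.get("description") or doc.get("abstractText") or ""
    return {"id": doc["id"], "title": doc.get("title", ""), "text": text}

data = json.loads(RESPONSE)
rows = [summarize(doc) for doc in data["response"]["docs"]]
for row in rows:
    print(row)
```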

Fetch the origin shard of a document

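If you want to fetch the origin shard into each document, you just need to specify [shard] within fl, either as a parameter of the query or within the request handler's defaults, see below. The brackets are mandatory; they will also appear in the resulting response.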

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="qf">id title description abstractText</str>
    <str name="fl">*,score,[shard]</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />
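With [shard] in fl, every returned document carries a "[shard]" key, which makes it easy to group results by their origin on the client side. The sketch below is illustrative only: the function name is hypothetical, and the shard values in the sample documents are assumed to look like the entries of the shards parameter.

```python
from collections import defaultdict

def group_ids_by_shard(docs):
    """Group document ids by the value of their "[shard]" field."""
    groups = defaultdict(list)
    for doc in docs:
        # Documents without the field end up under "unknown".
        groups[doc.get("[shard]", "unknown")].append(doc["id"])
    return dict(groups)

# Hypothetical documents as they might appear in the "docs" list of a response.
docs = [
    {"id": 1, "title": "title 1", "[shard]": "localhost/solr/shard-1"},
    {"id": 2, "title": "title 2", "[shard]": "localhost/solr/shard-2"},
]
grouped = group_ids_by_shard(docs)
print(grouped)
```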

Working Sample

If you want to see a running example, check out my solrsample project on github and execute the ShardUnificationTest. By now I have also included the shard fetching.

Another answer

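Shards should be used in Solr

when an index becomes too large to fit on a single system, or when a single query takes too long to execute

So the number and names of the columns should always be the same. This is specified in this document (which is also where the quote above comes from): http://wiki.apache.org/solr/DistributedSearch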

If you keep your query as it is and make the two shards have the same fields, this should just work as expected.
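To see which fields would have to be added on each side to make the two shards match, the field lists can be compared. This is a small sketch with a hypothetical helper name; the field lists are typed in by hand from the shard-1 and shard-2 example used in the first answer.

```python
def field_differences(fields_a, fields_b):
    """Report the fields that exist in only one of the two shards."""
    a, b = set(fields_a), set(fields_b)
    return {"only_in_a": sorted(a - b), "only_in_b": sorted(b - a)}

diff = field_differences(
    ["id", "title", "description"],
    ["id", "title", "abstractText"],
)
print(diff)
```

Both lists in the result are empty once the shards share the same set of fields.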

If you want more information about how shards work in SolrCloud, have a look at this document: http://wiki.apache.org/solr/SolrCloud
