Unable to show the indexed content in Solr 5.0

Posted: 2015-03-04 05:41:35

Question:

I used curl to add the following line to the schema (the resource named by managedSchemaResourceName), but the content field still does not appear in search results.

<field name="content" stored="true" type="text_general" indexed="true"/>

I am using the schema from ManagedIndexSchemaFactory.

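Because ExtractingRequestHandler is already defined by default in solrconfig.xml, I am using ManagedIndexSchemaFactory. The default setup does not display the content, so I added the content field line to make the indexed content visible when a user runs a query. I added it with curl as follows: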

$ curl -X POST -H 'Content-type:application/json' --data-binary '{
"update-field" : {
 "name":"text", "type":"text_general", "stored":true, "indexed":true, "storeOffsetsWithPositions":true
}
}' http://localhost:8983/solr/collection1/schema
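Note that the curl call above updates the text field, not content. As a sketch of the request the question actually describes (adding a stored content field through the Schema API), the Python below builds an add-field command against the same core URL. The helper name add_stored_field_request is hypothetical, and the request is only constructed, not sent; passing it to urllib.request.urlopen against a running Solr would apply it.

```python
import json
import urllib.request

# Assumption: same core URL as the curl example in the question.
SCHEMA_URL = "http://localhost:8983/solr/collection1/schema"


def add_stored_field_request(name, field_type="text_general"):
    """Build (but do not send) a Schema API request adding a stored, indexed field.

    add_stored_field_request is a hypothetical helper for illustration.
    """
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": True,   # needed for the value to be returned in fl
            "indexed": True,  # needed for the value to be searchable
        }
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SCHEMA_URL,
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )


req = add_stored_field_request("content")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```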

I indexed the document with the following command:

java -Dc=collection1 -Dauto=true -jar example\exampledocs\post.jar example\exampledocs\solr-word.pdf
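One thing worth checking: in auto mode, post.jar sends a PDF to the /update/extract handler, and ExtractingRequestHandler writes the body text extracted by Tika to whatever field its fmap.content parameter names. If the handler's defaults in solrconfig.xml map that text to a field other than content, the content field stays empty no matter how it is defined in the schema. The following is a minimal sketch, assuming the same core, of an extract URL that maps the body explicitly to content. extract_url is a hypothetical helper, and the PDF bytes would be sent as the POST body with Content-Type application/pdf.

```python
from urllib.parse import urlencode

# Assumption: the same core as the post.jar command above.
EXTRACT_URL = "http://localhost:8983/solr/collection1/update/extract"


def extract_url(doc_id, content_field="content"):
    """Build an /update/extract URL that stores Tika's extracted body in content_field.

    extract_url is a hypothetical helper for illustration.
    """
    params = {
        "literal.id": doc_id,           # value for the uniqueKey field
        "fmap.content": content_field,  # rename Tika's "content" output field
        "commit": "true",               # make the document visible to searches
    }
    return EXTRACT_URL + "?" + urlencode(params)


print(extract_url("solr-word.pdf"))
```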

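The document is indexed successfully. When I search for any word from its content, the search returns the document ID and other information such as subject, author, and date. However, the content of the document is not displayed.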

Here is what I get in the results.

If I do not request the content field in the fl parameter, I get something like this:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "indent": "true",
      "q": "*:*",
      "_": "1425362114731",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "id": "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word.pdf",
        "meta_save_date": [
          "2008-11-13T00:00:00Z"
        ],
        "dc_subject": [
          "solr, word, pdf"
        ],
        "subject": [
          "solr word"
        ],
        "author": [
          "Grant Ingersoll"
        ],
        "dcterms_created": [
          "2008-11-13T00:00:00Z"
        ],
        "date": [
          "2008-11-13T00:00:00Z"
        ],
        "creator": [
          "Grant Ingersoll"
        ],
        "creation_date": [
          "2008-11-13T00:00:00Z"
        ],
        "title": [
          "solr-word"
        ],
        "meta_author": [
          "Grant Ingersoll"
        ],
        "stream_content_type": [
          "application/pdf"
        ],
        "created": [
          "Thu Nov 13 13:35:51 UTC 2008"
        ],
        "stream_size": [
          21052
        ],
        "meta_keyword": [
          "solr, word, pdf"
        ],
        "cp_subject": [
          "solr word"
        ],
        "dc_format": [
          "application/pdf; version=1.3"
        ],
        "xmp_creatortool": [
          "Microsoft Word"
        ],
        "resourcename": [
          "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word.pdf"
        ],
        "keywords": [
          "solr, word, pdf"
        ],
        "last_save_date": [
          "2008-11-13T00:00:00Z"
        ],
        "dc_title": [
          "solr-word"
        ],
        "dcterms_modified": [
          "2008-11-13T00:00:00Z"
        ],
        "meta_creation_date": [
          "2008-11-13T00:00:00Z"
        ],
        "dc_creator": [
          "Grant Ingersoll"
        ],
        "pdf_pdfversion": [
          1.3
        ],
        "last_modified": [
          "2008-11-13T00:00:00Z"
        ],
        "aapl_keywords": [
          "solr, word, pdf"
        ],
        "x_parsed_by": [
          "org.apache.tika.parser.DefaultParser",
          "org.apache.tika.parser.pdf.PDFParser"
        ],
        "modified": [
          "2008-11-13T00:00:00Z"
        ],
        "xmptpg_npages": [
          1
        ],
        "pdf_encrypted": [
          false
        ],
        "producer": [
          "Mac OS X 10.5.5 Quartz PDFContext"
        ],
        "content_type": [
          "application/pdf"
        ],
        "_version_": 1494155334466404300
      },
      {
        "id": "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word2.pdf",
        "meta_save_date": [
          "2015-02-25T00:00:00Z"
        ],
        "author": [
          "GHI"
        ],
        "dcterms_created": [
          "2015-02-25T00:00:00Z"
        ],
        "date": [
          "2015-02-25T00:00:00Z"
        ],
        "creator": [
          "GHI"
        ],
        "creation_date": [
          "2015-02-25T00:00:00Z"
        ],
        "title": [
          "This is another test of PDF extraction in Solr"
        ],
        "meta_author": [
          "GHI"
        ],
        "stream_content_type": [
          "application/pdf"
        ],
        "created": [
          "Wed Feb 25 08:32:19 UTC 2015"
        ],
        "stream_size": [
          10345
        ],
        "dc_format": [
          "application/pdf; version=1.4"
        ],
        "xmp_creatortool": [
          "PDFCreator Version 1.3.2"
        ],
        "resourcename": [
          "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word2.pdf"
        ],
        "last_save_date": [
          "2015-02-25T00:00:00Z"
        ],
        "dc_title": [
          "This is another test of PDF extraction in Solr"
        ],
        "dcterms_modified": [
          "2015-02-25T00:00:00Z"
        ],
        "meta_creation_date": [
          "2015-02-25T00:00:00Z"
        ],
        "dc_creator": [
          "GHI"
        ],
        "pdf_pdfversion": [
          1.4
        ],
        "last_modified": [
          "2015-02-25T00:00:00Z"
        ],
        "x_parsed_by": [
          "org.apache.tika.parser.DefaultParser",
          "org.apache.tika.parser.pdf.PDFParser"
        ],
        "modified": [
          "2015-02-25T00:00:00Z"
        ],
        "xmptpg_npages": [
          1
        ],
        "pdf_encrypted": [
          false
        ],
        "producer": [
          "GPL Ghostscript 9.05"
        ],
        "content_type": [
          "application/pdf"
        ],
        "_version_": 1494155342991327200
      }
    ]
  }
}
If I request the content field in the fl parameter, this is what I get:

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "fl": "content",
      "indent": "true",
      "q": "*:*",
      "_": "1425362147661",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {},
      {}
    ]
  }
}

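Each {} in that response means the document matched but has no stored value for content. A small sketch that checks this from the raw JSON response is below; docs_missing_field is a hypothetical helper, and the sample string is a trimmed copy of the response above.

```python
import json


def docs_missing_field(response_text, field):
    """Return the positions of docs in a Solr JSON response with no non-empty value for field.

    docs_missing_field is a hypothetical helper for illustration.
    """
    docs = json.loads(response_text)["response"]["docs"]
    return [i for i, doc in enumerate(docs) if not doc.get(field)]


# Trimmed copy of the fl=content response shown above.
sample = '{"response": {"numFound": 2, "start": 0, "docs": [{}, {}]}}'
print(docs_missing_field(sample, "content"))  # [0, 1]
```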
If I run a query such as q=content:[* TO *]&fl=id,content, I get:

{
  "responseHeader":{
    "status":0,
    "QTime":5,
    "params":{
      "fl":"id,content",
      "q":"content:[* TO *]"}},
  "response":{"numFound":0,"start":0,"docs":[]
  }}
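The numFound of 0 here is the more telling result: an open-ended range query matches any document with at least one indexed term in the field, so zero hits suggests nothing was written to content at index time at all, rather than the value being indexed but not stored. A sketch of the same check as a URL, assuming the same core; rows=0 is added because only the count matters, and field_exists_url is a hypothetical name.

```python
from urllib.parse import urlencode

# Assumption: the same core used throughout the question.
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def field_exists_url(field):
    """Build a select URL counting documents with at least one indexed term in field.

    field_exists_url is a hypothetical helper for illustration.
    """
    params = {
        "q": field + ":[* TO *]",  # open-ended range: matches any indexed value
        "rows": "0",               # only numFound is needed, not the documents
        "wt": "json",
    }
    return SELECT_URL + "?" + urlencode(params)


print(field_exists_url("content"))
```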

I could get this working in Solr 4.10.1, but it does not work in Solr 5.0. Is there anything in Solr 5.0 that differs from earlier versions and that I need to be aware of?


Answer 1:

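I have only used Solr 5 and later, but I hope this helps: for a field to be searchable, it must be of a "text" type. For example, if you have a set of fields like these: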

   <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <field name="description" type="text_general" indexed="true" stored="true"/>
   <field name="author" type="text_general" indexed="true" stored="true"/>
   <field name="keywords" type="text_general" indexed="true" stored="true"/>
   <field name="resourcename" type="text_general" indexed="true" stored="true"/>
   <field name="url" type="text_general" indexed="true" stored="true"/>
   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>

and you want them to be searchable, you must add the corresponding copyField entries with text as the destination:

   <copyField source="title" dest="text"/>
   <copyField source="author" dest="text"/>
   <copyField source="description" dest="text"/>
   <copyField source="keywords" dest="text"/>
   <copyField source="content" dest="text"/>
   <copyField source="content_type" dest="text"/>
   <copyField source="resourcename" dest="text"/>
   <copyField source="url" dest="text"/>
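When the schema is managed (ManagedIndexSchemaFactory, as in the question), the copyField rules above can also be added through the Schema API instead of editing the schema file by hand. A sketch, assuming the add-copy-field bulk command available in Solr 5.x and the same core URL as the question's curl call: it builds one JSON body per rule, and each body would be POSTed to http://localhost:8983/solr/collection1/schema. copy_field_commands is a hypothetical helper.

```python
import json

# Source fields taken from the copyField list above, in the same order.
SOURCES = ["title", "author", "description", "keywords",
           "content", "content_type", "resourcename", "url"]


def copy_field_commands(sources, dest="text"):
    """Build one add-copy-field command body per source field.

    copy_field_commands is a hypothetical helper for illustration.
    """
    return [{"add-copy-field": {"source": s, "dest": dest}} for s in sources]


for cmd in copy_field_commands(SOURCES):
    print(json.dumps(cmd))
```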

