Unable to show the indexed content in Solr 5.0
Posted: 2015-03-04 05:41:35
Question: Even after using curl to add the following line to the schema, in the resource named by managedSchemaResourceName, the content field is not displayed in search results.
<field name="content" stored="true" type="text_general" indexed="true"/>
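I am using the schema from ManagedIndexSchemaFactory.
Since the ExtractingRequestHandler is already defined by default in solrconfig.xml, I kept ManagedIndexSchemaFactory. I added the content field line so that the indexed content is shown when a user runs a query, because the default setup does not display it. I added it with curl as follows: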
$ curl -X POST -H 'Content-type:application/json' --data-binary '{
"update-field" : {
"name":"text", "type":"text_general", "stored":true, "indexed":true, "storeOffsetsWithPositions":true }
}' http://localhost:8983/solr/collection1/schema
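Note that the command above targets a field named text, while the field in question is content. As a minimal sketch only, the request for the content field could be assembled as below without sending it. It assumes the Schema API's add-field command and the collection1 URL from the post; build_add_field_request is a hypothetical helper name, not part of Solr.

```python
import json
import urllib.request

# URL taken from the post; adjust host, port and collection as needed.
SCHEMA_URL = "http://localhost:8983/solr/collection1/schema"


def build_add_field_request(name, field_type="text_general",
                            stored=True, indexed=True, url=SCHEMA_URL):
    """Build (but do not send) a Schema API request that adds one field.

    Assumption: the managed schema accepts an "add-field" command.
    """
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )


request = build_add_field_request("content")
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Printing the body before sending makes it easy to confirm that the field name and the stored flag are the intended ones.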
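I indexed the document with the following command:
java -Dc=collection1 -Dauto=true -jar example\exampledocs\post.jar example\exampledocs\solr-word.pdf
The document is indexed successfully, and when I search for any word from the content, the search returns the document ID and other information such as subject, author and date. However, the content of the document is not shown.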
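One thing worth checking is which field the extracted body text ends up in. As a hedged sketch: the ExtractingRequestHandler accepts an fmap.content parameter that maps the extracted body to a named field, and literal.id sets the document id. The snippet below only builds such a request URL. The /update/extract path and the collection1 base URL are assumptions based on the post, and build_extract_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed from the post; adjust host, port and collection as needed.
BASE_URL = "http://localhost:8983/solr/collection1"


def build_extract_url(doc_id, content_field="content", base_url=BASE_URL):
    """Build an extract-handler URL that maps the body text to content_field."""
    params = {
        "literal.id": doc_id,           # id assigned to the indexed document
        "fmap.content": content_field,  # field that receives the extracted body
        "commit": "true",               # make the document visible right away
    }
    return base_url + "/update/extract?" + urlencode(params)


print(build_extract_url("solr-word.pdf"))
```

If the handler defaults in solrconfig.xml already map the body to some other field, a stored content field would stay empty regardless of how it is declared in the schema, so comparing those defaults with the URL above may be a useful first step.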
This is what I get in the results.
If I do not request the content field in the fl parameter, I get this:
"responseHeader":
"status": 0,
"QTime": 0,
"params":
"indent": "true",
"q": "*:*",
"_": "1425362114731",
"wt": "json"
,
"response":
"numFound": 2,
"start": 0,
"docs": [
"id": "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word.pdf",
"meta_save_date": [
"2008-11-13T00:00:00Z"
],
"dc_subject": [
"solr, word, pdf"
],
"subject": [
"solr word"
],
"author": [
"Grant Ingersoll"
],
"dcterms_created": [
"2008-11-13T00:00:00Z"
],
"date": [
"2008-11-13T00:00:00Z"
],
"creator": [
"Grant Ingersoll"
],
"creation_date": [
"2008-11-13T00:00:00Z"
],
"title": [
"solr-word"
],
"meta_author": [
"Grant Ingersoll"
],
"stream_content_type": [
"application/pdf"
],
"created": [
"Thu Nov 13 13:35:51 UTC 2008"
],
"stream_size": [
21052
],
"meta_keyword": [
"solr, word, pdf"
],
"cp_subject": [
"solr word"
],
"dc_format": [
"application/pdf; version=1.3"
],
"xmp_creatortool": [
"Microsoft Word"
],
"resourcename": [
"C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word.pdf"
],
"keywords": [
"solr, word, pdf"
],
"last_save_date": [
"2008-11-13T00:00:00Z"
],
"dc_title": [
"solr-word"
],
"dcterms_modified": [
"2008-11-13T00:00:00Z"
],
"meta_creation_date": [
"2008-11-13T00:00:00Z"
],
"dc_creator": [
"Grant Ingersoll"
],
"pdf_pdfversion": [
1.3
],
"last_modified": [
"2008-11-13T00:00:00Z"
],
"aapl_keywords": [
"solr, word, pdf"
],
"x_parsed_by": [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.pdf.PDFParser"
],
"modified": [
"2008-11-13T00:00:00Z"
],
"xmptpg_npages": [
1
],
"pdf_encrypted": [
false
],
"producer": [
"Mac OS X 10.5.5 Quartz PDFContext"
],
"content_type": [
"application/pdf"
],
"_version_": 1494155334466404300
},
{
"id": "C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word2.pdf",
"meta_save_date": [
"2015-02-25T00:00:00Z"
],
"author": [
"GHI"
],
"dcterms_created": [
"2015-02-25T00:00:00Z"
],
"date": [
"2015-02-25T00:00:00Z"
],
"creator": [
"GHI"
],
"creation_date": [
"2015-02-25T00:00:00Z"
],
"title": [
"This is another test of PDF extraction in Solr"
],
"meta_author": [
"GHI"
],
"stream_content_type": [
"application/pdf"
],
"created": [
"Wed Feb 25 08:32:19 UTC 2015"
],
"stream_size": [
10345
],
"dc_format": [
"application/pdf; version=1.4"
],
"xmp_creatortool": [
"PDFCreator Version 1.3.2"
],
"resourcename": [
"C:\\Users\\GHI\\solr-5.0.0\\example\\exampledocs\\solr-word2.pdf"
],
"last_save_date": [
"2015-02-25T00:00:00Z"
],
"dc_title": [
"This is another test of PDF extraction in Solr"
],
"dcterms_modified": [
"2015-02-25T00:00:00Z"
],
"meta_creation_date": [
"2015-02-25T00:00:00Z"
],
"dc_creator": [
"GHI"
],
"pdf_pdfversion": [
1.4
],
"last_modified": [
"2015-02-25T00:00:00Z"
],
"x_parsed_by": [
"org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.pdf.PDFParser"
],
"modified": [
"2015-02-25T00:00:00Z"
],
"xmptpg_npages": [
1
],
"pdf_encrypted": [
false
],
"producer": [
"GPL Ghostscript 9.05"
],
"content_type": [
"application/pdf"
],
"_version_": 1494155342991327200
}
]
}
}
If I request the content field in the fl parameter, this is what I get:
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"fl": "content",
"indent": "true",
"q": "*:*",
"_": "1425362147661",
"wt": "json"
}
},
"response": {
"numFound": 2,
"start": 0,
"docs": [
{},
{}
]
}
}
If I run a query such as q=content:[* TO *]&fl=id,content, I get:
{
"responseHeader":{
"status":0,
"QTime":5,
"params":{
"fl":"id,content",
"q":"content:[* TO *]"}},
"response":{"numFound":0,"start":0,"docs":[]}
}
I can get this to work in Solr 4.10.1, but it does not work in Solr 5.0. Is there anything different in Solr 5.0 compared with earlier Solr versions that I need to be aware of?
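The responses above can also be compared programmatically. Below is a minimal sketch, assuming the JSON response has already been parsed into a dict. The sample data is abbreviated from the output above (ids are shortened to file names), and docs_missing_field is a hypothetical helper.

```python
def docs_missing_field(response, field):
    """Return the ids of returned documents that do not contain `field`.

    An entry is None when the id itself was not part of the response.
    """
    docs = response.get("response", {}).get("docs", [])
    return [doc.get("id") for doc in docs if field not in doc]


# Abbreviated stand-in for the q=*:* response shown above.
sample = {
    "response": {
        "numFound": 2,
        "start": 0,
        "docs": [
            {"id": "solr-word.pdf", "title": ["solr-word"]},
            {"id": "solr-word2.pdf",
             "title": ["This is another test of PDF extraction in Solr"]},
        ],
    }
}

print(docs_missing_field(sample, "content"))
print(docs_missing_field(sample, "title"))
```

When every document lacks the field and content:[* TO *] also matches nothing, the field appears to hold no stored or indexed value at all, which suggests the extracted text was written to a different field rather than merely hidden from the output.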
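Answer 1: I have only worked with Solr version 5 and above, but hopefully this helps: for a field to be searchable, it must be of a "text" type. For example, suppose you have a set of fields like: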
<field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
<field name="keywords" type="text_general" indexed="true" stored="true"/>
<field name="resourcename" type="text_general" indexed="true" stored="true"/>
<field name="url" type="text_general" indexed="true" stored="true"/>
<field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
If you want them to be searchable, you have to add the corresponding copyField rules into text:
<copyField source="title" dest="text"/>
<copyField source="author" dest="text"/>
<copyField source="description" dest="text"/>
<copyField source="keywords" dest="text"/>
<copyField source="content" dest="text"/>
<copyField source="content_type" dest="text"/>
<copyField source="resourcename" dest="text"/>
<copyField source="url" dest="text"/>
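With a managed schema, rules like these are normally added through the Schema API rather than by editing the XML by hand. The sketch below builds one JSON body per rule and sends nothing. It assumes the Schema API's add-copy-field command with source and dest keys; build_copy_field_payloads is a hypothetical helper, and each body would be POSTed to the collection's /schema endpoint.

```python
import json


def build_copy_field_payloads(sources, dest="text"):
    """Return one JSON request body per copyField rule (nothing is sent).

    Assumption: the Schema API accepts an "add-copy-field" command.
    """
    return [
        json.dumps({"add-copy-field": {"source": source, "dest": dest}})
        for source in sources
    ]


# Source fields taken from the copyField list above.
sources = ["title", "author", "description", "keywords",
           "content", "content_type", "resourcename", "url"]

for body in build_copy_field_payloads(sources):
    print(body)
```

Emitting one request per rule keeps each command simple and makes a failing rule easy to identify.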