SOLR child document definition in the DataImportHandler with URLDataSource
The SOLR DataImportHandler supports parent and child entities for JDBC data sources. How can I define a parent-child relationship with URLDataSource? My sample data set:
<name>ABC</name>
<createdAt>1512016450886</createdAt>
<createdBy>XYZ</createdBy>
<attributes>
<attribute>
<name>access</name>
<value>public</value>
</attribute>
<attribute>
<name>owner</name>
<value>ABC</value>
</attribute>
<attribute>
<name>url</name>
<value>planning</value>
</attribute>
</attributes>
And the indexed output I need is:
{
"name": "ABC",
"createdAt": "1512016450886",
"createdBy": "XYZ",
"Attributes": [
{
"name": "access",
"value": "public"
},
{
"name": "owner",
"value": "ABC"
},
{
"name": "url",
"value": "planning"
}
]
}
Sample data config:
<dataConfig>
<dataSource type="URLDataSource"/>
<document>
<entity name="sample"
url="http://host:port/api/sample_api.xml"
processor="XPathEntityProcessor"
forEach="/hash/name">
<field column="id" name="id" xpath="/hash/name"/>
<field column="createdBy" name="createdBy" xpath="/hash/createdBy"/>
<field column="createdAt" name="createdAt" xpath="/hash/createdAt"/>
<field column="attributes" name="attributes" xpath="/hash/attributes"/>
<field column="attributes.name" name="attributes.name" xpath="/hash/attributes/attribute/name"/>
<field column="attributes.value" name="attributes.value" xpath="/hash/attributes/attribute/value"/>
</entity>
</document>
</dataConfig>
The response is:
{"name": "ABC", "createdAt": "1512016450886", "createdBy": "XYZ", "attributes.name": ["access", "owner", "url"], "attributes.value": ["public", "ABC", "planning"]}
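To illustrate why this response is flat, here is a minimal Python sketch. It is not how XPathEntityProcessor is implemented; it only mimics the effect of collecting every match of each path into one record. The `<hash>` root wrapper is an assumption taken from the XPaths in the config, and `flatten` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample payload from the question, wrapped in an assumed <hash> root
# because every configured XPath starts with /hash.
SAMPLE = """
<hash>
  <name>ABC</name>
  <createdAt>1512016450886</createdAt>
  <createdBy>XYZ</createdBy>
  <attributes>
    <attribute><name>access</name><value>public</value></attribute>
    <attribute><name>owner</name><value>ABC</value></attribute>
    <attribute><name>url</name><value>planning</value></attribute>
  </attributes>
</hash>
""".strip()


def flatten(xml_text):
    """Collect every match of each path into a single flat record."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "createdAt": root.findtext("createdAt"),
        "createdBy": root.findtext("createdBy"),
        # All <name> and <value> matches land in two parallel lists, so the
        # grouping into individual attributes is no longer explicit.
        "attributes.name": [e.text for e in root.findall("attributes/attribute/name")],
        "attributes.value": [e.text for e in root.findall("attributes/attribute/value")],
    }


flat = flatten(SAMPLE)
print(flat)
```

The name/value pairs survive only through their list positions, which is why a nested (child document) structure is what the question is after.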
I tried this new data-config.xml:
<dataConfig>
<script>
<![CDATA[ id = 1;
function f1(row) { row.put('attr.attrId', (id ++).toFixed()); return row; } ]]>
</script>
<dataSource type="URLDataSource"/>
<document>
<entity name="entity"
url="http://abc:9090/api/sample_api.xml"
processor="XPathEntityProcessor"
forEach="/hash/entity/entity">
<field column="id" name="id" xpath="/hash/entity/entity/name"/>
<field column="createdBy" name="createdBy" xpath="/hash/entity/entity/createdBy"/>
<entity name="attributes"
url="http://abc:9090/api/sample_api.xml"
child="true"
processor="XPathEntityProcessor"
forEach="/hash/entity/entity/xyz/xyz" transformer="script:f1">
<field column="attr.attrId" name="attr.attrId"/>
<field column="attr.attrName" name="attr.attrName" xpath="/hash/entity/entity/xyz/xyz/name"/>
<field column="attr.attrValue" name="attr.attrValue" xpath="/hash/entity/entity/xyz/xyz/value"/>
</entity>
</entity>
</document>
</dataConfig>
But I get the following error in solr.log:
[ x:xml_data] o.a.s.h.d.SolrWriter Error creating document : SolrInputDocument(fields: [createdBy=XYZ, id=ABC, _version_=1587094252791267328, _root_=ABC], children: [SolrInputDocument(fields: [attr.attrName=access, attr.attrId=1, attr.attrValue=public, _root_=ABC, _version_=1587094252791267328]), SolrInputDocument(fields: [attr.attrName=access12, attr.attrId=2, attr.attrValue=public12, _root_=ABC, _version_=1587094252791267328])])
org.apache.solr.common.SolrException: [doc=null] missing required field: id
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:265)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:107)
at org.apache.solr.update.AddUpdateCommand$1.next(AddUpdateCommand.java:212)
at org.apache.solr.update.AddUpdateCommand$1.next(AddUpdateCommand.java:185)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:259)
at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:433)
at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1384)
at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:920)
at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:913)
at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:302)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:80)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:254)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:526)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:415)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
at java.lang.Thread.run(Thread.java:748)
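One possible reading of the log line above: the parent document carries id=ABC, while both child documents list only attr.* fields. The Python sketch below checks a hand-transcribed copy of that block for documents without an `id`. It rests on assumptions: that `id` is the required uniqueKey in the schema, that the `_children` key is just a hypothetical way to represent nesting in a dict, and that `_root_` and `_version_` can be left out for brevity.

```python
def docs_missing_field(doc, field="id", path="parent"):
    """Return the positions of all documents in a nested block that lack `field`."""
    missing = [] if field in doc else [path]
    for i, child in enumerate(doc.get("_children", [])):
        missing += docs_missing_field(child, field, f"{path}/child[{i}]")
    return missing


# The block from the log line, transcribed by hand into plain dicts.
block = {
    "id": "ABC",
    "createdBy": "XYZ",
    "_children": [
        {"attr.attrName": "access", "attr.attrId": "1", "attr.attrValue": "public"},
        {"attr.attrName": "access12", "attr.attrId": "2", "attr.attrValue": "public12"},
    ],
}

missing = docs_missing_field(block)
print(missing)
```

If that reading is right, the exception would be raised for the children rather than for the parent, which would also fit the `[doc=null]` in the message.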
Answer
Actually, this is easy to do: nested entities support child="true".
This is what the updated data-config.xml looks like:
<dataConfig>
<dataSource type="URLDataSource" encoding="utf-8" />
<document>
<entity name="entity"
url="path:to:xml"
processor="XPathEntityProcessor"
forEach="/hash/entity">
<field column="id" name="id" xpath="/hash/entity/name" />
<field column="createdBy" name="createdBy" xpath="/hash/entity/createdBy" />
<field column="createdAt" name="createdAt" xpath="/hash/entity/createdAt" />
<entity name="attributes"
url="path:to:xml"
child="true" processor="XPathEntityProcessor" forEach="/hash/entity/attributes/attribute">
<field column="name" xpath="/hash/entity/attributes/attribute/name" />
<field column="value" xpath="/hash/entity/attributes/attribute/value" />
</entity>
</entity>
</document>
</dataConfig>
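What changed compared to your config: to create child documents, you need a nested entity with child="true". The nested entity also needs its own url and the same processor. In addition, some of your XPaths were incorrect.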
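As a rough illustration of the parent/child shape this config is aiming for, here is a Python sketch that walks the well-formed XML from this answer. It is only a model of the intended result, not of DIH internals: the outer loop stands in for forEach="/hash/entity" and the inner one for forEach="/hash/entity/attributes/attribute". The function name `to_nested` and the `children` key are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed payload with a single <hash> root, as in this answer.
SAMPLE = """
<hash>
  <entity>
    <name>ABC</name>
    <createdAt>1512016450886</createdAt>
    <createdBy>XYZ</createdBy>
    <attributes>
      <attribute><name>access</name><value>public</value></attribute>
      <attribute><name>owner</name><value>ABC</value></attribute>
      <attribute><name>url</name><value>planning</value></attribute>
    </attributes>
  </entity>
</hash>
""".strip()


def to_nested(xml_text):
    """Build one parent record per <entity>, each with a list of child records."""
    root = ET.fromstring(xml_text)
    docs = []
    for entity in root.findall("entity"):  # one parent per /hash/entity
        docs.append({
            "id": entity.findtext("name"),
            "createdBy": entity.findtext("createdBy"),
            "createdAt": entity.findtext("createdAt"),
            # one child per /hash/entity/attributes/attribute
            "children": [
                {"name": a.findtext("name"), "value": a.findtext("value")}
                for a in entity.findall("attributes/attribute")
            ],
        })
    return docs


docs = to_nested(SAMPLE)
print(docs)
```

Unlike the flat result in the question, each name stays paired with its value inside its own child record.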
The API XML should also be well-formed (previously you had several top-level tags instead of a single root tag):
<hash>
<entity>
<name>ABC</name>
<createdAt>1512016450886</createdAt>
<createdBy>XYZ</createdBy>
<attributes>
<attribute>
<name>access</name>
<value>public</value>
</attribute>
<attribute>
<name>owner</name>
<value>ABC</value>
</attribute>
<attribute>
<name>url</name>
<value>planning</value>
</attribute>
</attributes>
</entity>
</hash>