How to send data to Druid using the Tranquility core API?
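[Posted]: 2018-10-31 12:25:01
[Question]:
I have set up Druid and was able to run the tutorial at "Tutorial: Loading a file". I was also able to execute native JSON queries and get results as described at http://druid.io/docs/latest/tutorials/tutorial-query.html, so the Druid setup is working fine.
I now want to ingest additional data into this datasource from a Java program. For a datasource that was created with batch loading, is it possible to send data to Druid using Tranquility from a Java program?
I tried the example program at https://github.com/druid-io/tranquility/blob/master/core/src/test/java/com/metamx/tranquility/example/JavaExample.java
But this program just keeps running and does not show any output. How do I set up Druid to accept data through the Tranquility core API?
Below are the ingestion spec and the Tranquility configuration file: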
wikipedia-index.json
{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "wikipedia",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user",
              { "name": "added", "type": "long" },
              { "name": "deleted", "type": "long" },
              { "name": "delta", "type": "long" }
            ]
          },
          "timestampSpec": {
            "column": "time",
            "format": "iso"
          }
        }
      },
      "metricsSpec" : [],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"],
        "rollup" : false
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "quickstart/",
        "filter" : "wikiticker-2015-09-12-sampled.json.gz"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 5000000,
      "maxRowsInMemory" : 25000,
      "forceExtendableShardSpecs" : true
    }
  }
}
example.json (Tranquility configuration):
{
  "dataSources" : [
    {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "wikipedia",
          "metricsSpec" : [
            { "type" : "count", "name" : "count" }
          ],
          "granularitySpec" : {
            "segmentGranularity" : "hour",
            "queryGranularity" : "none",
            "type" : "uniform"
          },
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "format" : "json",
              "timestampSpec" : { "column": "time", "format": "iso" },
              "dimensionsSpec" : {
                "dimensions" : [
                  "channel",
                  "cityName",
                  "comment",
                  "countryIsoCode",
                  "countryName",
                  "isAnonymous",
                  "isMinor",
                  "isNew",
                  "isRobot",
                  "isUnpatrolled",
                  "metroCode",
                  "namespace",
                  "page",
                  "regionIsoCode",
                  "regionName",
                  "user",
                  { "name": "added", "type": "long" },
                  { "name": "deleted", "type": "long" },
                  { "name": "delta", "type": "long" }
                ]
              }
            }
          }
        },
        "tuningConfig" : {
          "type" : "realtime",
          "windowPeriod" : "PT10M",
          "intermediatePersistPeriod" : "PT10M",
          "maxRowsInMemory" : "100000"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1"
      }
    }
  ],
  "properties" : {
    "zookeeper.connect" : "localhost"
  }
}
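For illustration only, here is a minimal Java sketch of an event shaped to match the Tranquility configuration above: the timestamp goes in the "time" column as an ISO-8601 string, and the remaining keys are taken from the listed dimensions. It also includes a simplified check against the configured windowPeriod of PT10M. The class and method names (WindowPeriodSketch, buildEvent, withinWindow) and the field values are hypothetical, and the check rests on the assumption that Tranquility drops events whose timestamps fall too far from the current time; the real acceptance rule also depends on segmentGranularity, so treat this as an approximation rather than Tranquility's actual logic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

public class WindowPeriodSketch {

    // Hypothetical helper: true when eventTime lies within windowPeriod of "now".
    // Simplified approximation of the assumed window rule, not Tranquility's own code.
    static boolean withinWindow(Instant eventTime, Instant now, Duration windowPeriod) {
        Duration distance = Duration.between(eventTime, now).abs();
        return distance.compareTo(windowPeriod) <= 0;
    }

    // Hypothetical helper: builds an event map using the "time" column (ISO format)
    // and a few of the dimensions declared in example.json. Values are made up.
    static Map<String, Object> buildEvent(Instant eventTime) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("time", eventTime.toString()); // Instant.toString() is ISO-8601
        event.put("channel", "#en.wikipedia");
        event.put("page", "Example page");
        event.put("user", "example-user");
        event.put("added", 10L);
        event.put("deleted", 2L);
        event.put("delta", 8L);
        return event;
    }

    public static void main(String[] args) {
        Duration windowPeriod = Duration.parse("PT10M"); // same value as in example.json
        Instant now = Instant.now();

        // An event stamped with the current time.
        Map<String, Object> fresh = buildEvent(now);
        System.out.println(fresh);
        System.out.println("current event within window: "
                + withinWindow(now, now, windowPeriod));

        // An illustrative timestamp inside the batch interval 2015-09-12/2015-09-13.
        Instant old = Instant.parse("2015-09-12T00:46:58.771Z");
        System.out.println("2015 event within window: "
                + withinWindow(old, now, windowPeriod));
    }
}
```

If that assumption holds, events carrying timestamps from the 2015 batch interval would fall outside a ten-minute window, while events stamped with the current time would fall inside it.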
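I could not find any example of setting up a datasource in Druid that continuously accepts data from a Java program. I do not want to use Kafka. Any pointers on this would be greatly appreciated.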
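[Comments]:
Did you get anywhere with this? I am looking into the same thing.
[Answer 1]:
You first need to create a data file containing the additional data, and then run an ingestion task with the new fields. You cannot edit an existing record in Druid; the new data overwrites it.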