StreamSets Installation and Deployment
Posted by 小徐xfg
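For detailed steps, see: https://github.com/streamsets/datacollector/blob/2.6/BUILD.md
1 Platform overview
A big-data collection platform. Sources can be structured or unstructured data; destinations include HDFS, Hive and others. It provides a visual pipeline designer and scheduled job execution.
2 Environment setup
CentOS 7
- Git 1.9+ (git-2.9.4.tar.gz)
- JDK 8 (installation omitted)
- Maven 3.3.9+ (installation omitted)
- Node 0.10.32+ (node-v8.0.0-linux-x64.tar.gz)
- npm (included in node-v8.0.0-linux-x64.tar.gz)
- ideaIC-2017.1.4.tar.gz (development tool)
Steps
vi ~/.bashrc
Set the environment variables: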
export PATH=/home/tbb/worker/node8/bin:$PATH
export PATH=/home/tbb/worker/maven/bin:$PATH
export PATH=/home/tbb/worker/git/bin:/home/tbb/worker/git/libexec/git-core:$PATH
JAVA_HOME=/home/tbb/worker/jdk8/jdk1.8.0_11
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=$JAVA_HOME/jre/lib/ext:$JAVA_HOME/lib/tools.jar
export PATH JAVA_HOME CLASSPATH
- bower (npm -g install bower)
- grunt-cli (npm -g install grunt-cli)
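Once the variables are loaded, it helps to confirm that each tool on the PATH meets the minimum version listed above. The helpers below are a minimal sketch, not part of the StreamSets build: the function names version_ge and check_tool are made up for illustration, and the comparison assumes GNU sort with the -V option, which CentOS 7 provides.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes GNU `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MINIMUM ACTUAL: prints one status line for a tool.
check_tool() {
    if version_ge "$3" "$2"; then
        echo "OK   $1 $3 (>= $2)"
    else
        echo "FAIL $1 $3 (< $2)"
        return 1
    fi
}

# Possible usage after `source ~/.bashrc`:
#   check_tool git  1.9     "$(git --version | awk '{print $3}')"
#   check_tool node 0.10.32 "$(node --version | sed 's/^v//')"
```

The version strings are passed in as arguments so the comparison can be tried without the tools installed.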
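3 Build procedure
Step 1: clone datacollector-api from GitHub onto the Linux machine
git clone http://github.com/streamsets/datacollector-api
Step 2: clone datacollector-plugin-api from GitHub onto the Linux machine
git clone http://github.com/streamsets/datacollector-plugin-api
Step 3: cd into the datacollector-api directory and install it
mvn clean install -DskipTests
Step 4: cd into the datacollector-plugin-api directory and install it
mvn clean install -DskipTests
Step 5: clone the datacollector source locally; this takes a while, so be patient
git clone http://github.com/streamsets/datacollector
Step 6: cd into the datacollector directory and run the build from a terminal
- Development build (the first build downloads many dependencies, so be patient)
mvn package -Pdist,ui -DskipTests
The build output is generated under dist
- Release build
mvn package -Drelease -DskipTests
The build output is generated under release
The core platform package is:
The platform's dependency extension packages are:
Step 7: to build a single component on its own, cd into that component's directory and run
mvn package -Pdist,ui -DskipTests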
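The clone-and-build sequence above can be collected into one script. This is a sketch under assumptions rather than an official build script: the names SDC_REPOS, run, build_sdc and DRY_RUN are invented here, Maven's -f option stands in for the cd steps, and each repository is cloned just before it is built instead of up front. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Repositories in dependency order: both API modules must be installed
# into the local Maven repository before datacollector itself is built.
SDC_REPOS="datacollector-api datacollector-plugin-api datacollector"

# run CMD...: always prints the command; executes it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# build_sdc WORKDIR: clones each repository missing under WORKDIR, then builds it.
build_sdc() {
    workdir=$1
    for repo in $SDC_REPOS; do
        if [ ! -d "$workdir/$repo" ]; then
            run git clone "http://github.com/streamsets/$repo" "$workdir/$repo" || return 1
        fi
        if [ "$repo" = "datacollector" ]; then
            # Development build of the collector itself.
            run mvn -f "$workdir/$repo/pom.xml" package -Pdist,ui -DskipTests || return 1
        else
            # Install the API module into the local Maven repository.
            run mvn -f "$workdir/$repo/pom.xml" clean install -DskipTests || return 1
        fi
    done
}

# Possible usage:
#   build_sdc "$HOME/src"             # print the six commands only
#   DRY_RUN=0 build_sdc "$HOME/src"   # actually clone and build
```

Printing first makes it easy to review the exact Maven invocations before a build that downloads many dependencies.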
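4 Development tool
Go into the IDEA directory and run ./idea.sh to open the GUI
Import datacollector
Build in IDEA
5 Deployment
Step 1: from the release build output, copy a slimmed-down StreamSets package to the deployment location
Step 2: unpack the archive
Step 3: create four directories (the location is up to you)
Step 4: copy every file from the etc directory of the unpacked deployment into the config directory created in step 3
Step 5: edit sdc.properties in the config directory from step 3
http.port=18630 (change the port number)
http.realm.file.permission.check=false (relax the permission check)
Step 6: edit sdc-env.sh
Set the system directory configuration according to your actual paths
vi sdc.sh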
export SDC_DATA=/home/tbb/stremsets/devdata-bak/data
export SDC_LOG=/home/tbb/stremsets/devdata-bak/log
export SDC_CONF=/home/tbb/stremsets/devdata-bak/config
export SDC_RESOURCES=/home/tbb/stremsets/devdata-bak/sources
export SDC_HOME=/home/tbb/streamsets-datacollector-2.7.0.0-SNAPSHOT/streamsets-libs
export STREAMSETS_LIBRARIES_EXTRA_DIR=/home/tbb/streamsets-datacollector-2.7.0.0-SNAPSHOT/streamsets-libs-extras
export SDC_FILE_LIMIT="${SDC_FILE_LIMIT:-1024}"
Step 7: start the service
bin/streamsets dc &
Step 8: open the web UI in a browser, on the port set in http.port (18630 above)
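Steps 3 to 5 above (creating the directories, copying etc, editing sdc.properties) can be scripted. The sketch below is one possible way to do it, not something shipped with StreamSets: prepare_dirs and set_prop are hypothetical helper names, and the directory names data, log, config and sources are taken from the export lines above.

```shell
# prepare_dirs BASE DIST: creates the four working directories under BASE
# and copies the contents of DIST/etc into BASE/config.
prepare_dirs() {
    base=$1
    dist=$2
    mkdir -p "$base/data" "$base/log" "$base/config" "$base/sources" || return 1
    cp -R "$dist/etc/." "$base/config/"
}

# set_prop FILE KEY VALUE: replaces an existing "KEY=..." line or appends one.
# Note: KEY is used as a regular expression and VALUE must not contain | or &.
set_prop() {
    file=$1
    key=$2
    value=$3
    if grep -q "^$key=" "$file"; then
        sed "s|^$key=.*|$key=$value|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    else
        echo "$key=$value" >> "$file"
    fi
}

# Possible usage, with the paths from this article:
#   prepare_dirs /home/tbb/stremsets/devdata-bak /home/tbb/streamsets-datacollector-2.7.0.0-SNAPSHOT
#   set_prop /home/tbb/stremsets/devdata-bak/config/sdc.properties http.port 18630
#   set_prop /home/tbb/stremsets/devdata-bak/config/sdc.properties http.realm.file.permission.check false
```

Editing through a function keeps the change repeatable when the deployment is rebuilt from a fresh release archive.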
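6 Platform localization (Chinese)
- General UI localization
Edit the i18n file en.json
7 Component localization
- Component localization
Localizing a component requires changing the source code to replace the hard-coded English strings
(1) Open the project in IDEA (./idea.sh)
(2) Build after translating
(3) After the build, copy out the jar files and replace the corresponding component jars under streamsets-libs in the deployment
(4) The change takes effect after StreamSets is restarted
8 Component trimming
(1) Delete the unneeded component jars from streamsets-libs in the deployment
(2) Open the project in IDEA (./idea.sh) and comment out the component's node in the configuration
(3) After the build, copy out the jar files and replace the corresponding component jars under streamsets-libs in the deployment
(4) The change takes effect after StreamSets is restarted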
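For step (1) of the trimming procedure, a cautious variant is to move unneeded components into a backup directory instead of deleting them outright, so a component can be restored if a pipeline turns out to need it. This is a sketch of that idea only; trim_libs is a hypothetical helper and the component names passed to it are whatever entries exist in your own streamsets-libs directory.

```shell
# trim_libs LIBS_DIR BACKUP_DIR NAME...: moves each named entry out of
# LIBS_DIR into BACKUP_DIR and reports what happened to it.
trim_libs() {
    libs_dir=$1
    backup_dir=$2
    shift 2
    mkdir -p "$backup_dir" || return 1
    for name in "$@"; do
        if [ -e "$libs_dir/$name" ]; then
            mv "$libs_dir/$name" "$backup_dir/" && echo "moved $name"
        else
            echo "skip $name (not found)"
        fi
    done
}

# Possible usage (the component name is only an example):
#   trim_libs /path/to/streamsets-libs /path/to/libs-backup some-unused-lib
```

Restart StreamSets afterwards, as in step (4), for the change to take effect.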
9 Modification record
9.1 basic-lib
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/coap/CoapClientDTarget.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/devnull/NullDTarget.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/devnull/StatsNullDTarget.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/devnull/ToErrorNullDTarget.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/http/HttpClientDTarget.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/localfilesystem/Groups.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/localfilesystem/LocalFileSystemDTarget.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/mqtt/MqttClientDTarget.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/recordstolocalfilesystem/ToErrorLocalFSDTarget.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/sdcipc/SdcIpcDTarget.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/toerror/ToErrorDTarget.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/websocket/WebSocketDTarget.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/executor/emailexecutor/EmailDExecutor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/executor/finishpipeline/PipelineFinisherDExecutor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/executor/shell/ShellDExecutor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/coapserver/CoapServerDPushSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/http/HttpClientConfigBean.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/http/HttpClientDSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/httpserver/Groups.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/httpserver/HttpServerDPushSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/logtail/FileTailDSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/mqtt/MqttClientDSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/opcua/OpcUaClientDSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/remote/Groups.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/remote/RemoteDownloadConfigBean.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/remote/RemoteDownloadDSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/sdcipc/SdcIpcDSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/sdcipcwithbuffer/SdcIpcWithDiskBufferDSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/spooldir/SpoolDirDSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/tcp/TCPServerDSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/udp/UDPDSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/websocketserver/WebSocketServerDPushSource.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/base64/Base64DecodingDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/base64/Base64EncodingDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/dedup/DeDupDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/expression/ExpressionDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldfilter/FieldFilterDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener/FieldFlattenerDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldhasher/FieldHasherDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldmask/FieldMaskDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldmerger/FieldMergerDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldorder/FieldOrderDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldrenamer/FieldRenamerDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldtypeconverter/FieldTypeConverterDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldvaluereplacer/FieldValueReplacerDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/geolocation/GeolocationDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/http/HttpDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/javascript/JavaScriptDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/jsonparser/JsonParserDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/kv/local/LocalLookupDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/listpivot/ListPivotDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/logparser/LogParserDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/selector/SelectorDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/splitter/SplitterDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/xmlflattener/XMLFlatteningDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/xmlparser/XmlParserDProcessor.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/zip/FieldZipConfig.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/zip/FieldZipConfigBean.java
basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/zip/FieldZipDProcessor.java
9.2 common-ui
common-ui/src/main/webapp/common/administration/logs/logs.tpl.html
common-ui/src/main/webapp/login.html
9.3 commonlib
commonlib/src/main/java/com/streamsets/pipeline/stage/origin/lib/BasicConfig.java
commonlib/src/main/java/com/streamsets/pipeline/stage/origin/lib/DataParserFormatConfig.java
9.4 container
container/src/main/java/com/streamsets/datacollector/config/PipelineGroups.java
container/src/main/java/com/streamsets/datacollector/creation/PipelineConfigBean.java
container/src/main/java/com/streamsets/datacollector/creation/RuleDefinitionsConfigBean.java
container/src/main/java/com/streamsets/datacollector/creation/StageConfigBean.java
container/src/main/java/com/streamsets/datacollector/validation/PipelineConfigurationValidator.java
9.5 datacollector-ui
datacollector-ui/src/main/webapp/app/app.js
datacollector-ui/src/main/webapp/app/common/pipelineService.js
datacollector-ui/src/main/webapp/app/home/detail/detail.tpl.html
datacollector-ui/src/main/webapp/app/home/header/header.tpl.html
datacollector-ui/src/main/webapp/app/home/home_grid_view.tpl.html
datacollector-ui/src/main/webapp/app/home/home_header.tpl.html
datacollector-ui/src/main/webapp/app/home/home_list_view.tpl.html
datacollector-ui/src/main/webapp/app/home/preview/preview.tpl.html
datacollector-ui/src/main/webapp/app/home/snapshot/snapshot.tpl.html
datacollector-ui/src/main/webapp/app/home/stageLibrary/stageLibrary.js
datacollector-ui/src/main/webapp/app/home/stageLibrary/stageLibrary.tpl.html
datacollector-ui/src/main/webapp/i18n/en.json
datacollector-ui/src/main/webapp/index.html
9.6 hdfs-protolib
hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/Groups.java
hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/HdfsDTarget.java
hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/HdfsTargetConfigBean.java
hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/metadataexecutor/Groups.java
hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/metadataexecutor/HdfsActionsConfig.java
hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/metadataexecutor/HdfsConnectionConfig.java
hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/metadataexecutor/HdfsMetadataDExecutor.java
9.7 hive-protolib
hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/FieldMappingConfig.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/Groups.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/HMSTargetConfigBean.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/HiveDTarget.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/HiveMetastoreDTarget.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/queryexecutor/Groups.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/queryexecutor/HiveQueryDExecutor.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/queryexecutor/HiveQueryExecutorConfig.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/lib/hive/HiveConfigBean.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/DecimalDefaultsConfig.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/Errors.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/Groups.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/HiveMetadataDProcessor.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/HiveMetadataOutputStreams.java
hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/PartitionConfig.java
9.8 jdbc-lib
jdbc-lib/src/main/java/com/streamsets/pipeline/lib/jdbc/HikariPoolConfigBean.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/destination/jdbc/Groups.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/destination/jdbc/JdbcDTarget.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/executor/jdbc/JdbcQueryDExecutor.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/CommonSourceConfigBean.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/CDCSourceConfigBean.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/ChangeTypeValues.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/oracle/Groups.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/oracle/OracleCDCConfigBean.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/oracle/OracleCDCDSource.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/oracle/StartValues.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/BatchTableStrategy.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/Groups.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/TableConfigBean.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/TableJdbcConfigBean.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/TableJdbcDSource.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/TableJdbcRunnable.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/TableOrderStrategy.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/processor/jdbclookup/JdbcLookupDProcessor.java
jdbc-lib/src/main/java/com/streamsets/pipeline/stage/processor/jdbctee/JdbcTeeDProcessor.java
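10 Usage examples
(1) Copy the configuration files to the deployment server: in Ambari, download the HDFS and Hive client configurations (for Hive, download both HCAT and HIVE), unpack them, and locate core-site.xml, hdfs-site.xml and hive-site.xml.
On the Linux node where StreamSets is installed, create a config directory under the streamsets directory and place these three XML files in it.
(2) Configure hosts
192.168.0.202 master.novalocal master
192.168.0.206 slave.novalocal slave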
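The two preparation steps above can be checked and applied from the shell. The following is a sketch with invented helper names (check_hadoop_conf, add_host). It assumes each host entry is an IP address followed by a hostname and a short alias, for example `master.novalocal master`, and it takes the hosts file as an argument so it can be tried on a copy before touching /etc/hosts.

```shell
# check_hadoop_conf DIR: reports which of the three client config files
# are missing from DIR; succeeds only when all three are present.
check_hadoop_conf() {
    conf_dir=$1
    missing=0
    for f in core-site.xml hdfs-site.xml hive-site.xml; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing: $conf_dir/$f"
            missing=1
        fi
    done
    return $missing
}

# add_host HOSTS_FILE IP NAMES: appends "IP NAMES" unless a line for IP
# already exists, so running it twice does not duplicate the entry.
add_host() {
    hosts_file=$1
    ip=$2
    names=$3
    if awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$hosts_file" 2>/dev/null; then
        echo "exists $ip"
    else
        echo "$ip $names" >> "$hosts_file"
        echo "added $ip"
    fi
}

# Possible usage (run as root for /etc/hosts):
#   check_hadoop_conf /path/to/streamsets/config
#   add_host /etc/hosts 192.168.0.202 "master.novalocal master"
#   add_host /etc/hosts 192.168.0.206 "slave.novalocal slave"
```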
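(1) Collecting data from Oracle into Hive
JDBC connection string: jdbc:oracle:thin:@xxx.xxx.xxx.xxx:1521:databasename
0 is the actual value of the uuid itself
User: the Oracle user
Table name pattern: either the full table name or the first few characters followed by %. With a full name, only that one table is collected; with %, every table whose name starts with those characters is matched. In the example, TEST% matches all tables whose names start with TEST.
Driver: oracle.jdbc.driver.OracleDriver
JDBC URL: log in to Ambari (ambari:8080), click Hive, and copy the HiveServer2 JDBC URL
JDBC driver name: org.apache.hive.jdbc.HiveDriver
Hadoop configuration directory: the path of the config directory set up above
Database expression: the database name
Table name: either a random name, ${sdc:id()}, or the original table name, ${record:attribute('jdbc.tables')}
Partition: writes the Oracle data into the specified Hive partition. Choose the partition name as needed; the partition type can be STRING, INT or BIGINT; set the partition value expression as needed
Method 1: ${record:value('/Age') < 21 ? 0 : 0}
${record:value('/Age') < 21 ? 1 : 1}
... create each required partition this way in turn, then run truncate table tablename and import the data
Method 2:
${record:value('/st')}, set according to the actual field
If an error is reported, the field name may be wrong: Oracle column names are upper case, Hive column names are lower case
Fill in the same field for the corresponding Hadoop storage location
Data format: Avro serialized files are recommended
Hadoop file system URL: the HDFS address, e.g. hdfs://xxx.xxx.xxx.xxx:8020
Hadoop FS configuration: dfs.client.use.datanode.hostname
Whether the client should use the DataNode's hostname when connecting to it; by default the IP address is used.
File type: choose the type that matches the input; for example, if the input is text, the output can also be text.
File prefix: choose according to the business need
File suffix: likewise; for images, for example, add a suffix such as .jpg
Directory template: where the output directories are created
Time zone: choose according to the requirement
Time basis: choose according to the requirement
Max records per file: with 0, the transaction is committed when the job stops; with a specific value x, a transaction is committed every x records.
Data format: choose the output format that matches the input
JDBC URI: same as in the Hive processor configuration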
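To preview which tables a table name pattern such as TEST% would pick up, the % wildcard can be imitated in the shell. This is only an approximation for illustration, not how StreamSets evaluates the pattern: like_match is a made-up helper, it translates % into the shell glob *, and it ignores the single-character wildcard _ of SQL LIKE.

```shell
# like_match PATTERN NAME: succeeds when NAME matches PATTERN, where %
# stands for any run of characters (including none).
like_match() {
    glob=$(printf '%s' "$1" | sed 's/%/*/g')
    case $2 in
        $glob) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage: list which of a few candidate table names would match.
#   for t in TEST_USERS TEST_ORDERS ORDERS; do
#       like_match 'TEST%' "$t" && echo "$t"
#   done
```

A full table name contains no %, so it matches only itself, which corresponds to the single-table case described for the table name pattern.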
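(2) Collecting data from FTP into HDFS
Resource URL: the server address, e.g. SFTP://xxx.xxx.xxx.xxx:22/tmp/xxx
File name pattern: * picks up every file in the directory; a specific name picks up only that file
Max batch size: tune to the server's actual I/O capacity
Batch wait time: depends on the business requirement
Data format: determined by the input file format
Compression format: as required
Max line length: determined by the files
Use custom delimiter: as required; not recommended
Charset: UTF-8, GBK or another encoding, depending on the files
The Hadoop file system destination is configured as in the Oracle-to-Hive example
Input format: the input files can be transcoded if required
Images - local
The configuration is the same as before. Note that for images, audio and video the data format must be Whole File, and the output directory must use ${record:value('/fileInfo/filename')} or /tmp/out/${record:value('file:fileName()')}; otherwise an error is reported.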