StreamSets Installation and Deployment

Posted by 小徐xfg


For detailed build instructions, see: https://github.com/streamsets/datacollector/blob/2.6/BUILD.md







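Platform Overview

A big-data collection platform. Sources can be structured or unstructured data; destinations include HDFS, Hive, and others. It provides a visual pipeline designer and scheduled jobs.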

Environment Configuration

  CentOS 7

Git 1.9+ (git-2.9.4.tar.gz)

JDK 8 (setup omitted)

Maven 3.3.9+ (setup omitted)

Node 0.10.32+ (node-v8.0.0-linux-x64.tar.gz)

npm (bundled with node-v8.0.0-linux-x64.tar.gz)

ideaIC-2017.1.4.tar.gz (IDE)

Steps

    vi ~/.bashrc

Set the environment variables:

export PATH=/home/tbb/worker/node8/bin:$PATH

export PATH=/home/tbb/worker/maven/bin:$PATH

export PATH=/home/tbb/worker/git/bin:/home/tbb/worker/git/libexec/git-core:$PATH

JAVA_HOME=/home/tbb/worker/jdk8/jdk1.8.0_11

PATH=$JAVA_HOME/bin:$PATH

CLASSPATH=$JAVA_HOME/jre/lib/ext:$JAVA_HOME/lib/tools.jar

export PATH JAVA_HOME CLASSPATH

Install bower: npm install -g bower

Install grunt-cli: npm install -g grunt-cli
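Before building, it can help to confirm that the installed tools meet the minimum versions listed above. The following is a minimal POSIX shell sketch; `version_ge` and `check_tool` are hypothetical helper names, and the version-extraction commands in the comments assume the usual `git --version`, `mvn -v` and `node --version` output formats.

```shell
# version_ge HAVE NEED: exit status 0 if HAVE >= NEED, comparing
# dot-separated numeric components (missing components count as 0).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# check_tool NAME HAVE NEED: print a status line; return 1 if too old.
check_tool() {
  if version_ge "$2" "$3"; then
    echo "OK: $1 $2 (>= $3)"
  else
    echo "TOO OLD: $1 $2 (need >= $3)" >&2
    return 1
  fi
}

# Example usage on the build host (assumes the tools are on PATH):
#   check_tool git   "$(git --version | awk '{print $3}')" 1.9
#   check_tool maven "$(mvn -v | awk 'NR == 1 {print $3}')" 3.3.9
#   check_tool node  "$(node --version | tr -d v)" 0.10.32
```

Comparing numerically per component avoids the classic string-comparison trap where `0.10.32` would sort before `0.9`.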

 

Build Steps

Step 1: clone datacollector-api from GitHub onto the Linux machine:

git clone http://github.com/streamsets/datacollector-api

Step 2: clone datacollector-plugin-api from GitHub onto the Linux machine:

git clone http://github.com/streamsets/datacollector-plugin-api

Step 3: cd into the datacollector-api directory and install it:

mvn clean install -DskipTests

Step 4: cd into the datacollector-plugin-api directory and install it:

mvn clean install -DskipTests

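Step 5: clone the datacollector source code locally (this takes a while, so be patient):

git clone http://github.com/streamsets/datacollector

Step 6: cd into the datacollector directory and build from a terminal:

- Development build (the first build downloads many dependencies, so be patient):

mvn package -Pdist,ui -DskipTests

After the build, the output is generated in the dist directory.

- Release build:

mvn package -Drelease -DskipTests

After the build, the output is generated in the release directory, including the platform core package and the platform dependency extensions.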

Step 7: to build a single component on its own, cd into that component's directory and run:

mvn package -Pdist,ui -DskipTests
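The build order matters: datacollector-api and datacollector-plugin-api must be installed into the local Maven repository before datacollector itself is packaged. Below is a sketch of steps 1 to 6 as one shell function. `build_all`, `run` and the `DRY_RUN` switch are hypothetical names; the sketch uses `mvn -f <pom>` instead of `cd` so that a dry run can print the full command sequence without the repositories being present, and it clones each repository's default branch (check out matching versions if you need a specific release).

```shell
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# build_all WORKDIR [dev|release]
build_all() {
  workdir=$1
  mode=${2:-dev}
  case $mode in
    dev)     goals="package -Pdist,ui -DskipTests" ;;
    release) goals="package -Drelease -DskipTests" ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac

  # Clone each repository unless it is already present.
  for repo in datacollector-api datacollector-plugin-api datacollector; do
    if [ ! -d "$workdir/$repo" ]; then
      run git clone "https://github.com/streamsets/$repo" "$workdir/$repo" || return 1
    fi
  done

  # The two API modules must be in the local Maven repository first.
  run mvn -f "$workdir/datacollector-api/pom.xml" clean install -DskipTests || return 1
  run mvn -f "$workdir/datacollector-plugin-api/pom.xml" clean install -DskipTests || return 1

  # $goals is intentionally unquoted so it splits into separate arguments.
  run mvn -f "$workdir/datacollector/pom.xml" $goals
}
```

For example, `DRY_RUN=1 build_all ~/src release` prints the six commands; drop `DRY_RUN=1` to actually run them.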

 

4 Development Tools

Go into the IDEA directory and run ./idea.sh to open the GUI.

Import the datacollector project.

Build it in IDEA.

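5 System Deployment

Step 1: from the release build output, copy a minimal StreamSets distribution to the deployment location.

Step 2: extract the archive.

Step 3: create four directories (any location works): data, log, config, and sources.

Step 4: copy all files from the etc directory of the extracted distribution into the config directory created in step 3.

Step 5: edit sdc.properties in the config directory from step 3:

        http.port=18630     (port number)

        http.realm.file.permission.check=false   (disable the realm file permission check)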
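Steps 3 to 5 can be scripted. The sketch below assumes the distribution has already been extracted (step 2) and uses hypothetical helper names `set_prop` and `prepare_deploy`; `set_prop` rewrites an existing `key=value` line or appends one if the key is missing.

```shell
# set_prop FILE KEY VALUE: replace an existing "KEY=..." line, or append
# "KEY=VALUE" if the key is not present (commented-out lines are left alone).
set_prop() {
  file=$1 key=$2 value=$3
  tmp="$file.tmp.$$"
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# prepare_deploy DIST_DIR DEPLOY_ROOT [PORT]
#   DIST_DIR:    the extracted distribution (contains etc/)
#   DEPLOY_ROOT: where the data, log, config and sources dirs are created
prepare_deploy() {
  dist=$1 root=$2 port=${3:-18630}
  for d in data log config sources; do
    mkdir -p "$root/$d" || return 1
  done
  # Step 4: copy everything from etc/ into the new config directory.
  cp -R "$dist/etc/." "$root/config/" || return 1
  # Step 5: port and realm-file permission check in sdc.properties.
  set_prop "$root/config/sdc.properties" http.port "$port" || return 1
  set_prop "$root/config/sdc.properties" http.realm.file.permission.check false
}
```

For example, `prepare_deploy /home/tbb/streamsets-datacollector-2.7.0.0-SNAPSHOT /home/tbb/stremsets/devdata-bak` would produce the directory layout that the sdc-env.sh settings in step 6 point to.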

Step 6: edit sdc-env.sh

Set the system directories according to your actual paths:

vi sdc-env.sh

 

export SDC_DATA=/home/tbb/stremsets/devdata-bak/data

export SDC_LOG=/home/tbb/stremsets/devdata-bak/log

export SDC_CONF=/home/tbb/stremsets/devdata-bak/config

export SDC_RESOURCES=/home/tbb/stremsets/devdata-bak/sources

export SDC_HOME=/home/tbb/streamsets-datacollector-2.7.0.0-SNAPSHOT/streamsets-libs

export STREAMSETS_LIBRARIES_EXTRA_DIR=/home/tbb/streamsets-datacollector-2.7.0.0-SNAPSHOT/streamsets-libs-extras

 

export SDC_FILE_LIMIT="${SDC_FILE_LIMIT:-1024}"
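A typo in these paths (such as a stray space) usually only shows up when the service fails to start. The check sketched below, which assumes the variables above are set in the current shell, reports any directory that is unset, missing, or not writable. `check_sdc_env` is a hypothetical helper.

```shell
# check_sdc_env: report SDC_* directories that are unset, missing, or
# not writable by the current user. Returns 1 if any problem is found.
check_sdc_env() {
  status=0
  for var in SDC_DATA SDC_LOG SDC_CONF SDC_RESOURCES; do
    eval "dir=\${$var:-}"   # indirect lookup of the variable named by $var
    if [ -z "$dir" ]; then
      echo "$var is not set"
      status=1
    elif [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
      echo "$var=$dir is not a writable directory"
      status=1
    fi
  done
  return $status
}
```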

Step 7: start the service:

bin/streamsets dc &

Step 8: open the web UI in a browser, using the port configured in step 5 (18630).

 

6 Platform Localization

- General UI localization

Edit the i18n file en.json (datacollector-ui/src/main/webapp/i18n/en.json).
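For simple one-line entries, a value in the JSON file can be replaced from the shell. The sketch assumes each `"key": "value"` pair sits on a single line and that the key and text contain no `|`, `&` or `\`; `set_i18n` is a hypothetical helper, and the keys in the test (`home`, `logout`) are made up for illustration, so look up the real keys in en.json. If the same leaf key appears in several nested objects, every such line is changed, so a JSON-aware editor is safer for anything more complex.

```shell
# set_i18n FILE KEY TEXT: replace the value on a line of the form
#   "KEY": "old value"
# Assumptions: one entry per line; KEY and TEXT contain no |, & or \.
set_i18n() {
  file=$1 key=$2 text=$3
  tmp="$file.tmp.$$"
  sed "s|\(\"$key\"[[:space:]]*:[[:space:]]*\"\)[^\"]*\"|\1$text\"|" "$file" > "$tmp" &&
    mv "$tmp" "$file"
}
```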

 

 

 

 

 

 

 

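7 Component Localization

- Component localization

Localizing the components requires changing the source code to replace the hard-coded English strings.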
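To find the hard-coded English, it helps to list annotation attributes such as `label = "..."` and `description = "..."`, which is how stage and configuration labels are typically declared in the StreamSets Java sources (for example on `@ConfigDef`). The sketch below is only a search aid; `list_ui_strings` is a hypothetical name.

```shell
# list_ui_strings SRC_DIR: print file:line:text for every label/description
# string attribute found in .java files under SRC_DIR.
list_ui_strings() {
  find "$1" -name '*.java' -type f -exec \
    grep -HnE '(label|description)[[:space:]]*=[[:space:]]*"' {} +
}
```

For example, `list_ui_strings basic-lib/src/main/java` lists candidates in the files named in section 9.1.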

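1) Open the project in IDEA (./idea.sh).

2) Build after localizing.

3) After the build, copy out the jar files and use them to replace the corresponding jars under streamsets-libs in the deployment.

4) Restart StreamSets for the changes to take effect.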
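Step 3 is easier to undo if the old jar is moved aside instead of overwritten. A sketch, assuming the newly built jar has the same file name as the deployed one and that you pass the folder holding the affected stage library's jars (typically a `lib` folder under streamsets-libs); `replace_jar` and its backup-directory argument are hypothetical.

```shell
# replace_jar NEW_JAR LIB_DIR BACKUP_DIR
#   Moves LIB_DIR/<name of NEW_JAR> (if present) into BACKUP_DIR, then
#   copies NEW_JAR into LIB_DIR. Restore by moving the backup back.
replace_jar() {
  new=$1 libdir=$2 backup=$3
  name=$(basename "$new")
  [ -f "$new" ] || { echo "no such jar: $new" >&2; return 1; }
  [ -d "$libdir" ] || { echo "no such directory: $libdir" >&2; return 1; }
  mkdir -p "$backup" || return 1
  if [ -f "$libdir/$name" ]; then
    mv "$libdir/$name" "$backup/$name" || return 1
  fi
  cp "$new" "$libdir/$name"
}
```

Keeping the backup outside the lib folder avoids leaving a second copy of the classes where the stage library might pick it up.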

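8 Component Trimming

1) Delete the unneeded jar components under streamsets-libs in the deployment.

2) Open the project in IDEA (./idea.sh) and comment out the configuration of the component nodes you are removing.

3) After the build, copy out the jar files and use them to replace the corresponding jars under streamsets-libs in the deployment.

4) Restart StreamSets for the changes to take effect.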
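For step 1, moving unused stage libraries into a parking directory is a reversible alternative to deleting them. The sketch assumes each stage library is a subdirectory of streamsets-libs; `prune_stage_libs` and the directory names in the test are illustrative.

```shell
# prune_stage_libs LIBS_DIR PARKING_DIR KEEP...
#   Moves every subdirectory of LIBS_DIR whose name is not listed in KEEP
#   into PARKING_DIR and prints its name.
prune_stage_libs() {
  libs=$1 parking=$2
  shift 2
  mkdir -p "$parking" || return 1
  for dir in "$libs"/*/; do
    [ -d "$dir" ] || continue        # no subdirectories: pattern unmatched
    name=$(basename "$dir")
    keep=0
    for k in "$@"; do
      [ "$name" = "$k" ] && keep=1
    done
    if [ "$keep" -eq 0 ]; then
      mv "$libs/$name" "$parking/$name" || return 1
      echo "moved $name"
    fi
  done
}
```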

9 Change Log (modified source files)

9.1 basic-lib

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/coap/CoapClientDTarget.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/devnull/NullDTarget.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/devnull/StatsNullDTarget.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/devnull/ToErrorNullDTarget.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/http/HttpClientDTarget.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/localfilesystem/Groups.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/localfilesystem/LocalFileSystemDTarget.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/mqtt/MqttClientDTarget.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/recordstolocalfilesystem/ToErrorLocalFSDTarget.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/sdcipc/SdcIpcDTarget.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/toerror/ToErrorDTarget.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/destination/websocket/WebSocketDTarget.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/executor/emailexecutor/EmailDExecutor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/executor/finishpipeline/PipelineFinisherDExecutor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/executor/shell/ShellDExecutor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/coapserver/CoapServerDPushSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/http/HttpClientConfigBean.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/http/HttpClientDSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/httpserver/Groups.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/httpserver/HttpServerDPushSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/logtail/FileTailDSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/mqtt/MqttClientDSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/opcua/OpcUaClientDSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/remote/Groups.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/remote/RemoteDownloadConfigBean.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/remote/RemoteDownloadDSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/sdcipc/SdcIpcDSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/sdcipcwithbuffer/SdcIpcWithDiskBufferDSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/spooldir/SpoolDirDSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/tcp/TCPServerDSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/udp/UDPDSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/origin/websocketserver/WebSocketServerDPushSource.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/base64/Base64DecodingDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/base64/Base64EncodingDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/dedup/DeDupDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/expression/ExpressionDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldfilter/FieldFilterDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldflattener/FieldFlattenerDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldhasher/FieldHasherDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldmask/FieldMaskDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldmerger/FieldMergerDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldorder/FieldOrderDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldrenamer/FieldRenamerDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldtypeconverter/FieldTypeConverterDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/fieldvaluereplacer/FieldValueReplacerDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/geolocation/GeolocationDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/http/HttpDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/javascript/JavaScriptDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/jsonparser/JsonParserDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/kv/local/LocalLookupDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/listpivot/ListPivotDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/logparser/LogParserDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/selector/SelectorDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/splitter/SplitterDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/xmlflattener/XMLFlatteningDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/xmlparser/XmlParserDProcessor.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/zip/FieldZipConfig.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/zip/FieldZipConfigBean.java

basic-lib/src/main/java/com/streamsets/pipeline/stage/processor/zip/FieldZipDProcessor.java

 

9.2 common-ui

common-ui/src/main/webapp/common/administration/logs/logs.tpl.html

common-ui/src/main/webapp/login.html

9.3 commonlib

commonlib/src/main/java/com/streamsets/pipeline/stage/origin/lib/BasicConfig.java

commonlib/src/main/java/com/streamsets/pipeline/stage/origin/lib/DataParserFormatConfig.java

 

9.4 container

container/src/main/java/com/streamsets/datacollector/config/PipelineGroups.java

container/src/main/java/com/streamsets/datacollector/creation/PipelineConfigBean.java

container/src/main/java/com/streamsets/datacollector/creation/RuleDefinitionsConfigBean.java

container/src/main/java/com/streamsets/datacollector/creation/StageConfigBean.java

container/src/main/java/com/streamsets/datacollector/validation/PipelineConfigurationValidator.java

 

9.5 datacollector-ui

datacollector-ui/src/main/webapp/app/app.js

datacollector-ui/src/main/webapp/app/common/pipelineService.js

datacollector-ui/src/main/webapp/app/home/detail/detail.tpl.html

datacollector-ui/src/main/webapp/app/home/header/header.tpl.html

datacollector-ui/src/main/webapp/app/home/home_grid_view.tpl.html

datacollector-ui/src/main/webapp/app/home/home_header.tpl.html

datacollector-ui/src/main/webapp/app/home/home_list_view.tpl.html

datacollector-ui/src/main/webapp/app/home/preview/preview.tpl.html

datacollector-ui/src/main/webapp/app/home/snapshot/snapshot.tpl.html

datacollector-ui/src/main/webapp/app/home/stageLibrary/stageLibrary.js

datacollector-ui/src/main/webapp/app/home/stageLibrary/stageLibrary.tpl.html

datacollector-ui/src/main/webapp/i18n/en.json

datacollector-ui/src/main/webapp/index.html

9.6 hdfs-protolib

hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/Groups.java

hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/HdfsDTarget.java

hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/HdfsTargetConfigBean.java

hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/metadataexecutor/Groups.java

hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/metadataexecutor/HdfsActionsConfig.java

hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/metadataexecutor/HdfsConnectionConfig.java

hdfs-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hdfs/metadataexecutor/HdfsMetadataDExecutor.java

 

9.7 hive-protolib

hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/FieldMappingConfig.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/Groups.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/HMSTargetConfigBean.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/HiveDTarget.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/HiveMetastoreDTarget.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/queryexecutor/Groups.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/queryexecutor/HiveQueryDExecutor.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/destination/hive/queryexecutor/HiveQueryExecutorConfig.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/lib/hive/HiveConfigBean.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/DecimalDefaultsConfig.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/Errors.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/Groups.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/HiveMetadataDProcessor.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/HiveMetadataOutputStreams.java

hive-protolib/src/main/java/com/streamsets/pipeline/stage/processor/hive/PartitionConfig.java

 

9.8 jdbc-lib

jdbc-lib/src/main/java/com/streamsets/pipeline/lib/jdbc/HikariPoolConfigBean.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/destination/jdbc/Groups.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/destination/jdbc/JdbcDTarget.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/executor/jdbc/JdbcQueryDExecutor.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/CommonSourceConfigBean.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/CDCSourceConfigBean.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/ChangeTypeValues.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/oracle/Groups.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/oracle/OracleCDCConfigBean.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/oracle/OracleCDCDSource.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/cdc/oracle/StartValues.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/BatchTableStrategy.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/Groups.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/TableConfigBean.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/TableJdbcConfigBean.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/TableJdbcDSource.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/TableJdbcRunnable.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/origin/jdbc/table/TableOrderStrategy.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/processor/jdbclookup/JdbcLookupDProcessor.java

jdbc-lib/src/main/java/com/streamsets/pipeline/stage/processor/jdbctee/JdbcTeeDProcessor.java

 

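10 Use Cases

(1) Copy the configuration files to the deployment server: from the Ambari platform, download the client configurations for HDFS, Hive, and HCat (download all of them), extract them, and locate core-site.xml, hdfs-site.xml, and hive-site.xml.

On the Linux host where StreamSets is installed, create a config folder under the streamsets directory and put the three XML files above into it.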
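The Ambari client-config archives unpack into several folders, so locating the three XML files by name is a convenient shortcut. In the sketch below, `collect_hadoop_conf` is a hypothetical helper that copies the first match of each file into the target config folder and fails if any file is missing. Keep the destination outside the source tree so a file is never copied onto itself.

```shell
# collect_hadoop_conf SRC_DIR DEST_DIR: copy core-site.xml, hdfs-site.xml
# and hive-site.xml (first match of each under SRC_DIR) into DEST_DIR.
collect_hadoop_conf() {
  src=$1 dest=$2
  mkdir -p "$dest" || return 1
  for f in core-site.xml hdfs-site.xml hive-site.xml; do
    path=$(find "$src" -type f -name "$f" | head -n 1)
    if [ -z "$path" ]; then
      echo "missing: $f" >&2
      return 1
    fi
    cp "$path" "$dest/$f" || return 1
  done
}
```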

 

(2) Configure the hosts file:

192.168.0.202    master.novalocal    master

192.168.0.206    slave.novalocal     slave
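When several nodes need the same entries, appending them idempotently avoids duplicate lines. Below is a sketch with a hypothetical `add_host` helper. Run it against /etc/hosts as root on the real server, or against a copy first.

```shell
# add_host HOSTS_FILE IP NAME...: append "IP    NAME..." unless the file
# already has a line whose first field is IP.
add_host() {
  file=$1 ip=$2
  shift 2
  if awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$file"; then
    echo "already present: $ip"
  else
    echo "$ip    $*" >> "$file"
  fi
}
```

For example: `add_host /etc/hosts 192.168.0.202 master.novalocal master`.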

 

 

Example (1): collecting data from Oracle into Hive

 

 

 

 

 

JDBC connection string: jdbc:oracle:thin:@xxx.xxx.xxx.xxx:1521:databasename
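The connection string above uses the SID form `host:port:SID`. A tiny helper that assembles it is sketched below, alongside the `@//host:port/service` form that the Oracle thin driver also accepts for service names. Both function names are hypothetical, and 1521 is assumed as the default listener port, matching the example above.

```shell
# oracle_thin_url HOST SID [PORT]: SID form, as used above.
oracle_thin_url() {
  echo "jdbc:oracle:thin:@$1:${3:-1521}:$2"
}

# oracle_thin_service_url HOST SERVICE [PORT]: service-name form.
oracle_thin_service_url() {
  echo "jdbc:oracle:thin:@//$1:${3:-1521}/$2"
}
```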

 

 

 

0 is the actual value of the UUID itself.

 

 

 

 

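User: the Oracle user.

Table name pattern: use either the full table name or its first few characters followed by %. With a full name, the pipeline collects exactly one table; with %, it collects every table whose name starts with those characters. In the example, TEST% matches all tables whose names start with TEST.

Driver: oracle.jdbc.driver.OracleDriver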
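The table name pattern follows SQL LIKE semantics: `%` matches any run of characters and `_` matches exactly one. The sketch below mimics that by translating the pattern into a shell glob, which makes it easy to check which table names a pattern would pick up. It is illustrative only (the actual matching is done by the JDBC origin, not by this code), it assumes table names contain no `*`, `?` or `[`, and `like_match` is a hypothetical name. Matching is case-sensitive, and Oracle stores unquoted table names in upper case. Watch the `_` case: Oracle table names often contain underscores, and the pattern `TEST_ORDERS` also matches `TESTXORDERS`.

```shell
# like_match PATTERN NAME: succeed if NAME matches the SQL LIKE PATTERN.
like_match() {
  # Translate LIKE wildcards to shell glob wildcards: % -> *, _ -> ?
  glob=$(printf '%s' "$1" | sed -e 's/%/*/g' -e 's/_/?/g')
  # An unquoted expansion in a case pattern is matched as a glob.
  case $2 in
    $glob) return 0 ;;
  esac
  return 1
}
```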

 

 

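JDBC URL: log in to the Ambari UI (port 8080), click Hive, and copy the HiveServer2 JDBC URL.

JDBC driver name: org.apache.hive.jdbc.HiveDriver

Hadoop configuration directory: the config directory prepared above.

Database expression: the database name.

Table name: either an arbitrary name such as ${sdc:id()}, or the original table name, ${record:attribute('jdbc.tables')}.

Partitions: choose which Hive partition the Oracle data lands in. The partition name is up to you, the partition type can be STRING, INT, or BIGINT, and the partition value expression is set as needed.

Method 1: ${record:value('/Age') < 21 ? 0 : 0}

      ${record:value('/Age') < 21 ? 1 : 1}

... create each required partition this way in turn, then run truncate table tablename and import the data.

Method 2:

     ${record:value('/st')}, set according to a specific field.

If this fails, the field name is probably wrong: Oracle column names are upper case, while Hive column names are lower case.

Use the same field for the corresponding Hadoop storage location.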
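For method 2, the upper/lower case mismatch is the usual pitfall. The helper below prints both spellings for a column: the record-field expression in upper case, assuming the records still carry the upper-case names produced by the Oracle origin, and the Hive column name in lower case. `partition_names` is a hypothetical name.

```shell
# partition_names COLUMN: print the record-field expression (upper case)
# on line 1 and the Hive column name (lower case) on line 2.
partition_names() {
  upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '%s\n' "\${record:value('/$upper')}"
  printf '%s\n' "$lower"
}
```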

 

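Data format: Avro (serialized files) is recommended.

Hadoop FS URI: the HDFS address, hdfs://xxx.xxx.xxx.xxx:8020

Hadoop FS configuration: dfs.client.use.datanode.hostname

This controls whether the client connects to DataNodes by hostname; by default it connects by IP address.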

 

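File type: choose the type that matches the input; for example, if the input is text, the output can also be text.

File prefix: set according to your use case.

File suffix: likewise; for images you might add a suffix such as .jpg.

Directory template: where the files are saved.

Time zone: choose the time zone your use case needs.

Time basis: choose the time basis your use case needs.

Max records in file: with 0, the file is committed when the pipeline stops; with a specific value x, a commit happens every x records.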
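Directory templates are usually built from StreamSets datetime functions such as `${YYYY()}`, `${MM()}`, `${DD()}` and `${hh()}`, evaluated against the time basis in the chosen time zone. To preview where a given timestamp would land for a template like `/tmp/out/${YYYY()}-${MM()}-${DD()}-${hh()}`, the equivalent path can be rendered with `date`. The sketch assumes a `date` that supports `-d @SECONDS` (GNU or BusyBox), and `render_dir_template` is a hypothetical name. POSIX TZ strings such as `UTC0` or `CST-8` (UTC+8) avoid depending on installed zoneinfo files.

```shell
# render_dir_template BASE EPOCH_SECONDS TZ
#   Prints BASE/YYYY-MM-DD-hh for the given time in time zone TZ, i.e.
#   what BASE/${YYYY()}-${MM()}-${DD()}-${hh()} would resolve to.
render_dir_template() {
  TZ=$3 date -d "@$2" "+$1/%Y-%m-%d-%H"
}
```

The same instant lands in different hourly directories depending on the time zone, which is why the time zone and time basis settings matter for partitioned output.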

 

Data format: choose the output format that matches the input.

JDBC URI: same as in the Hive processor configuration.

 

 

 

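Example (2): collecting data from FTP into HDFS

Resource URL: the server address and path, e.g. sftp://xxx.xxx.xxx.xxx:22/tmp/xxx

File name pattern: * picks up every file in the directory; a specific name picks up only that file.

Max batch size: tune to the server's actual I/O capacity.

Batch wait time: depends on your use case.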
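Before entering the resource URL and file name pattern, it can help to sanity-check them. The sketch below splits a URL of the form `scheme://host[:port]/path` with plain parameter expansion and filters file names with a glob pattern, assuming glob semantics as the `*` example suggests. Both function names are hypothetical, and the parser does not handle user names or IPv6 addresses in the URL.

```shell
# parse_resource_url URL: print scheme, host, port and path on four lines
# ("default" when no port is given, "/" when no path is given).
parse_resource_url() {
  url=$1
  scheme=${url%%://*}
  rest=${url#*://}
  hostport=${rest%%/*}
  case $rest in
    */*) path=/${rest#*/} ;;
    *)   path=/ ;;
  esac
  host=${hostport%%:*}
  case $hostport in
    *:*) port=${hostport##*:} ;;
    *)   port=default ;;
  esac
  printf '%s\n%s\n%s\n%s\n' "$scheme" "$host" "$port" "$path"
}

# matching_files PATTERN NAME...: print the names matching the glob PATTERN.
matching_files() {
  pattern=$1; shift
  for name in "$@"; do
    case $name in
      $pattern) printf '%s\n' "$name" ;;
    esac
  done
}
```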

 

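Data format: determined by the input file format.

Compression format: as needed.

Max line length: determined by the file.

Use custom delimiter: as needed; not recommended.

Charset: determined by the file, e.g. UTF-8, GBK, or another encoding.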
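For the max line length setting, the longest line in a sample file is a useful reference so that lines are not cut off. A one-function sketch with `awk` follows (hypothetical name `max_line_length`). Whether `length()` counts characters or bytes depends on the awk implementation and locale (bytes under `LC_ALL=C`); for a GBK sample, converting it first with `iconv -f GBK -t UTF-8` makes the count comparable to UTF-8 data.

```shell
# max_line_length FILE: print the length of the longest line (0 if empty).
max_line_length() {
  awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}
```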

 

 

Hadoop FS destination: configure it the same way as in the Oracle-to-Hive example.

Input format: the input file can be transcoded as needed.

 

 

Example (3): images to the local file system

 

 

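The configuration is the same as above. Note that images, audio, and video must use the Whole File data format, and the output directory must use $record:value('/fileInfo/filename') /tmp/out/$record:value('file:fileName()'); otherwise an error occurs.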

