Case study: Integrating Atlas 1.0.0 with the Huawei FusionInsight big data platform (CentOS 7)
Posted by 袁义锐
This case study uses a pre-built Atlas (version 1.0.0) and focuses on how to integrate it; for instructions on building Atlas itself, see other posts.
1. The Solr version bundled with the Huawei FusionInsight platform is incompatible with Atlas 1.0.0. Pressed for time, I did not dig into the root cause and instead set up my own SolrCloud cluster; see https://blog.csdn.net/u010235716/article/details/104946962
2. Make sure the JDK environment variables are set.
Installation
- 1. Upload the pre-built atlas-1.0.0.zip to the /data directory and unzip it
# unzip the archive
unzip atlas-1.0.0.zip
- 2. Change into the /data/atlas-1.0.0/distro/target/ directory and extract apache-atlas-1.0.0-bin.tar.gz (e.g. tar -zxvf apache-atlas-1.0.0-bin.tar.gz)
[root@SHB-L0120794 atlas-1.0.0]# cd /data/atlas-1.0.0/distro/target/
[root@SHB-L0120794 target]# ll
total 374316
drwxr-xr-x 11 root root 4096 Mar 17 12:11 apache-atlas-1.0.0
drwxr-xr-x 3 root root 4096 Nov 12 2018 apache-atlas-1.0.0-bin.bak
-rw-r--r-- 1 root root 269882584 Nov 12 2018 apache-atlas-1.0.0-bin.tar.gz
drwxr-xr-x 3 root root 4096 Nov 12 2018 apache-atlas-1.0.0-falcon-hook
-rw-r--r-- 1 root root 8984504 Nov 12 2018 apache-atlas-1.0.0-falcon-hook.tar.gz
drwxr-xr-x 3 root root 4096 Nov 12 2018 apache-atlas-1.0.0-hbase-hook
-rw-r--r-- 1 root root 16618230 Nov 12 2018 apache-atlas-1.0.0-hbase-hook.tar.gz
drwxr-xr-x 3 root root 4096 Nov 12 2018 apache-atlas-1.0.0-hive-hook
-rw-r--r-- 1 root root 20269877 Nov 12 2018 apache-atlas-1.0.0-hive-hook.tar.gz
drwxr-xr-x 3 root root 4096 Nov 12 2018 apache-atlas-1.0.0-kafka-hook
-rw-r--r-- 1 root root 9021206 Nov 12 2018 apache-atlas-1.0.0-kafka-hook.tar.gz
drwxr-xr-x 3 root root 4096 Nov 12 2018 apache-atlas-1.0.0-migration-exporter
-rw-r--r-- 1 root root 5696 Nov 12 2018 apache-atlas-1.0.0-migration-exporter.zip
-rw-r--r-- 1 root root 10349836 Nov 12 2018 apache-atlas-1.0.0-sources.tar.gz
drwxr-xr-x 3 root root 4096 Nov 12 2018 apache-atlas-1.0.0-sqoop-hook
-rw-r--r-- 1 root root 8969601 Nov 12 2018 apache-atlas-1.0.0-sqoop-hook.tar.gz
drwxr-xr-x 3 root root 4096 Nov 12 2018 apache-atlas-1.0.0-storm-hook
-rw-r--r-- 1 root root 39003696 Nov 12 2018 apache-atlas-1.0.0-storm-hook.tar.gz
drwxr-xr-x 2 root root 4096 Nov 12 2018 archive-tmp
-rw-r--r-- 1 root root 94839 Nov 12 2018 atlas-distro-1.0.0.jar
drwxr-xr-x 2 root root 4096 Nov 12 2018 bin
drwxr-xr-x 5 root root 4096 Nov 12 2018 conf
drwxr-xr-x 2 root root 4096 Nov 12 2018 maven-archiver
drwxr-xr-x 3 root root 4096 Nov 12 2018 maven-shared-archive-resources
drwxr-xr-x 2 root root 4096 Nov 12 2018 META-INF
-rw-r--r-- 1 root root 3493 Nov 12 2018 rat.txt
drwxr-xr-x 3 root root 4096 Nov 12 2018 test-classes
- 3. Integrate HBase
- 3.1 Edit the Atlas configuration file
# edit
vim /data/atlas-1.0.0/distro/target/apache-atlas-1.0.0/conf/atlas-application.properties
# storage backend
atlas.graph.storage.backend=hbase
# HBase table used for graph storage
atlas.graph.storage.hbase.table=atlas
# HBase
# For standalone mode, specify localhost
# for distributed mode, specify the ZooKeeper quorum here
# ZooKeeper quorum hostnames
atlas.graph.storage.hostname=shb-l0120794,shb-l0120795,shb-l0120796
######### Entity Audit Configs #########
# this table is created automatically in HBase
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
# ZooKeeper quorum for the audit table
atlas.audit.hbase.zookeeper.quorum=shb-l0120794,shb-l0120795,shb-l0120796
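As a quick sanity check, the HBase-related settings above can be validated before starting Atlas. The sketch below is a hypothetical helper (not part of Atlas): it parses a Java-style properties snippet and reports which of the keys set in this step are missing.

```python
# Minimal sketch, assuming atlas-application.properties uses plain
# key=value lines: report missing HBase storage keys from this step.
REQUIRED_KEYS = {
    "atlas.graph.storage.backend",
    "atlas.graph.storage.hbase.table",
    "atlas.graph.storage.hostname",
    "atlas.audit.hbase.tablename",
    "atlas.audit.hbase.zookeeper.quorum",
}

def parse_properties(text: str) -> dict:
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def missing_hbase_keys(text: str) -> set:
    """Return the required keys that are not set in the snippet."""
    return REQUIRED_KEYS - parse_properties(text).keys()

sample = """
atlas.graph.storage.backend=hbase
atlas.graph.storage.hbase.table=atlas
atlas.graph.storage.hostname=shb-l0120794,shb-l0120795,shb-l0120796
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.hbase.zookeeper.quorum=shb-l0120794,shb-l0120795,shb-l0120796
"""
print(missing_hbase_keys(sample))  # set() when everything is present
```

Running it against the real file (`open(path).read()`) before each restart saves a failed-startup cycle.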
- 3.2 Copy all configuration files from the conf directory of the HBase installation into Atlas's hbase directory: /data/atlas-1.0.0/distro/target/apache-atlas-1.0.0/conf/hbase. Here you need to download the client configuration from the Huawei web UI and install it first,
or copy every file from the following HBase installation directory into Atlas's hbase directory:
/opt/huawei/Bigdata/FusionInsight_HD_V100R002C70SPC200/1_27_RegionServer/etc
- 3.3 Edit the atlas-env.sh file under the conf directory and export the HBase config path
export HBASE_CONF_DIR=/data/atlas-1.0.0/distro/target/apache-atlas-1.0.0/conf/hbase
- 4. Integrate Solr
- 4.1 Copy the conf/solr folder into the Solr 5.5.1 installation directory
[root@SHB-L0120794 conf]# pwd
/data/atlas-1.0.0/distro/target/apache-atlas-1.0.0/conf
[root@SHB-L0120794 conf]# ll
total 88
-rw-r--r-- 1 root root 12319 Mar 19 21:20 atlas-application.properties
-rw-r--r-- 1 root root 3353 Mar 17 13:45 atlas-env.sh
-rw-r--r-- 1 root root 5164 Mar 19 20:30 atlas-log4j.xml
-rw-r--r-- 1 root root 5156 Mar 19 20:28 atlas-log4j.xml.info
-rw-r--r-- 1 root root 1459 Nov 9 2018 atlas-simple-authz-policy.json
-rw-r--r-- 1 root root 31403 Nov 9 2018 cassandra.yml.template
drwxr-xr-x 2 root root 4096 Mar 19 20:16 hbase
drwxr-xr-x 3 root root 4096 Mar 17 12:01 solr
-rw-r--r-- 1 root root 207 Nov 9 2018 users-credentials.properties
drwxr-xr-x 2 root root 4096 Mar 17 12:01 zookeeper
[root@SHB-L0120794 conf]#
Rename the solr folder to apache-atlas-conf
[root@SHB-L0120794 solr5.5_1]# pwd
/data/solrcloud/solr5.5_1
[root@SHB-L0120794 solr5.5_1]# ll
total 1240
drwxr-xr-x 3 root root 4096 Mar 18 10:53 apache-atlas-conf
drwxr-xr-x 3 root root 4096 Mar 18 12:21 bin
-rw-r--r-- 1 root root 555321 May 1 2016 CHANGES.txt
drwxr-xr-x 13 root root 4096 May 1 2016 contrib
drwxr-xr-x 4 root root 4096 Mar 17 14:53 dist
drwxr-xr-x 19 root root 4096 Mar 17 14:53 docs
drwxr-xr-x 7 root root 4096 Mar 17 14:53 example
drwxr-xr-x 2 root root 36864 Mar 17 14:53 licenses
-rw-r--r-- 1 root root 12646 Feb 1 2016 LICENSE.txt
-rw-r--r-- 1 root root 590277 May 1 2016 LUCENE_CHANGES.txt
-rw-r--r-- 1 root root 26529 Feb 1 2016 NOTICE.txt
-rw-r--r-- 1 root root 7162 May 1 2016 README.txt
drwxr-xr-x 11 root root 4096 Mar 17 14:53 server
[root@SHB-L0120794 solr5.5_1]#
- 4.2 Configure Atlas
# edit
vim /data/atlas-1.0.0/distro/target/apache-atlas-1.0.0/conf/atlas-application.properties
# Graph Search Index
# search index backend
atlas.graph.index.search.backend=solr
# Solr
# Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
# Solr ZooKeeper address
atlas.graph.index.search.solr.zookeeper-url=shb-l0120794:24002,shb-l0120795:24002,shb-l0120796:24002
- 4.3 Create the Solr collections. They can be verified in the Cloud view of the Solr web UI
bash /data/solrcloud/solr5.5_1/bin/solr create -c vertex_index -d /data/solrcloud/solr5.5_1/apache-atlas-conf -shards 2 -replicationFactor 2
bash /data/solrcloud/solr5.5_1/bin/solr create -c edge_index -d /data/solrcloud/solr5.5_1/apache-atlas-conf -shards 2 -replicationFactor 2
bash /data/solrcloud/solr5.5_1/bin/solr create -c fulltext_index -d /data/solrcloud/solr5.5_1/apache-atlas-conf -shards 2 -replicationFactor 2
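The three create commands above follow one pattern and differ only in the collection name. The sketch below is a hypothetical generator that keeps them consistent if more indexes are ever needed; the paths are the ones used in this walkthrough and will differ on other setups.

```python
# Sketch: build the `solr create` commands used above from a template.
# SOLR_BIN and CONF_DIR match this walkthrough's layout (assumptions
# elsewhere); the three index names are the ones Atlas expects.
SOLR_BIN = "/data/solrcloud/solr5.5_1/bin/solr"
CONF_DIR = "/data/solrcloud/solr5.5_1/apache-atlas-conf"
INDEXES = ["vertex_index", "edge_index", "fulltext_index"]

def solr_create_cmd(collection: str, shards: int = 2,
                    replication: int = 2) -> str:
    """Render one `bin/solr create` invocation for a collection."""
    return (f"bash {SOLR_BIN} create -c {collection} -d {CONF_DIR} "
            f"-shards {shards} -replicationFactor {replication}")

for name in INDEXES:
    print(solr_create_cmd(name))
```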
- 5. Integrate Kafka
# edit
vim /data/atlas-1.0.0/distro/target/apache-atlas-1.0.0/conf/atlas-application.properties
# settings
######### Notification Configs #########
# defaults to true; set to false to use the external Kafka cluster
atlas.notification.embedded=false
atlas.kafka.data=$sys:atlas.home/data/kafka
# Kafka path (chroot) in ZooKeeper
atlas.kafka.zookeeper.connect=shb-l0120794:24002,shb-l0120795:24002,shb-l0120796:24002/kafka
# Kafka broker hosts and ports
atlas.kafka.bootstrap.servers=shb-l0120794:21005,shb-l0120795:21005,shb-l0120796:21005
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas
atlas.kafka.enable.auto.commit=false
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000
atlas.notification.create.topics=true
atlas.notification.replicas=1
# the following topics are created automatically in Kafka
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
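The key rule in the block above is that `atlas.notification.embedded=false` commits you to an external Kafka, so the broker and ZooKeeper settings must also be filled in. The hypothetical checker below (not an Atlas API) encodes that rule, plus the chroot point that bites in problem 3 later.

```python
# Sketch: consistency checks for the Kafka notification settings.
# When embedded=false, Atlas needs an external Kafka, so
# bootstrap.servers and zookeeper.connect must both be set; the
# connect string should also carry the chroot brokers register under.
def check_kafka_config(props: dict) -> list:
    """Return human-readable problems (empty list = looks consistent)."""
    problems = []
    if props.get("atlas.notification.embedded", "true") == "false":
        for key in ("atlas.kafka.bootstrap.servers",
                    "atlas.kafka.zookeeper.connect"):
            if not props.get(key):
                problems.append(f"{key} must be set when embedded=false")
    connect = props.get("atlas.kafka.zookeeper.connect", "")
    if connect and "/" not in connect.split(",")[-1]:
        problems.append("zookeeper.connect has no chroot (e.g. /kafka); "
                        "check where your brokers register in ZooKeeper")
    return problems

props = {
    "atlas.notification.embedded": "false",
    "atlas.kafka.bootstrap.servers": "shb-l0120794:21005",
    "atlas.kafka.zookeeper.connect": "shb-l0120794:24002/kafka",
}
print(check_kafka_config(props))  # [] -> configuration looks consistent
```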
- 6. Configure the Hive hook
- 6.1 Configure the cluster name Atlas scans
# edit
vim /data/atlas-1.0.0/distro/target/apache-atlas-1.0.0/conf/atlas-application.properties
# the cluster name must match the Huawei cluster name: very important
atlas.cluster.name=hacluster
- 6.2 On every node where Hive is installed, add the hook entry to hive-site.xml
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
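Since this property must be edited on every Hive node, a quick programmatic check helps confirm the hook landed everywhere. The sketch below is a hypothetical helper using only the standard library; note that `hive.exec.post.hooks` may hold a comma-separated list of hook classes.

```python
# Sketch: verify a hive-site.xml lists the Atlas hook in
# hive.exec.post.hooks (which can be a comma-separated list).
import xml.etree.ElementTree as ET

HOOK_CLASS = "org.apache.atlas.hive.hook.HiveHook"

def has_atlas_hook(hive_site_xml: str) -> bool:
    """True if the Atlas hook class appears in hive.exec.post.hooks."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "hive.exec.post.hooks":
            hooks = (prop.findtext("value") or "").split(",")
            return HOOK_CLASS in [h.strip() for h in hooks]
    return False

sample = """<configuration>
  <property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
  </property>
</configuration>"""
print(has_atlas_hook(sample))  # True
```

Pointing it at the real file (`has_atlas_hook(open(path).read())`) on each node is a cheap pre-flight check.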
- 6.3 Copy the atlas-application.properties file into the conf directory of the Hive installation
[root@SHB-L0120795 metastore]# cd /opt/huawei/Bigdata/FusionInsight_HD_V100R002C70SPC200/install/FusionInsight-Hive-1.3.0/hive-1.3.0/conf
[root@SHB-L0120795 conf]# ll
total 48
-rw-r--r-- 1 root root 12313 Mar 19 20:58 atlas-application.properties
-rw------- 1 omm wheel 1139 Nov 8 2017 beeline-log4j.properties.template
-rw------- 1 omm wheel 3454 Nov 8 2017 gc-opts.sh
-rw------- 1 omm wheel 2662 Nov 8 2017 hive-exec-log4j.properties.template
-rw------- 1 omm wheel 3050 Nov 8 2017 hive-log4j.properties.template
[root@SHB-L0120795 conf]#
- 7. Start the Atlas service
Run bin/atlas_start.py to start the Atlas service
cd /data/atlas-1.0.0/distro/target/apache-atlas-1.0.0
# start
bin/atlas_start.py
Log in at http://IP:21000/login.jsp to verify; username: admin, password: admin
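Besides the login page, readiness can be polled from a script. The sketch below hits Atlas's standard `/api/atlas/admin/version` REST endpoint; the localhost URL is an assumption for this walkthrough, and depending on your security setup the endpoint may require authentication (an auth error is also treated as "not usable" here).

```python
# Sketch: poll Atlas's admin/version REST endpoint to see whether the
# server is up. Returns the parsed version info, or None if the
# server is unreachable or the response is not usable.
import json
import urllib.error
import urllib.request

def atlas_is_up(base_url: str, timeout: float = 3.0):
    """Return the version dict if Atlas answers, else None."""
    try:
        url = f"{base_url}/api/atlas/admin/version"
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None

print(atlas_is_up("http://localhost:21000"))  # None until Atlas is up
```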
Troubleshooting
- Problem 1: Failed to identify the fs of dir hdfs://hacluster/hbase/lib
2020-03-19 20:04:18,742 WARN - [main:] ~ Failed to identify the fs of dir hdfs://hacluster/hbase/lib, ignored (DynamicClassLoader:106)
java.io.IOException: Couldn't create proxy provider null
at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515)
Solution: hdfs://hacluster is the nameservice of the Huawei big data cluster and must be bound in the Atlas configuration
# edit
vim /data/atlas-1.0.0/distro/target/apache-atlas-1.0.0/conf/atlas-application.properties
# the cluster name must match the Huawei cluster name: very important
atlas.cluster.name=hacluster
- Problem 2: BlackListingFailoverProxyProvider not found
Class org.apache.hadoop.hdfs.server.namenode.ha.BlackListingFailoverProxyProvider not found
Solution: in the web UI, go to Service Management - HDFS - Service Configuration - All Configurations, search for BlackListingFailoverProxyProvider, and switch it to
org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
- Problem 3: Kafka misconfiguration
2020-03-19 20:04:27,835 ERROR - [main:] ~ Exception in getKafkaConsumer (KafkaNotification:236)
org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:781)
Solution: check the Kafka path under the ZooKeeper root; pay attention to where Kafka sits in the ZooKeeper hierarchy
# Kafka path (chroot) in ZooKeeper
atlas.kafka.zookeeper.connect=shb-l0120794:24002,shb-l0120795:24002,shb-l0120796:24002/kafka
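The fix above hinges on the `/kafka` chroot at the end of the connect string, which is easy to overlook because it is appended only after the last host. The hypothetical helper below splits the string into its host list and chroot so the two parts can be checked separately.

```python
# Sketch: split a ZooKeeper connect string into host list and chroot.
# The chroot applies to the whole string, not just the last host.
def split_zk_connect(connect: str):
    """'h1:p,h2:p/kafka' -> (['h1:p', 'h2:p'], '/kafka')."""
    hosts_part, sep, chroot = connect.partition("/")
    return hosts_part.split(","), (sep + chroot) if sep else ""

hosts, chroot = split_zk_connect(
    "shb-l0120794:24002,shb-l0120795:24002,shb-l0120796:24002/kafka")
print(hosts, chroot)  # three host:port entries and chroot '/kafka'
```

If `chroot` comes back empty but your brokers register under `/kafka`, Atlas will fail to construct its Kafka consumer exactly as in the log above.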