Apache Atlas, a Data Governance Platform
Posted by 耳东的编程手记
Building Atlas from Source
Download the source code
$ git clone https://github.com/apache/atlas.git
$ cd atlas
$ git switch branch-2.0
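Point Maven at the Aliyun (Alibaba Cloud) mirror instead of the default repository
$ vim settings.xml
Add the following inside the <mirrors> element (Maven reads this file from ~/.m2/settings.xml or <MAVEN_HOME>/conf/settings.xml):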
<mirror>
  <id>aliyunmaven</id>
  <mirrorOf>*</mirrorOf>
  <name>Aliyun public repository</name>
  <url>https://maven.aliyun.com/repository/public</url>
</mirror>
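If editing the shared settings file is undesirable, one alternative is a standalone settings file that contains only this mirror and is passed to Maven with -s. The sketch below assumes that approach; the helper name write_mirror_settings and the output path are hypothetical and not part of the original setup.

```shell
# Write a minimal Maven settings file that contains only the Aliyun mirror.
# write_mirror_settings is a hypothetical helper; $1 is the output path.
write_mirror_settings() {
    cat > "$1" <<'EOF'
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <mirrors>
    <mirror>
      <id>aliyunmaven</id>
      <mirrorOf>*</mirrorOf>
      <name>Aliyun public repository</name>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

It could be used like this, leaving any existing settings.xml untouched:
$ write_mirror_settings /tmp/aliyun-settings.xml
$ mvn -s /tmp/aliyun-settings.xml clean -DskipTests package -Pdist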
Build the code
$ mvn clean -DskipTests package -Pdist
Build output on a machine with 8 GB of RAM and an i7-8500u CPU:
.........
[INFO] Apache Atlas Kafka Bridge 2.1.0-SNAPSHOT ........... SUCCESS [ 3.553 s]
[INFO] Apache Atlas classification updater 2.1.0-SNAPSHOT . SUCCESS [ 1.100 s]
[INFO] Apache Atlas Impala Hook API 2.1.0-SNAPSHOT ........ SUCCESS [ 0.305 s]
[INFO] Apache Atlas Impala Bridge Shim 2.1.0-SNAPSHOT ..... SUCCESS [ 0.296 s]
[INFO] Apache Atlas Impala Bridge 2.1.0-SNAPSHOT .......... SUCCESS [ 5.235 s]
[INFO] Apache Atlas Distribution 2.1.0-SNAPSHOT ........... SUCCESS [01:03 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10:09 min
Path of the built package
atlas\distro\target\apache-atlas-2.1.0-SNAPSHOT-bin.tar.gz
Installing and Configuring Hadoop 2.6.0
1. Download Hadoop
$ mkdir -p /opt/bigdata/hadoop-data
$ cd /opt/bigdata
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
$ tar -xzvf hadoop-2.6.0.tar.gz
2. Edit the configuration files (single node, pseudo-distributed mode)
2.1 Edit core-site.xml
$ vim /opt/bigdata/hadoop-2.6.0/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/bigdata/hadoop-data/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
2.2 Edit hdfs-site.xml
$ vim /opt/bigdata/hadoop-2.6.0/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/bigdata/hadoop-data/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/bigdata/hadoop-data/tmp/dfs/data</value>
  </property>
</configuration>
2.3 Format the NameNode
$ hadoop-2.6.0/bin/hdfs namenode -format
2.4 Start the Hadoop daemons
$ hadoop-2.6.0/sbin/start-all.sh
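After start-all.sh returns, it is worth confirming that the daemons actually came up. jps (shipped with the JDK) lists running Java processes as "<pid> <name>". The sketch below compares such a listing against a set of expected names; the helper name missing_daemons is hypothetical, and the names in the usage line are the ones start-all.sh normally starts for HDFS and YARN on a single Hadoop 2.x node, which is an assumption about this setup.

```shell
# missing_daemons NAME... : read `jps` output on stdin and report every
# expected process name that is absent. Returns non-zero if any is missing.
# (Hypothetical helper, not part of Hadoop.)
missing_daemons() {
    # The second column of jps output is the process (main class) name.
    _running=$(awk '{print $2}')
    _rc=0
    for _name in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$_running" | grep -qxF -- "$_name"; then
            echo "missing: $_name"
            _rc=1
        fi
    done
    return "$_rc"
}
```

A possible invocation on the Hadoop host:
$ jps | missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager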
Installing and Configuring HBase 2.1.6
1. Download HBase
$ cd /opt/bigdata
$ wget https://archive.apache.org/dist/hbase/2.1.6/hbase-2.1.6-bin.tar.gz
$ tar -xzvf hbase-2.1.6-bin.tar.gz
2. Edit the configuration
2.1 Edit hbase-env.sh
$ vim hbase-2.1.6/conf/hbase-env.sh
Set export JAVA_HOME=<JAVA_HOME>
Set export HBASE_MANAGES_ZK=false (HBase then does not start its own ZooKeeper, so one must already be running at localhost:2181)
2.2 Edit hbase-site.xml
$ vim hbase-2.1.6/conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost:2181</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.wal.provider</name>
    <value>filesystem</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>61510</value>
  </property>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>61530</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>61500</value>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>61520</value>
  </property>
</configuration>
2.3 Start HBase
$ ./hbase-2.1.6/bin/start-hbase.sh
Installing and Configuring Apache Solr
1. Download Solr
$ cd /opt/bigdata
$ wget https://archive.apache.org/dist/lucene/solr/7.5.0/solr-7.5.0.tgz
$ tar -xzvf solr-7.5.0.tgz
2. Start Solr
$ ./solr-7.5.0/bin/solr start -c -m 1g -z localhost:2181
3. Add the collections
3.1 Open http://<ip-address>:8983
3.2 Add the following collections:
fulltext_index
edge_index
vertex_index
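The same three collections can also be created from the command line with bin/solr create. The sketch below only prints the commands so they can be reviewed before anything runs. The helper name, the single shard and single replica, and the configset directory are assumptions for a one-node setup; in particular, pointing -d at the conf/solr directory of an unpacked Atlas package requires the Atlas package from the next section to be on the server first.

```shell
# print_solr_create_cmds SOLR_HOME CONF_DIR
# Print one `solr create` command per collection that Atlas uses.
# Hypothetical helper: it prints the commands instead of running them.
print_solr_create_cmds() {
    for _c in vertex_index edge_index fulltext_index; do
        echo "$1/bin/solr create -c $_c -d $2 -shards 1 -replicationFactor 1"
    done
}
```

For example, to review the commands and then run them by piping the output to sh:
$ print_solr_create_cmds /opt/bigdata/solr-7.5.0 /opt/bigdata/apache-atlas-2.1.0-SNAPSHOT-bin/conf/solr
$ print_solr_create_cmds /opt/bigdata/solr-7.5.0 /opt/bigdata/apache-atlas-2.1.0-SNAPSHOT-bin/conf/solr | sh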
Installing and Configuring Apache Atlas
1. Upload Atlas
$ scp atlas\distro\target\apache-atlas-2.1.0-SNAPSHOT-bin.tar.gz <your account>@<ip-address>:/opt/bigdata
$ tar -xvf /opt/bigdata/apache-atlas-2.1.0-SNAPSHOT-bin.tar.gz -C /opt/bigdata
2. Edit the configuration
2.1 Edit atlas-env.sh
# indicates whether or not a local instance of HBase should be started for Atlas
export MANAGE_LOCAL_HBASE=false
# indicates whether or not a local instance of Solr should be started for Atlas
export MANAGE_LOCAL_SOLR=false
# indicates whether or not cassandra is the embedded backend for Atlas
export MANAGE_EMBEDDED_CASSANDRA=false
# indicates whether or not a local instance of Elasticsearch should be started for Atlas
export MANAGE_LOCAL_ELASTICSEARCH=false
export HBASE_CONF_DIR=/opt/bigdata/apache-atlas-2.1.0-SNAPSHOT-bin/conf/hbase
2.2 Edit atlas-application.properties
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=localhost:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true
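Besides the Solr settings, Atlas reads its graph storage backend from the same file, and a key left blank there is easy to miss. As a hedged sketch, the helper below reports which of the given keys have no non-empty, uncommented value in a properties file. The helper name unset_props is hypothetical; atlas.graph.storage.backend and atlas.graph.storage.hostname are keys from the stock atlas-application.properties, but which values they need depends on the build profile and is not stated in this walkthrough.

```shell
# unset_props FILE KEY... : print each KEY that has no non-empty,
# uncommented "KEY=value" line in FILE. Returns non-zero if any is unset.
# (Hypothetical helper, not shipped with Atlas.)
unset_props() {
    _file=$1
    shift
    _rc=0
    for _key in "$@"; do
        # Escape the dots so they match literally in the regular expression.
        _pat=$(printf '%s' "$_key" | sed 's/\./\\./g')
        # Require at least one non-blank character after the "=" sign.
        if ! grep -Eq "^[[:space:]]*${_pat}[[:space:]]*=[[:space:]]*[^[:space:]]" "$_file"; then
            echo "unset: $_key"
            _rc=1
        fi
    done
    return "$_rc"
}
```

A possible check after editing the file:
$ unset_props apache-atlas-2.1.0-SNAPSHOT-bin/conf/atlas-application.properties atlas.graph.storage.backend atlas.graph.storage.hostname atlas.graph.index.search.solr.zookeeper-url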
2.3 Copy the HBase configuration file
$ cp hbase-2.1.6/conf/hbase-site.xml apache-atlas-2.1.0-SNAPSHOT-bin/conf/hbase/
2.4 Start Atlas
$ ./apache-atlas-2.1.0-SNAPSHOT-bin/bin/atlas_start.py
2.5 Open Atlas
http://<ip-address>:21000
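Atlas can take several minutes to finish starting, so the address above may not answer right away. A minimal sketch of a wait loop follows. retry is a hypothetical helper, and the probe in the usage line assumes the stock admin/admin account and the /api/atlas/admin/version endpoint of the Atlas REST API; adjust both if the installation differs.

```shell
# retry ATTEMPTS DELAY CMD... : run CMD until it succeeds, at most
# ATTEMPTS times, sleeping DELAY seconds between tries.
# (Hypothetical helper for illustration.)
retry() {
    _attempts=$1
    _delay=$2
    shift 2
    _n=1
    while [ "$_n" -le "$_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep only if another attempt follows.
        if [ "$_n" -lt "$_attempts" ]; then
            sleep "$_delay"
        fi
        _n=$((_n + 1))
    done
    return 1
}
```

For example, to probe every 10 seconds for up to 30 attempts:
$ retry 30 10 curl -sf -u admin:admin 'http://<ip-address>:21000/api/atlas/admin/version'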