Apache Atlas: A Data Governance Platform

Posted by 耳东的编程手记

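Building Atlas from source

  1. Download the source code

$ git clone https://github.com/apache/atlas.git
$ cd atlas
$ git switch -c branch-2.0 origin/branch-2.0

  2. Point Maven's default repository at the Aliyun (Alibaba Cloud) mirror by editing settings.xml (the per-user file is usually ~/.m2/settings.xml)

$ vim settings.xml

  3. Add the following inside the <mirrors> element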

<mirror>
<id>aliyunmaven</id>
<mirrorOf>*</mirrorOf>
<name>阿里云公共仓库</name>
<url>https://maven.aliyun.com/repository/public</url>
</mirror>
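
For readers who do not have a settings.xml yet, the sketch below shows where the mirror entry sits in a complete file. The helper name write_mirror_settings and the default path are assumptions made for illustration, and the function overwrites whatever file it is pointed at, so try it on a scratch path first.

```shell
# write_mirror_settings is a hypothetical helper, not part of Maven.
# It writes a minimal settings.xml whose only content is the Aliyun mirror.
# WARNING: the target file is overwritten.
write_mirror_settings() {
  local target="${1:-$HOME/.m2/settings.xml}"
  mkdir -p "$(dirname "$target")"
  cat > "$target" <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>aliyunmaven</id>
      <mirrorOf>*</mirrorOf>
      <name>阿里云公共仓库</name>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

Calling write_mirror_settings /tmp/atlas-settings.xml and then passing that file to Maven with mvn -s /tmp/atlas-settings.xml leaves any existing per-user settings untouched.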

  4. Build the code

$ mvn clean -DskipTests package -Pdist

  5. Build output on a machine with 8 GB of RAM and an i7-8500u CPU

.........
[INFO] Apache Atlas Kafka Bridge 2.1.0-SNAPSHOT ........... SUCCESS [ 3.553 s]
[INFO] Apache Atlas classification updater 2.1.0-SNAPSHOT . SUCCESS [ 1.100 s]
[INFO] Apache Atlas Impala Hook API 2.1.0-SNAPSHOT ........ SUCCESS [ 0.305 s]
[INFO] Apache Atlas Impala Bridge Shim 2.1.0-SNAPSHOT ..... SUCCESS [ 0.296 s]
[INFO] Apache Atlas Impala Bridge 2.1.0-SNAPSHOT .......... SUCCESS [ 5.235 s]
[INFO] Apache Atlas Distribution 2.1.0-SNAPSHOT ........... SUCCESS [01:03 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10:09 min

  6. Location of the installation package

atlas\distro\target\apache-atlas-2.1.0-SNAPSHOT-bin.tar.gz
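
Because the version in the file name changes from build to build, it can help to locate the tarball by pattern instead of typing the name. The following is a minimal sketch; find_atlas_dist is a hypothetical helper, and it uses forward slashes, so it assumes a POSIX-style shell (Git Bash on Windows, for example).

```shell
# find_atlas_dist is a hypothetical helper: print the path of the binary
# tarball under <source-root>/distro/target, or fail unless exactly one exists.
find_atlas_dist() {
  local root="${1:-atlas}"
  # Expand the glob into this function's positional parameters.
  set -- "$root"/distro/target/apache-atlas-*-bin.tar.gz
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one apache-atlas-*-bin.tar.gz under $root/distro/target" >&2
    return 1
  fi
  printf '%s\n' "$1"
}
```

Run from the directory that contains the atlas checkout, find_atlas_dist atlas prints the path that a later scp can use.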

Installing and configuring Hadoop 2.6.0

  1. Download Hadoop

$ mkdir -p /opt/bigdata/hadoop-data
$ cd /opt/bigdata
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
$ tar -xzvf hadoop-2.6.0.tar.gz

  2. Edit the configuration files (standalone, single-node setup)

2.1 Edit core-site.xml

$ vim /opt/bigdata/hadoop-2.6.0/etc/hadoop/core-site.xml

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/bigdata/hadoop-data/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

2.2 Edit hdfs-site.xml

$ vim /opt/bigdata/hadoop-2.6.0/etc/hadoop/hdfs-site.xml


<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/bigdata/hadoop-data/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/bigdata/hadoop-data/tmp/dfs/data</value>
</property>
</configuration>
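
To confirm that an edit landed in the file, a property can be read back from the shell. This is a rough sketch only: hadoop_prop is a hypothetical helper that assumes one tag per line, as in the files above, and it is not a general XML parser.

```shell
# hadoop_prop is a hypothetical helper: print the <value> that follows
# <name>KEY</name> in a Hadoop-style *-site.xml file.
# Assumes <name> and <value> each sit on their own line.
hadoop_prop() {
  local file="$1" key="$2"
  awk -v key="$key" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")      # drop everything up to the opening tag
      sub(/<\/value>.*/, "")    # drop the closing tag and anything after it
      print
      exit
    }
  ' "$file"
}
```

For example, hadoop_prop /opt/bigdata/hadoop-2.6.0/etc/hadoop/hdfs-site.xml dfs.replication should print the value set above. Once Hadoop's bin directory is on the PATH, hdfs getconf -confKey dfs.replication reports the value Hadoop actually resolves.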

2.3 Format the NameNode

$ hadoop-2.6.0/bin/hdfs namenode -format

2.4 Start the Hadoop daemons

$ hadoop-2.6.0/sbin/start-all.sh
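
The daemons need a few seconds before they accept connections, and the later steps depend on them. One way to wait is sketched below; wait_for_port is a hypothetical helper that relies on bash's /dev/tcp pseudo-device, so it needs bash rather than a plain POSIX sh.

```shell
# wait_for_port is a hypothetical helper: try a TCP connection to HOST:PORT
# up to TIMEOUT times, one second apart, and fail if none succeeds.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" tried=0
  # The subshell opens the socket on fd 3 and closes it again on exit.
  while ! (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    tried=$((tried + 1))
    if [ "$tried" -ge "$timeout" ]; then
      echo "timed out waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
  done
}
```

With the configuration above, wait_for_port localhost 9000 120 would block until the NameNode address from fs.defaultFS is reachable. The same call works for the other ports this guide uses (2181, 8983, 21000).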

Installing and configuring HBase 2.1.6

  1. Download HBase

$ cd /opt/bigdata
$ wget https://archive.apache.org/dist/hbase/2.1.6/hbase-2.1.6-bin.tar.gz
$ tar -xzvf hbase-2.1.6-bin.tar.gz
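
  2. Edit the configuration

2.1 Edit hbase-env.sh

$ vim hbase-2.1.6/conf/hbase-env.sh
Set export JAVA_HOME=<JAVA_HOME>
Set export HBASE_MANAGES_ZK=false
With HBASE_MANAGES_ZK=false, HBase does not start its own ZooKeeper, so a ZooKeeper instance must already be listening on localhost:2181 (the address used in hbase-site.xml) before HBase is started.

2.2 Edit hbase-site.xml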

$ vim hbase-2.1.6/conf/hbase-site.xml

<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost:2181</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<property>
<name>hbase.wal.provider</name>
<value>filesystem</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>61510</value>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>61530</value>
</property>
<property>
<name>hbase.master.port</name>
<value>61500</value>
</property>
<property>
<name>hbase.regionserver.port</name>
<value>61520</value>
</property>
</configuration>
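
HBase stores its data in HDFS here, so hbase.rootdir has to sit under the fs.defaultFS address from core-site.xml. The check below is a small sketch of that rule; rootdir_on_default_fs is a hypothetical helper and takes the two values as plain arguments.

```shell
# rootdir_on_default_fs is a hypothetical helper: succeed only when the
# hbase.rootdir URI (second argument) lies under fs.defaultFS (first argument).
rootdir_on_default_fs() {
  local default_fs="${1%/}" rootdir="$2"   # strip one trailing slash, if any
  case "$rootdir" in
    "$default_fs"/*) return 0 ;;
    *) echo "hbase.rootdir ($rootdir) is not under fs.defaultFS ($default_fs)" >&2
       return 1 ;;
  esac
}
```

With the values from this guide, rootdir_on_default_fs hdfs://localhost:9000 hdfs://localhost:9000/hbase succeeds. A mismatched host or port makes it fail with a message.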

2.3 Start HBase

$ ./hbase-2.1.6/bin/start-hbase.sh

Installing and configuring Apache Solr

  1. Download Solr

$ cd /opt/bigdata
$ wget https://archive.apache.org/dist/lucene/solr/7.5.0/solr-7.5.0.tgz
$ tar -xzvf solr-7.5.0.tgz

  2. Start Solr

$ ./solr-7.5.0/bin/solr start -c -m 1g -z localhost:2181

  3. Add the collections
    3.1 Open http://<ip-address>:8983
    3.2 Add the following collections
    fulltext_index
    edge_index
    vertex_index
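
The same three collections can also be created from the command line with Solr's bin/solr create. The sketch below only prints the commands so they can be reviewed first. solr_create_cmds is a hypothetical helper, and the shard and replica counts of 1 are assumptions for a single-node setup, not values from this guide. Atlas's own installation guide additionally passes a Solr config directory with -d, which is left out here because this guide does not use one.

```shell
# solr_create_cmds is a hypothetical helper: print (do not run) one
# "solr create" command for each collection listed above.
# SOLR_BIN, SHARDS and REPLICAS are illustrative, overridable defaults.
solr_create_cmds() {
  local solr_bin="${SOLR_BIN:-./solr-7.5.0/bin/solr}"
  local shards="${SHARDS:-1}" replicas="${REPLICAS:-1}" name
  for name in vertex_index edge_index fulltext_index; do
    printf '%s create -c %s -shards %s -replicationFactor %s\n' \
      "$solr_bin" "$name" "$shards" "$replicas"
  done
}
```

After checking the output, solr_create_cmds | sh runs the commands from /opt/bigdata.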

Installing and configuring Apache Atlas

  1. Upload Atlas

$ scp atlas\distro\target\apache-atlas-2.1.0-SNAPSHOT-bin.tar.gz <your account>@<ip-address>:/opt/bigdata
$ tar -xzvf /opt/bigdata/apache-atlas-2.1.0-SNAPSHOT-bin.tar.gz -C /opt/bigdata

  2. Edit the configuration

2.1 Edit atlas-env.sh

# indicates whether or not a local instance of HBase should be started for Atlas
export MANAGE_LOCAL_HBASE=false

# indicates whether or not a local instance of Solr should be started for Atlas
export MANAGE_LOCAL_SOLR=false

# indicates whether or not cassandra is the embedded backend for Atlas
export MANAGE_EMBEDDED_CASSANDRA=false

# indicates whether or not a local instance of Elasticsearch should be started for Atlas
export MANAGE_LOCAL_ELASTICSEARCH=false

export HBASE_CONF_DIR=/opt/bigdata/apache-atlas-2.1.0-SNAPSHOT-bin/conf/hbase

2.2 Edit atlas-application.properties

#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=localhost:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true
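
If the file is edited from a script instead of by hand, each key should be replaced when it already exists and appended otherwise, so that reruns do not pile up duplicates. A minimal sketch follows; set_prop is a hypothetical helper, and it does not handle values containing backslashes.

```shell
# set_prop is a hypothetical helper: set KEY=VALUE in a .properties file,
# replacing an existing "KEY=" line or appending the pair if the key is absent.
set_prop() {
  local file="$1" key="$2" value="$3" tmp rc=0
  tmp="$(mktemp)"
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; done = 1; next }   # replace in place
    { print }                                                  # keep other lines
    END { if (!done) print k "=" v }                           # append if missing
  ' "$file" > "$tmp" && cat "$tmp" > "$file" || rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Assuming the file lives at apache-atlas-2.1.0-SNAPSHOT-bin/conf/atlas-application.properties, a call such as set_prop apache-atlas-2.1.0-SNAPSHOT-bin/conf/atlas-application.properties atlas.graph.index.search.solr.mode cloud applies the first setting above.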

2.3 Copy the HBase configuration file

$ cp hbase-2.1.6/conf/hbase-site.xml apache-atlas-2.1.0-SNAPSHOT-bin/conf/hbase/

2.4 Start Atlas

$ ./apache-atlas-2.1.0-SNAPSHOT-bin/bin/atlas_start.py

2.5 Open the Atlas web UI

http://<ip-address>:21000
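
Atlas can take several minutes to come up on first start, so the page may not load right away. The sketch below retries a command until it succeeds. retry is a hypothetical helper, and the endpoint and credentials in the usage note are assumptions about a default Atlas install, not something this guide configures.

```shell
# retry is a hypothetical helper: run a command up to ATTEMPTS times,
# sleeping DELAY seconds after each failure, and stop at the first success.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1          # every attempt failed
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}
```

For example, retry 30 10 curl -fsS -u admin:admin "http://localhost:21000/api/atlas/admin/version" polls every 10 seconds for up to 30 attempts, assuming the default admin/admin login is still in place and the command runs on the Atlas host.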

