Installing Cloudera 5.7.6 on CentOS 7.2

# Common Cloudera links
### Cloudera installation guide [PDF](https://www.cloudera.com/documentation/enterprise/latest/PDF/cloudera-installation.pdf) [HTML](https://www.cloudera.com/documentation/enterprise/latest/topics/installation.html)
### Cloudera administration guide [PDF](https://www.cloudera.com/documentation/enterprise/latest/PDF/cloudera-administration.pdf) [HTML](https://www.cloudera.com/documentation/enterprise/latest/topics/administration.html)

# I. CentOS 7.2 system settings (required on every host in the cluster)
## 1. Disable SELinux
Check whether SELinux is already disabled with `getenforce`:
```zsh
$ getenforce
Disabled
```
Set SELinux to disabled in its configuration file; the change takes effect after the reboot in step 9:
```zsh
$ sudo vim /etc/selinux/config
SELINUX=disabled
```
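If you prefer not to edit the file interactively, the change can be scripted. The function below is a minimal sketch (the name `disable_selinux_config` is hypothetical): it rewrites the `SELINUX=` line with GNU `sed -i` and leaves the `SELINUXTYPE=` line alone. It edits a root-owned file, so source and run it from a root shell.

```zsh
# Hypothetical helper: set SELINUX=disabled in the given config file.
# Defaults to /etc/selinux/config; pass the path of a copy to try it safely.
disable_selinux_config() {
  local config="${1:-/etc/selinux/config}"
  # ^SELINUX= does not match the SELINUXTYPE= line, so only the mode changes
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config"
}
```

Afterwards `grep '^SELINUX=' /etc/selinux/config` should print `SELINUX=disabled`. Until the reboot, `setenforce 0` switches the running system to permissive mode.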

## 2. Disable the firewall
```zsh
$ sudo systemctl stop firewalld
$ sudo systemctl disable firewalld
```

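## 3. Edit the hosts and hostname files
The hosts file must be identical on every host in the cluster. You can configure it on the master host and then scp it to the slave hosts: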
```zsh
$ sudo vim /etc/hosts
```
```
192.168.31.160   master
192.168.31.161   slave1
192.168.31.162   slave2
```
```zsh
$ sudo scp /etc/hosts slave1:/etc/hosts
$ sudo scp /etc/hosts slave2:/etc/hosts

# Make sure the name printed by `hostname` matches this host's entry in /etc/hosts
$ sudo vim /etc/hostname
master

$ hostnamectl
```
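The hostname requirement above can be checked on each host by comparing the output of `hostname` with the names listed in /etc/hosts. This is a sketch: `hosts_has_name` is a hypothetical helper that skips comment lines and treats every field after the IP address as a host name or alias.

```zsh
# Hypothetical helper: succeed if NAME appears as a host name or alias
# in HOSTS_FILE (default /etc/hosts). Field 1 is the IP address.
hosts_has_name() {
  local name="$1" hosts_file="${2:-/etc/hosts}"
  awk -v n="$name" '
    !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
    END { exit !found }
  ' "$hosts_file"
}
```

Usage: `hosts_has_name "$(hostname)" || echo "$(hostname) is missing from /etc/hosts"`. On CentOS 7 the hostname can also be set without an editor via `sudo hostnamectl set-hostname master`.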

## 4. Set a static IP address
```zsh
$ sudo vim /etc/sysconfig/network-scripts/ifcfg-eno
```
```
BOOTPROTO="static"
ONBOOT="yes"
IPADDR=192.168.31.160
GATEWAY=192.168.31.1
DNS1=192.168.31.1
```
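The same settings can be applied without an editor. `set_ifcfg_key` below is a hypothetical helper that replaces an existing `KEY=` line or appends one if the key is missing. The interface file path is whatever file you edited above. Values must not contain `|` or `&`, because both are special in the `sed` replacement used here.

```zsh
# Hypothetical helper: set KEY=VALUE in an ifcfg-style file,
# replacing an existing KEY= line or appending a new one.
set_ifcfg_key() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    # "|" is the sed delimiter here, so VALUE must not contain "|" or "&"
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, as root: `set_ifcfg_key /etc/sysconfig/network-scripts/ifcfg-eno BOOTPROTO '"static"'`, then repeat for `ONBOOT`, `IPADDR`, `GATEWAY` and `DNS1`, and apply the changes with `systemctl restart network`.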

## 5. Set up time synchronization
```zsh
$ sudo yum install -y ntp
$ sudo systemctl enable ntpd
$ sudo systemctl enable ntpdate
$ sudo vim /etc/ntp.conf
server time1.aliyun.com

$ sudo ntpdate time1.aliyun.com
$ timedatectl
```
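The `/etc/ntp.conf` edit above can also be scripted. In this sketch, the hypothetical `use_ntp_server` comments out every active `server` line, such as the default CentOS pool entries, and appends one `server` line for the chosen host. Running it again therefore still leaves exactly one active server. Run it as root before `ntpdate`.

```zsh
# Hypothetical helper: make SERVER the only active "server" line in ntp.conf.
# Existing server lines (e.g. the CentOS pool) are commented out, not deleted.
use_ntp_server() {
  local server="$1" conf="${2:-/etc/ntp.conf}"
  sed -i 's/^server /#server /' "$conf"
  printf 'server %s\n' "$server" >> "$conf"
}
```

Usage: `use_ntp_server time1.aliyun.com`. Once ntpd is running, `ntpq -p` lists the peers it is using.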

## 6. Install the Oracle JDK supported by CDH
Remove the OpenJDK packages that ship with the system:
```zsh
$ rpm -qa | grep --color openjdk
$ sudo yum remove -y java-1.7.0-openjdk-headless.x86_64 java-1.7.0-openjdk.x86_64 java-1.8.0-openjdk-headless.x86_64 java-1.8.0-openjdk.x86_64
```
Download the JDK from [Oracle](http://www.oracle.com/technetwork/java/javase/archive-139210.html) and install it:
```zsh
# Install Oracle JDK 1.8
$ sudo yum install -y jdk-8u144-linux-x64.rpm
```
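After removing OpenJDK, you can confirm which JVM is on the PATH by inspecting the output of `java -version`. Oracle JDK 8 prints a first line like `java version "1.8.0_144"`, while OpenJDK prints `openjdk version ...`. The hypothetical helper below relies only on that difference. Note that `java -version` writes to stderr.

```zsh
# Hypothetical helper: succeed if the captured `java -version` output
# comes from Oracle JDK 8 (OpenJDK starts with "openjdk version" instead).
is_oracle_jdk8() {
  printf '%s\n' "$1" | grep -q '^java version "1\.8\.'
}
```

Usage: `is_oracle_jdk8 "$(java -version 2>&1)" && echo "Oracle JDK 8 is active"`.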

## 7. Tune kernel parameters
```zsh
$ sudo sysctl vm.swappiness=0
$ sudo vim /etc/sysctl.conf
vm.swappiness=0

# Apply the settings
$ sudo sysctl -p

# On CentOS 7.2 the tuned profiles under /usr/lib/tuned also set vm.swappiness and reapply it at boot, so edit them as well
$ cd /usr/lib/tuned
$ grep -R 'vm.swappiness' *
latency-performance/tuned.conf:vm.swappiness=10
throughput-performance/tuned.conf:vm.swappiness=10
virtual-guest/tuned.conf:vm.swappiness = 30

# Change the value in virtual-guest/tuned.conf
$ sudo vim /usr/lib/tuned/virtual-guest/tuned.conf
vm.swappiness=0
```
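The tuned files use inconsistent spacing (`vm.swappiness=10` versus `vm.swappiness = 30`), so a scripted edit has to accept both forms. Below is a minimal sketch with a hypothetical helper, assuming GNU `sed`. Run it as root on whichever profile is in use; `tuned-adm active` shows the active profile.

```zsh
# Hypothetical helper: force vm.swappiness to VALUE (default 0) in a tuned
# profile file, whatever spacing the existing line uses around "=".
set_tuned_swappiness() {
  local conf="$1" value="${2:-0}"
  sed -i -E "s/^vm\.swappiness[[:space:]]*=.*/vm.swappiness=${value}/" "$conf"
}
```

Usage: `set_tuned_swappiness /usr/lib/tuned/virtual-guest/tuned.conf`. After the reboot in step 9, `sysctl -n vm.swappiness` should print `0`.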

## 8. Disable transparent huge pages (THP)
```zsh
$ sudo sh -c "echo never > /sys/kernel/mm/transparent_hugepage/defrag"
$ sudo sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
$ sudo vim /etc/rc.local
```
```
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```
```zsh
# /etc/rc.local is a symlink to /etc/rc.d/rc.local; make it executable so the lines above run at boot
$ sudo chmod +x /etc/rc.d/rc.local
```
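After the reboot, you can confirm that the setting persisted. Each THP sysfs file lists every mode and marks the active one in brackets, for example `always madvise [never]`. The hypothetical `thp_mode` helper below extracts the bracketed value.

```zsh
# Hypothetical helper: print the active mode from a THP sysfs file,
# e.g. "always madvise [never]" -> "never".
thp_mode() {
  local file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
  sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$file"
}
```

Both `thp_mode` and `thp_mode /sys/kernel/mm/transparent_hugepage/defrag` should print `never`.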

## 9. Reboot
```zsh
$ sudo reboot
```

# II. Setting up the Cloudera Manager Server host
## 0. Download the RPM and parcel files needed to install CM
- ~~Download cloudera-manager-installer.bin from [CM Archive](https://archive.cloudera.com/cm5/installer/)~~
- Download the CM 5.7.6 tarball (containing all the RPMs) from [CM Archive](http://archive.cloudera.com/cm5/repo-as-tarball/5.7.6/)
- Download the parcel files for your OS version from [CDH Archive](http://archive.cloudera.com/cdh5/parcels/5.7.6/). There are three files; for CentOS 7.2 they are:
    - [CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel](http://archive.cloudera.com/cdh5/parcels/5.7.6/CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel)
    - [CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha1](http://archive.cloudera.com/cdh5/parcels/5.7.6/CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha1)
    - [manifest.json](http://archive.cloudera.com/cdh5/parcels/5.7.6/manifest.json)
- After downloading, rename CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha1 to CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha:
```zsh
$ mv CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha{1,}
```
- [Cloudera official documentation](https://www.cloudera.com/documentation.html)
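Before moving the parcel into place, it is worth checking it against the hash in the renamed `.sha` file. The sketch below assumes the `.sha` file holds the hex SHA-1 digest as its first whitespace-separated field. `verify_parcel` is a hypothetical name.

```zsh
# Hypothetical helper: compare PARCEL's SHA-1 with the digest stored in PARCEL.sha.
verify_parcel() {
  local parcel="$1" expected actual
  # Assumption: the first field of the .sha file is the expected hex digest
  expected=$(awk '{print $1; exit}' "${parcel}.sha") || return 1
  actual=$(sha1sum "$parcel" | awk '{print $1}')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

Usage: `verify_parcel CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel && echo "checksum OK"`.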

## 1. Add the cloudera-manager.repo file to yum
Download the [cloudera-manager.repo](http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo) file from [CM Archive](http://archive.cloudera.com/cm5/redhat/7/x86_64/cm). Change its baseurl to the version you are installing (5.7.6 here) and change `gpgcheck=1` to `gpgcheck=0`; the gpgkey line can be deleted. Without these changes, cloudera-manager-installer.bin will upgrade the already installed Cloudera RPMs online to the latest version during installation.
```zsh
$ vim cloudera-manager.repo
$ sudo cp cloudera-manager.repo /etc/yum.repos.d
```
```
[cloudera-manager]
# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 7 x86_64           	  
name = Cloudera Manager
baseurl = http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.7.6/
gpgcheck = 0
```
Check that yum can find the Cloudera packages:
```zsh
$ yum list | grep cloudera
```
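Instead of hand-editing the downloaded file, you can generate the pinned repo file. This is a sketch with a hypothetical `write_cm_repo` helper that reproduces the contents shown above, with the CM version as a parameter. It assumes the same baseurl layout applies to the version you pass.

```zsh
# Hypothetical helper: write a cloudera-manager.repo pinned to one CM version.
# Arguments: VERSION (e.g. 5.7.6) and an optional output path.
write_cm_repo() {
  local version="$1" out="${2:-cloudera-manager.repo}"
  cat > "$out" <<EOF
[cloudera-manager]
# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 7 x86_64
name = Cloudera Manager
baseurl = http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/${version}/
gpgcheck = 0
EOF
}
```

Usage: `write_cm_repo 5.7.6 && sudo cp cloudera-manager.repo /etc/yum.repos.d/ && sudo yum clean all`.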

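## 2. Put the parcel files in /opt/cloudera/parcel-repo
Move the downloaded CDH files (parcel, parcel.sha, manifest.json) into /opt/cloudera/parcel-repo. The commands below assume all three were downloaded into ~/cdh, which is then moved to become parcel-repo. If you skip this step, Cloudera Manager downloads the parcel, which is about 1.4 GB, from the internet during cluster installation.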
```zsh
$ sudo mkdir -p /opt/cloudera
$ sudo mv ~/cdh /opt/cloudera/parcel-repo
```
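A quick check that all three files are in the repository directory helps avoid the 1.4 GB download mentioned above. `check_parcel_repo` is a hypothetical helper; its defaults are the directory and parcel name used in this guide.

```zsh
# Hypothetical helper: report which of the three parcel files are missing
# from DIR; returns non-zero if any is absent.
check_parcel_repo() {
  local dir="${1:-/opt/cloudera/parcel-repo}"
  local parcel="${2:-CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel}"
  local f missing=0
  for f in "$parcel" "${parcel}.sha" manifest.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```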

## 3. Install all Cloudera Manager RPMs
Extract the downloaded CM 5.7.6 tarball:
```zsh
$ tar xvzf cm5.7.6-centos7.x86_64
```
Change into the extracted cm directory, locate the RPM files and install them with yum, which installs their dependencies automatically:
```zsh
$ cd cm/5/RPMS/x86_64
$ sudo yum localinstall --nogpgcheck -y cloudera-manager-agent-*.rpm cloudera-manager-server-*.rpm cloudera-manager-daemons-*.rpm
```
**Note:** If you do not use the embedded PostgreSQL database, you do not need to install the cloudera-manager-server-db RPM.


## ~~4. Delete the db.properties file~~
Not needed here, because the embedded database is not used.  
~~$ sudo rm -f /etc/cloudera-scm-server/db.properties~~


## ~~5. Run the installer.bin file~~
~~If the RPMs above are already installed and cloudera-manager.repo is configured correctly, this step finishes quickly (about 1 minute)~~  
~~$ sudo ./cloudera-manager-installer.bin~~

## 6. Check the status of the Cloudera Manager services
```zsh
$ sudo service --status-all
```

## 7. If a Cloudera service is not running, restart it
Skip this one if you are not using the embedded database:  
~~$ sudo systemctl restart cloudera-scm-server-db~~
```zsh
$ sudo systemctl restart cloudera-scm-server
$ sudo systemctl restart cloudera-scm-agent
```

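## 8. Check that port 7180 is open
Cloudera Manager Server listens on port 7180. After restarting the services, it can take a few minutes (sometimes around 5) before the port appears: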
```zsh
$ watch sudo netstat -tulpn
```
Then open `<master IP>:7180` in a browser to reach the Cloudera Manager web configuration interface.
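Instead of watching `netstat` by hand, a script can poll the port until Cloudera Manager starts listening. This sketch uses bash's `/dev/tcp` pseudo-device (so `bash` must be installed) and coreutils `timeout`. The helper name and the defaults, 60 attempts 5 seconds apart (roughly the 5 minutes mentioned above), are assumptions.

```zsh
# Hypothetical helper: wait until HOST:PORT accepts TCP connections.
# Arguments: HOST PORT [TRIES=60] [INTERVAL_SECONDS=5]; returns 1 on timeout.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-60}" interval="${4:-5}" i=0
  while [ "$i" -lt "$tries" ]; do
    # Opening /dev/tcp/HOST/PORT in bash succeeds only if something is listening
    if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}
```

Usage: `wait_for_port localhost 7180 && echo "Cloudera Manager is listening on 7180"`.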

  
# III. Install Cloudera Manager Agent on the other cluster hosts
1. Add the Cloudera repo file to yum, with the same contents as on the master host
2. Install only the cloudera-manager-agent and cloudera-manager-daemons RPMs
```zsh
$ sudo yum localinstall --nogpgcheck -y cloudera-manager-{agent,daemons}-*.rpm
```
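With more than a couple of agent hosts, the same commands can be sent over SSH in a loop. This is only a sketch. It assumes SSH access from master to each host and uses `-t` so that `sudo` can prompt. `run_on_hosts` and its `DRY_RUN` switch are hypothetical.

```zsh
# Hypothetical helper: run one command on each listed host over SSH.
# With DRY_RUN=1 it only prints the ssh commands it would run.
run_on_hosts() {
  local cmd="$1" h
  shift
  for h in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'ssh -t %s %s\n' "$h" "$cmd"
    else
      # -t allocates a TTY so sudo on the remote side can prompt for a password
      ssh -t "$h" "$cmd" || return 1
    fi
  done
}
```

For example, after copying the repo file to `/tmp` on each host: `run_on_hosts 'sudo cp /tmp/cloudera-manager.repo /etc/yum.repos.d/' slave1 slave2`. Prefix the call with `DRY_RUN=1` to preview the commands first.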

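# IV. Host role assignment
- **Master hosts**: run the main Hadoop processes, such as the HDFS NameNode and YARN ResourceManager.
- **Utility hosts**: run the cluster's other, non-master processes, such as Cloudera Manager and the Hive Metastore.
- **Edge hosts**: usually serve as client access nodes from which jobs are launched.
- **Worker hosts**: mainly run DataNodes and other distributed processes such as Impalad.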

Cluster size|Master hosts|Utility hosts|Edge hosts|Worker hosts
--------|------------|-------------|----------|------------
Small|<li>NameNode<li>YARN ResourceManager<li>JobHistory Server<li>ZooKeeper<li>Impala StateStore<li>Kudu Master|<li>Secondary NameNode <li>Cloudera Manager <li>Cloudera Manager Management Service <li>Hive Metastore <li>HiveServer2 <li>Impala Catalog <li>Hue <li>Oozie <li>Flume <li>Gateway configuration | |<li>DataNode <li>NodeManager <li>Impalad <li>Kudu tablet server

# V. Database configuration
[Official database setup guide](https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_installing_configuring_dbs.html)  
## 1. Install MariaDB
- Check which MariaDB versions your CDH release supports (10.2 is used here)
- Configure MariaDB

```zsh
# Stop MariaDB and move the old InnoDB log files out of the data directory
$ sudo service mariadb stop
$ sudo mv /var/lib/mysql/ib_logfile{0,1} /tmp
$ sudo vim /etc/my.cnf.d/server.cnf
```
```
[mysqld]
sql_mode=STRICT_ALL_TABLES

transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links = 0

key_buffer = 16M
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1

max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M

#log_bin should be on a disk with enough free space. Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your system
#and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log

binlog_format = mixed

read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M

# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit  = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
```
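The steps above stop MariaDB but never start it again. After saving `server.cnf`, start it with `sudo service mariadb start` and confirm that the new values are active. The helper below is a hypothetical sketch. It reads the tab-separated rows that `mysql -N -e "SHOW VARIABLES ..."` prints when its output is piped, and prints the value of one variable.

```zsh
# Hypothetical helper: read "name<TAB>value" rows on stdin, as printed by
# `mysql -N -e "SHOW VARIABLES ..."` into a pipe, and print the value for $1.
mysql_var() {
  awk -F '\t' -v name="$1" '$1 == name { print $2 }'
}
```

Usage: `mysql -u root -p -N -e "SHOW VARIABLES LIKE 'innodb_log_file_size'" | mysql_var innodb_log_file_size` should print `536870912`, which is 512M in bytes.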
Download the MySQL JDBC driver from the [official MySQL site](https://dev.mysql.com/downloads/connector/j/) and copy it to /usr/share/java/mysql-connector-java.jar on every host that needs to connect to MariaDB.
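Connector/J downloads come with the release number in the jar name, while this guide expects the fixed path /usr/share/java/mysql-connector-java.jar. Below is a minimal sketch of the copy step. The helper name is hypothetical and the source jar path is whatever you extracted. Run it as root on each host that connects to MariaDB.

```zsh
# Hypothetical helper: copy a downloaded Connector/J jar to the fixed name
# used in this guide. DEST_DIR defaults to /usr/share/java.
install_jdbc_jar() {
  local src="$1" dest_dir="${2:-/usr/share/java}"
  if [ ! -f "$src" ]; then
    echo "jar not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest_dir" && cp "$src" "$dest_dir/mysql-connector-java.jar"
}
```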

## 2. Services that need a database
Service|Description
------|----
Cloudera Manager | Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (< 100 MB) is the most important to back up.
Oozie Server | Contains Oozie workflow, coordinator, and bundle data. Can grow very large.
Sqoop Server | Contains entities such as the connector, driver, links and jobs. Relatively small.
Activity Monitor | Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed.
Reports Manager | Tracks disk utilization and processing activities over time. Medium-sized.
Hive Metastore Server | Contains Hive metadata. Relatively small.
Hue Server | Contains user account information, job submissions, and Hive queries. Relatively small.
Sentry Server | Contains authorization metadata. Relatively small.
Cloudera Navigator Audit Server | Contains auditing information. In large clusters, this database can grow large.
Cloudera Navigator Metadata Server | Contains authorization, policies, and audit report metadata. Relatively small.

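## 3. Create the Cloudera Manager database
The last three arguments are the database name, user name and password (all `scm` here). `-h` is the database host, `-u`/`-p` are the database root credentials, and `--scm-host` is the host running Cloudera Manager Server: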
```console
$ sudo /usr/share/cmf/schema/scm_prepare_database.sh mysql -h <mysql-server> -u root -p[password] --scm-host <cm-server> scm scm scm
```

## 4. Create the following databases as needed
Role|Database|User|Password
----|--------|------|----
Activity Monitor (only if a MapReduce service is used)|amon|amon|amon
Reports Manager|rman|rman|rman
Hive Metastore Server|metastore|hive|hive
Sentry Server|sentry|sentry|sentry
Cloudera Navigator Audit Server|nav|nav|nav
Cloudera Navigator Metadata Server|navms|navms|navms
```zsh
# Connect with the mysql client
$ mysql -u root -p
```
```sql
-- Create the amon database
create database amon default character set utf8;
grant all on amon.* to 'amon'@'%' identified by 'amon';

-- Create the rman database
create database rman default character set utf8;
grant all on rman.* to 'rman'@'%' identified by 'rman';

-- Create the Hive metastore database
create database metastore default character set utf8;
grant all on metastore.* to 'hive'@'%' identified by 'hive';
```
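The SQL above covers only the first three rows of the table. To create every database in the table the same way, you can generate the statements from `database:user:password` triples. This is a sketch with a hypothetical `gen_db_sql` helper. The passwords are the example values from the table and should be replaced in a real deployment.

```zsh
# Hypothetical helper: print CREATE DATABASE / GRANT statements for each
# "db:user:password" argument (db and user must not contain ":").
gen_db_sql() {
  local spec rest db user pass
  for spec in "$@"; do
    db=${spec%%:*}
    rest=${spec#*:}
    user=${rest%%:*}
    pass=${rest#*:}
    printf "create database %s default character set utf8;\n" "$db"
    printf "grant all on %s.* to '%s'@'%%' identified by '%s';\n" "$db" "$user" "$pass"
  done
}
```

Usage: `gen_db_sql amon:amon:amon rman:rman:rman metastore:hive:hive sentry:sentry:sentry nav:nav:nav navms:navms:navms | mysql -u root -p`.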

## 5. Create the Oozie database
```sql
create database oozie default character set utf8;
grant all on oozie.* to 'oozie'@'localhost' identified by 'oozie';
grant all on oozie.* to 'oozie'@'%' identified by 'oozie';
```
Copy the MySQL JDBC jar to /opt/cloudera/parcels/CDH/lib/oozie/lib.

## 6. Create the Hue database
```sql
create database hue default character set utf8 default collate utf8_general_ci;
grant all on hue.* to 'hue'@'%' identified by 'hue';
-- List all databases to confirm they were created
select * from information_schema.schemata;
```
