大数据数据仓库环境准备

Posted 2023-03-29 DB架构

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了大数据数据仓库环境准备相关的知识，希望对你有一定的参考价值。

1.Hive 安装部署

1）把apache-hive-3.1.2-bin.tar.gz上传到linux的/opt/software目录下

2）解压apache-hive-3.1.2-bin.tar.gz到/opt/module/目录下面

[maxwell@hadoop102 software]$ tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/

3）修改apache-hive-3.1.2-bin.tar.gz的名称为hive

[maxwell@hadoop102 software]$ cd /opt/module/
[maxwell@hadoop102 module]$ ls -ltr
total 0
drwxr-xr-x.  7 maxwell maxwell 245 Apr  2  2019 jdk1.8.0_212
drwxrwxr-x.  3 maxwell maxwell  18 Mar 22 23:20 hadoop-3.1.3
drwxr-xr-x. 13 maxwell maxwell 204 Mar 23 11:02 hadoop
drwxrwxr-x.  8 maxwell maxwell 160 Mar 23 14:50 zookeeper
drwxr-xr-x.  9 maxwell maxwell 130 Mar 23 15:23 kafka
drwxrwxr-x.  9 maxwell maxwell 239 Mar 25 13:30 flume
drwxrwxr-x.  2 maxwell maxwell  76 Mar 25 21:58 db_log
drwxrwxr-x.  3 maxwell maxwell 117 Mar 27 17:14 applog
drwxr-xr-x. 11 maxwell maxwell 117 Mar 28 12:56 datax
drwxrwxr-x.  5 maxwell maxwell 237 Mar 29 09:25 maxwell
drwxrwxr-x.  9 maxwell maxwell 153 Mar 29 10:05 apache-hive-3.1.2-bin
[maxwell@hadoop102 module]$ mv /opt/module/apache-hive-3.1.2-bin/ /opt/module/hive
[maxwell@hadoop102 module]$ ls -ltr
total 0
drwxr-xr-x.  7 maxwell maxwell 245 Apr  2  2019 jdk1.8.0_212
drwxrwxr-x.  3 maxwell maxwell  18 Mar 22 23:20 hadoop-3.1.3
drwxr-xr-x. 13 maxwell maxwell 204 Mar 23 11:02 hadoop
drwxrwxr-x.  8 maxwell maxwell 160 Mar 23 14:50 zookeeper
drwxr-xr-x.  9 maxwell maxwell 130 Mar 23 15:23 kafka
drwxrwxr-x.  9 maxwell maxwell 239 Mar 25 13:30 flume
drwxrwxr-x.  2 maxwell maxwell  76 Mar 25 21:58 db_log
drwxrwxr-x.  3 maxwell maxwell 117 Mar 27 17:14 applog
drwxr-xr-x. 11 maxwell maxwell 117 Mar 28 12:56 datax
drwxrwxr-x.  5 maxwell maxwell 237 Mar 29 09:25 maxwell
drwxrwxr-x.  9 maxwell maxwell 153 Mar 29 10:05 hive
[maxwell@hadoop102 module]$

4）修改/etc/profile.d/my_env.sh，添加环境变量

[maxwell@hadoop102 ~]$ cd /opt/software/
[maxwell@hadoop102 software]$ 
[maxwell@hadoop102 software]$ sudo vim /etc/profile.d/my_env.sh
[maxwell@hadoop102 software]$

添加内容

#HIVE_HOME

export HIVE_HOME=/opt/module/hive

export PATH=$PATH:$HIVE_HOME/bin

重启Xshell对话框或者source一下 /etc/profile.d/my_env.sh文件，使环境变量生效

[maxwell@hadoop102 software]$ 
[maxwell@hadoop102 software]$ source /etc/profile.d/my_env.sh
[maxwell@hadoop102 software]$

5）解决日志Jar包冲突，进入/opt/module/hive/lib目录

[maxwell@hadoop102 lib]$ ls -ltr log4j-slf4j-impl-2.10.0.jar 
-rw-rw-r--. 1 maxwell maxwell 24173 Apr 15  2020 log4j-slf4j-impl-2.10.0.jar
[maxwell@hadoop102 lib]$ pwd
/opt/module/hive/lib
[maxwell@hadoop102 lib]$ mv log4j-slf4j-impl-2.10.0.jar log4j-slf4j-impl-2.10.0.jar.bak
[maxwell@hadoop102 lib]$ ls -ltr log4j-slf4j-impl-2.10.0.jar.bak
-rw-rw-r--. 1 maxwell maxwell 24173 Apr 15  2020 log4j-slf4j-impl-2.10.0.jar.bak
[maxwell@hadoop102 lib]$

2.Hive 元数据配置到mysql

2.1 拷贝驱动

将MySQL的JDBC驱动拷贝到Hive的lib目录下

[maxwell@hadoop102 lib]$ cp /opt/software/mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/
[maxwell@hadoop102 lib]$

2.2 配置Metastore到MySQL

在$HIVE_HOME/conf目录下新建hive-site.xml文件

添加如下内容

[maxwell@hadoop102 conf]$ vim hive-site.xml
[maxwell@hadoop102 conf]$ cat hive-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>centos123</value>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>

    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>

    <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    </property>

    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>

    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>

    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>
[maxwell@hadoop102 conf]$ pwd
/opt/module/hive/conf
[maxwell@hadoop102 conf]$

3 启动Hive

3.1 初始化元数据库

1）登陆MySQL

[maxwell@hadoop102 conf]$ mysql -uroot -pcentos123
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \\g.
Your MySQL connection id is 342
Server version: 5.7.16-log MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\\h' for help. Type '\\c' to clear the current input statement.

mysql>

2）新建Hive元数据库

mysql> 
mysql> create database metastore;
Query OK, 1 row affected (0.00 sec)

mysql>

3）初始化Hive元数据库

[maxwell@hadoop102 conf]$ schematool -initSchema -dbType mysql -verbose

4）修改元数据库字符集

Hive元数据库的字符集默认为Latin1，由于其不支持中文字符，故若建表语句中包含中文注释，会出现乱码现象。如需解决乱码问题，须做以下修改。

修改Hive元数据库中存储注释的字段的字符集为utf-8

[maxwell@hadoop102 conf]$ mysql -uroot -pcentos123
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \\g.
Your MySQL connection id is 348
Server version: 5.7.16-log MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\\h' for help. Type '\\c' to clear the current input statement.

mysql> use metastore
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql>

（1）字段注释

mysql> alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

（2）表注释

mysql> alter table TABLE_PARAMS modify column PARAM_VALUE mediumtext character set utf8;
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

(3）退出mysql

mysql> quit;
Bye
[maxwell@hadoop102 conf]$

3.2 启动Hive客户端

[maxwell@hadoop102 hive]$ ls -ltr
total 52
-rwxr-xr-x. 1 maxwell maxwell   230 Aug 23  2019 NOTICE
-rwxr-xr-x. 1 maxwell maxwell 20798 Aug 23  2019 LICENSE
-rwxr-xr-x. 1 maxwell maxwell  2469 Aug 23  2019 RELEASE_NOTES.txt
drwxrwxr-x. 4 maxwell maxwell    34 Mar 29 10:05 examples
drwxrwxr-x. 3 maxwell maxwell   157 Mar 29 10:05 bin
drwxrwxr-x. 4 maxwell maxwell    35 Mar 29 10:05 scripts
drwxrwxr-x. 7 maxwell maxwell    68 Mar 29 10:05 hcatalog
drwxrwxr-x. 2 maxwell maxwell    44 Mar 29 10:05 jdbc
drwxrwxr-x. 4 maxwell maxwell 12288 Mar 29 10:15 lib
drwxrwxr-x. 2 maxwell maxwell  4096 Mar 29 10:18 conf
[maxwell@hadoop102 hive]$ pwd
/opt/module/hive
[maxwell@hadoop102 hive]$ bin/hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop/bin:/opt/module/hadoop/sbin:/opt/module/kafka/bin:/home/maxwell/.local/bin:/home/maxwell/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop/bin:/opt/module/hadoop/sbin:/opt/module/kafka/bin:/opt/module/hive/bin)
Hive Session ID = 22be5461-43f7-4ce9-87dc-0adb2a4dcd41

Logging initialized using configuration in jar:file:/opt/module/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = a52aa9bb-2b0f-4c22-9760-107a1fd26b7a
hive (default)> show databases;
OK
database_name
default
Time taken: 0.92 seconds, Fetched: 1 row(s)
hive (default)>

大数据技术架构(组件)——Hive：环境准备1

1.0.1、环境准备

1.0.1.0、maven安装

1.0.1.0.1、下载软件包

1.0.1.0.2、配置环境变量

1.0.1.0.3、调整maven仓库

打开$MAVEN_HOME/conf/settings.xml文件，调整maven仓库地址以及镜像地址

<settings xmIns="http://maven.apache.org/SETTINGS/1.0.0"
        xmIns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                            https://maven.apache.org/xsd/settings-1.0.0.xsd">
    <localRepository>你自己的路径</localRepository>
    <mirrors>
        <mirror>
            <id>alimaven</id>
            <name>aliyun maven</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
            <mirrorOf>central</mirrorof>
        </mirror>
    </mirrors>
</settings>

1.0.1.1、cywin安装

该软件安装主要是为了支持windows编译源码涉及到的基础环境包

1.0.1.1.1、下载软件包

1.0.1.1.2、安装相关编译包

需要安装cywin，gcc+相关的编译包

binutils
gcc
gcc-mingw
gdb

1.0.1.2、源码包下载

https://downloads.apache.org/hive/hive-2.3.9/

1.0.1.3、JDK安装

1.0.1.3.1、下载软件包

1.0.1.3.2、配置环境变量

1.0.1.3.2.1、创建JAVA_HOME系统变量

1.0.1.3.2.2、创建CLASSPATH系统变量

1.0.1.3.2.3、追加到Path变量中

1.0.1.4、Hadoop安装

1.0.1.4.1、下载软件包

1.0.1.4.2、配置环境变量

创建HADOOP_HOME系统变量，然后将该变量追加到Path全局变量中

1.0.1.4.3、编辑配置文件

注意：下载的软件包中可能不包含下列的文件，但提供了template文件，直接重命名即可。由于该篇是属于学习自用，所以只需要配置基础核心信息即可。下列文件位于$HADOOP_HOME/etc/hadoop目录下。

1.0.1.4.3.1、修改core-site.xml文件

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>  
</configuration>

1.0.1.4.3.2、修改yarn-site.xml文件

<configuration>
<!-- Site specific YARN configuration properties -->
   <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

1.0.1.4.3.3、修改mapred-site.xml文件

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

1.0.1.4.4、初始化启动

# 首先初始化namenode,打开cmd执行下面的命令

$HADOOP_HOME/bin> hadoop namenode -format

# 当初始化完成后，启动hadoop

$HADOOP_HOME/sbin> start-all.cmd

当以下进程都启动成功后，hadoop基本环境也算是搭建成功了！

以上是关于大数据数据仓库环境准备的主要内容，如果未能解决你的问题，请参考以下文章

大数据开发|Hive数据仓库环境构建

数据仓库相关术语对你运用Hive或者ETL有非常大的理解

Hadoop 大数据Hive仓库组件部署超详细

大数据系列之数据仓库Hive原理

大数据7|Hive数据仓库

大数据数据仓库-基于大数据体系构建数据仓库（Hive，Flume，Kafka，Azkaban，Oozie，SparkSQL）