Data Warehouse: Hadoop

Posted by tunan96


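1. Install Hadoop HDFS in pseudo-distributed mode
2. Common hadoop fs commands
3. Where to find the configuration files in the official docs
4. Summary: JDK, SSH, and the hosts file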

 

1. Install Hadoop HDFS in pseudo-distributed mode

1.1 Create the user and directories

[root@aliyun ~]# useradd hadoop
[root@aliyun ~]# su - hadoop
[hadoop@aliyun ~]$ mkdir app software sourcecode log tmp data lib
[hadoop@aliyun ~]$ ll
total 28
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 app    # extracted packages and their symlinks
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 data   # data
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 lib    # third-party jars
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 log    # log files
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 software # downloaded tarballs
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 sourcecode  # source code to compile
drwxrwxr-x 2 hadoop hadoop 4096 Nov 28 11:26 tmp    # temporary files
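
The same layout can be recreated idempotently with `mkdir -p`, which does not fail when a directory already exists. This is a minimal sketch; the function name `make_layout` and its base-directory argument are assumptions made for illustration and are not part of the original steps.

```shell
# make_layout BASE: create the seven working directories under BASE.
# (make_layout and BASE are hypothetical names used only in this sketch.)
make_layout() {
    base="$1"
    for d in app software sourcecode log tmp data lib; do
        mkdir -p "$base/$d" || return 1
    done
}

# Example: make_layout "$HOME"
```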

1.2 Download or upload the tarball

[hadoop@aliyun ~]$ cd software/
[hadoop@aliyun software]$ wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2.tar.gz

1.3 Extract

[hadoop@aliyun software]$ tar -xzvf hadoop-2.6.0-cdh5.16.2.tar.gz -C ../app/
...
...
...
[hadoop@aliyun software]$ cd ../app/
[hadoop@aliyun app]$ ln -s hadoop-2.6.0-cdh5.16.2/ hadoop
[hadoop@aliyun app]$ ll
total 4
lrwxrwxrwx  1 hadoop hadoop   23 Nov 28 11:36 hadoop -> hadoop-2.6.0-cdh5.16.2/
drwxr-xr-x 14 hadoop hadoop 4096 Jun  3 19:11 hadoop-2.6.0-cdh5.16.2
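
One common reason to put a version-free symlink in front of the versioned directory is that paths built on it (HADOOP_HOME in step 1.8, for instance) can stay unchanged if the directory behind the link is ever swapped. The sketch below shows how such a link could be repointed; `point_link` is a hypothetical helper, not something the original steps define.

```shell
# point_link APP_DIR TARGET: make APP_DIR/hadoop a symlink to TARGET.
# -f replaces an existing link; -n stops ln from following the old link
# into the directory it points at.
point_link() {
    ( cd "$1" && ln -sfn "$2" hadoop )
}

# Example: point_link ~/app hadoop-2.6.0-cdh5.16.2
```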

1.4 Environment requirements (JDK)

[root@aliyun ~]# mkdir /usr/java
[root@aliyun ~]# cd /usr/java
[root@aliyun java]# rz -E
[root@aliyun java]# tar -xzvf jdk-8u144-linux-x64.tar.gz
[root@aliyun java]# chown -R  root:root jdk1.8.0_144/
[root@aliyun java]# ln -s jdk1.8.0_144/ jdk
[root@aliyun java]# ll
total 4
lrwxrwxrwx 1 root root   13 Nov 28 12:01 jdk -> jdk1.8.0_144/
drwxr-xr-x 8 root root 4096 Jul 22  2017 jdk1.8.0_144
[root@aliyun java]# vim /etc/profile
    #env
    export JAVA_HOME=/usr/java/jdk
    export PATH=$JAVA_HOME/bin:$PATH
[root@aliyun java]# source /etc/profile
[root@aliyun java]# which java
/usr/java/jdk/bin/java
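
`which java` only shows what is first on PATH. To check that JAVA_HOME itself points at a usable JDK, a small test like the following can help. It is a sketch; `check_java_home` is a hypothetical helper name.

```shell
# check_java_home DIR: succeed only if DIR/bin/java exists and is executable.
check_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Example: check_java_home "$JAVA_HOME" || echo "JAVA_HOME is not usable" >&2
```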

1.5 Set JAVA_HOME explicitly; check /etc/hosts

[hadoop@aliyun hadoop]$ vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk
[root@aliyun java]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

172.16.39.48 aliyun
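
The last line maps the hostname aliyun to the machine's internal IP rather than to a loopback address. A quick way to confirm that such an entry exists is sketched below; `host_ip` is a hypothetical helper and it assumes hosts lines without trailing comments.

```shell
# host_ip FILE NAME: print the first non-loopback address mapped to NAME in FILE.
host_ip() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }                 # skip comment lines
        $1 ~ /^127\./ || $1 == "::1" { next }     # skip loopback entries
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# Example: host_ip /etc/hosts aliyun
```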

1.6 Configuration files

etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://aliyun:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
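
After editing, it is worth reading the values back. The sketch below pulls a property out of a site file with awk; `get_prop` is a hypothetical helper and it assumes each `<name>` and `<value>` tag sits on a single line, as in the files above. It is not a general XML parser.

```shell
# get_prop FILE NAME: print the <value> of the property whose <name> is NAME.
get_prop() {
    awk -v name="$2" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
        /<value>/ { if (n == name) {
                        v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                        print v; exit } }
    ' "$1"
}

# Example: get_prop etc/hadoop/core-site.xml fs.defaultFS
```

Once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolves.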

1.7 Passwordless SSH trust

Run the following in the home directory:
  $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  $ chmod 0600 ~/.ssh/authorized_keys
[hadoop@aliyun ~]$ ssh aliyun date
Thu Nov 28 12:15:08 CST 2019
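
If `ssh aliyun date` still prompts for a password, overly open permissions on `~/.ssh` are a frequent cause, because sshd with its default StrictModes setting ignores keys in such directories. The following is a sketch of tightening them; `fix_ssh_perms` is a hypothetical helper name.

```shell
# fix_ssh_perms DIR: restrict DIR to its owner and protect authorized_keys.
fix_ssh_perms() {
    chmod 700 "$1" || return 1
    if [ -f "$1/authorized_keys" ]; then
        chmod 600 "$1/authorized_keys"
    fi
}

# Example: fix_ssh_perms ~/.ssh
#          ssh -o BatchMode=yes aliyun date   # fails instead of prompting for a password
```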

1.8 Hadoop environment variables

[hadoop@aliyun ~]$ vi .bashrc
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
[hadoop@aliyun ~]$ source .bashrc 
[hadoop@aliyun ~]$ which hadoop
~/app/hadoop/bin/hadoop

1.9 Format the NameNode

[hadoop@aliyun ~]$ hdfs namenode -format
has been successfully formatted.

1.10 First start

[hadoop@aliyun ~]$ start-dfs.sh 
[hadoop@aliyun ~]$ jps
10804 SecondaryNameNode
10536 NameNode
10907 Jps
10654 DataNode
[hadoop@aliyun ~]$ 
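
All three HDFS daemons should appear in the `jps` listing. A small filter can make that check explicit; this is a sketch, and `missing_daemons` is a hypothetical helper name.

```shell
# missing_daemons: read `jps` output on stdin and print every expected
# HDFS daemon whose name does not appear as a process name.
missing_daemons() {
    names=$(awk '{ print $2 }')
    for d in NameNode DataNode SecondaryNameNode; do
        printf '%s\n' "$names" | grep -qx "$d" || echo "$d"
    done
}

# Example: jps | missing_daemons    # prints nothing when all three are running
```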

Pitfall: the first start asks you to type yes to confirm the host trust relationship. The known_hosts file under ~/.ssh records these trusted hosts:

[hadoop@aliyun .ssh]$ cat known_hosts
aliyun,172.16.39.48 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCjHBKn/7LF5sfbae1OLkK5QoWm11Xn8RZs1JTc7K8v4RFum1OKIjArocvRjLOYPsq5ezYo8TlBHTrAgeUcvkBM=
localhost ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCjHBKn/7LF5sfbae1OLkK5QoWm11Xn8RZs1JTc7K8v4RFum1OKIjArocvRjLOYPsq5ezYo8TlBHTrAgeUcvkBM=
0.0.0.0 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCjHBKn/7LF5sfbae1OLkK5QoWm11Xn8RZs1JTc7K8v4RFum1OKIjArocvRjLOYPsq5ezYo8TlBHTrAgeUcvkBM=

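If Hadoop later keeps asking for a password at startup, the likely cause is that known_hosts still holds a trust entry for the host while the key pair is new. Delete the file, or just the stale entries, to fix it.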
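
Rather than editing known_hosts by hand, `ssh-keygen -R` can remove the entries for a single host. The wrapper below is only a sketch; `forget_host` is a hypothetical name and the file argument defaults to the current user's known_hosts.

```shell
# forget_host HOST [FILE]: drop every key recorded for HOST from a known_hosts file.
# ssh-keygen keeps the previous contents as FILE.old.
forget_host() {
    ssh-keygen -R "$1" -f "${2:-$HOME/.ssh/known_hosts}"
}

# Example: forget_host aliyun; forget_host localhost; forget_host 0.0.0.0
```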

1.11 Make the DN and SNN start under the hostname too (aliyun)

  NN: controlled by fs.defaultFS in core-site.xml
  DN: the slaves file
  2NN: hdfs-site.xml

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>aliyun:50090</value>       <!-- mind the port: 50090 in Hadoop 2.x, 9868 in 3.x -->
</property>
<property>
    <name>dfs.namenode.secondary.https-address</name>
    <value>aliyun:50091</value>       <!-- mind the port: 50091 in Hadoop 2.x, 9869 in 3.x -->
</property>
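
For the DataNode, Hadoop 2.x reads the worker hosts from etc/hadoop/slaves, one hostname per line; the shipped file contains localhost. A sketch of replacing it with the real hostname follows; `set_slaves` is a hypothetical helper name.

```shell
# set_slaves CONF_DIR HOST...: write one worker hostname per line to CONF_DIR/slaves.
set_slaves() {
    conf_dir="$1"; shift
    printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Example: set_slaves "$HADOOP_HOME/etc/hadoop" aliyun
#          stop-dfs.sh && start-dfs.sh   # restart so the new host names take effect
```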

2. Common hadoop fs commands

hadoop fs -mkdir /
hadoop fs -put
hadoop fs -get
hadoop fs -cat
hadoop fs -rm
hadoop fs -ls
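
The commands above are listed without arguments. The sketch below strings them into one round trip against a running HDFS; the function name `hdfs_roundtrip`, the target directory, and the file name hello.txt are placeholders chosen for illustration.

```shell
# hdfs_roundtrip DIR: upload a small file to DIR on HDFS, read it back, then clean up.
hdfs_roundtrip() {
    dir="$1"
    work=$(mktemp -d) || return 1
    echo "hello hdfs" > "$work/hello.txt"

    hadoop fs -mkdir -p "$dir" &&                          # create the directory and its parents
    hadoop fs -put "$work/hello.txt" "$dir/" &&            # local -> HDFS
    hadoop fs -ls "$dir" &&                                # list the directory
    hadoop fs -cat "$dir/hello.txt" &&                     # print the file contents
    hadoop fs -get "$dir/hello.txt" "$work/copy.txt" &&    # HDFS -> local
    hadoop fs -rm "$dir/hello.txt"                         # delete the HDFS file
}

# Example: hdfs_roundtrip /user/hadoop/demo
```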

3. Where to find the configuration files in the official docs

https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation

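4. Summary: JDK, SSH, and the hosts file

The JDK and SSH are prerequisites for running Hadoop.

The hosts file stores the mapping between hostnames and IP addresses.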
