尚硅谷大数据技术Hadoop教程 Notes 06: Hadoop Production Tuning Manual

Posted by 延锋L


Video: 尚硅谷大数据Hadoop教程 (Hadoop 3.x, from installation through cluster tuning)

  1. 尚硅谷大数据技术Hadoop教程 Notes 01 [Big Data Overview]
  2. 尚硅谷大数据技术Hadoop教程 Notes 02 [Hadoop Basics]
  3. 尚硅谷大数据技术Hadoop教程 Notes 03 [Hadoop HDFS]
  4. 尚硅谷大数据技术Hadoop教程 Notes 04 [Hadoop MapReduce]
  5. 尚硅谷大数据技术Hadoop教程 Notes 05 [Hadoop Yarn]
  6. 尚硅谷大数据技术Hadoop教程 Notes 06 [Hadoop Production Tuning Manual]
  7. 尚硅谷大数据技术Hadoop教程 Notes 07 [Hadoop Source Code Analysis]

Table of Contents

06_尚硅谷大数据技术之Hadoop(生产调优手册)V3.3

P143【143_尚硅谷_Hadoop_生产调优手册_核心参数_NN内存配置】14:15

P144【144_尚硅谷_Hadoop_生产调优手册_核心参数_NN心跳并发配置】03:12

P145【145_尚硅谷_Hadoop_生产调优手册_核心参数_开启回收站】07:16

P146【146_尚硅谷_Hadoop_生产调优手册_HDFS压测环境准备】05:55

P147【147_尚硅谷_Hadoop_生产调优手册_HDFS读写压测】18:55

P148【148_尚硅谷_Hadoop_生产调优手册_NN多目录配置】08:25

P149【149_尚硅谷_Hadoop_生产调优手册_DN多目录及磁盘间数据均衡】08:42

P150【150_尚硅谷_Hadoop_生产调优手册_添加白名单】10:02

P151【151_尚硅谷_Hadoop_生产调优手册_服役新服务器】13:07

P152【152_尚硅谷_Hadoop_生产调优手册_服务器间数据均衡】03:16

P153【153_尚硅谷_Hadoop_生产调优手册_黑名单退役服务器】07:46

P154【154_尚硅谷_Hadoop_生产调优手册_存储优化_5台服务器准备】11:21

P155【155_尚硅谷_Hadoop_生产调优手册_存储优化_纠删码原理】08:16

P156【156_尚硅谷_Hadoop_生产调优手册_存储优化_纠删码案例】10:42

P157【157_尚硅谷_Hadoop_生产调优手册_存储优化_异构存储概述】08:36

P158【158_尚硅谷_Hadoop_生产调优手册_存储优化_异构存储案例实操】17:40

P159【159_尚硅谷_Hadoop_生产调优手册_NameNode故障处理】09:09

P160【160_尚硅谷_Hadoop_生产调优手册_集群安全模式&磁盘修复】18:32

P161【161_尚硅谷_Hadoop_生产调优手册_慢磁盘监控】09:19

P162【162_尚硅谷_Hadoop_生产调优手册_小文件归档】08:11

P163【163_尚硅谷_Hadoop_生产调优手册_集群数据迁移】03:18

P164【164_尚硅谷_Hadoop_生产调优手册_MR跑的慢的原因】02:43

P165【165_尚硅谷_Hadoop_生产调优手册_MR常用调优参数】12:27

P166【166_尚硅谷_Hadoop_生产调优手册_MR数据倾斜问题】05:26

P167【167_尚硅谷_Hadoop_生产调优手册_Yarn生产经验】01:18

P168【168_尚硅谷_Hadoop_生产调优手册_HDFS小文件优化方法】10:15

P169【169_尚硅谷_Hadoop_生产调优手册_MapReduce集群压测】02:54

P170【170_尚硅谷_Hadoop_生产调优手册_企业开发场景案例】15:00


06_尚硅谷大数据技术之Hadoop(生产调优手册)V3.3

P143【143_尚硅谷_Hadoop_生产调优手册_核心参数_NN内存配置】14:15

Chapter 1: HDFS Core Parameters

1.1 NameNode Memory Configuration

连接成功
Last login: Wed Mar 29 10:21:43 2023 from 192.168.88.1
[root@node1 ~]# cd ../../
[root@node1 /]# cd /opt/module/hadoop-3.1.3/
[root@node1 hadoop-3.1.3]# cd /home/atguigu/
[root@node1 atguigu]# su atguigu
[atguigu@node1 ~]$ bin/myhadoop.sh start
 =================== 启动 hadoop集群 ===================
 --------------- 启动 hdfs ---------------
Starting namenodes on [node1]
Starting datanodes
Starting secondary namenodes [node3]
 --------------- 启动 yarn ---------------
Starting resourcemanager
Starting nodemanagers
 --------------- 启动 historyserver ---------------
[atguigu@node1 ~]$ jpsall
bash: jpsall: 未找到命令...
[atguigu@node1 ~]$ bin/jpsall
=============== node1 ===============
28416 NameNode
29426 NodeManager
29797 JobHistoryServer
28589 DataNode
36542 Jps
=============== node2 ===============
19441 DataNode
20097 ResourceManager
20263 NodeManager
27227 Jps
=============== node3 ===============
19920 NodeManager
26738 Jps
19499 SecondaryNameNode
19197 DataNode
[atguigu@node1 ~]$ jmap -heap 28416
Attaching to process ID 28416, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.241-b07

using thread-local object allocation.
Parallel GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 1031798784 (984.0MB)
   NewSize                  = 21495808 (20.5MB)
   MaxNewSize               = 343932928 (328.0MB)
   OldSize                  = 43515904 (41.5MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 153616384 (146.5MB)
   used     = 20087912 (19.157325744628906MB)
   free     = 133528472 (127.3426742553711MB)
   13.076672863227923% used
From Space:
   capacity = 11534336 (11.0MB)
   used     = 7559336 (7.209144592285156MB)
   free     = 3975000 (3.7908554077148438MB)
   65.53767811168323% used
To Space:
   capacity = 16777216 (16.0MB)
   used     = 0 (0.0MB)
   free     = 16777216 (16.0MB)
   0.0% used
PS Old Generation
   capacity = 68681728 (65.5MB)
   used     = 29762328 (28.383567810058594MB)
   free     = 38919400 (37.116432189941406MB)
   43.33369131306655% used

16190 interned Strings occupying 1557424 bytes.
[atguigu@node1 ~]$ jmap -heap 19441
Attaching to process ID 19441, please wait...
Error attaching to process: sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.execute(LinuxDebuggerLocal.java:163)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach(LinuxDebuggerLocal.java:278)
        at sun.jvm.hotspot.HotSpotAgent.attachDebugger(HotSpotAgent.java:671)
        at sun.jvm.hotspot.HotSpotAgent.setupDebuggerLinux(HotSpotAgent.java:611)
        at sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:337)
        at sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:304)
        at sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:140)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:185)
        at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
        at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:49)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.tools.jmap.JMap.runTool(JMap.java:201)
        at sun.tools.jmap.JMap.main(JMap.java:130)
Caused by: sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach0(Native Method)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.access$100(LinuxDebuggerLocal.java:62)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$1AttachTask.doit(LinuxDebuggerLocal.java:269)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.run(LinuxDebuggerLocal.java:138)

[atguigu@node1 ~]$ jmap -heap 28589
Attaching to process ID 28589, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.241-b07

using thread-local object allocation.
Parallel GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 1031798784 (984.0MB)
   NewSize                  = 21495808 (20.5MB)
   MaxNewSize               = 343932928 (328.0MB)
   OldSize                  = 43515904 (41.5MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 102236160 (97.5MB)
   used     = 89542296 (85.3941879272461MB)
   free     = 12693864 (12.105812072753906MB)
   87.58378248948317% used
From Space:
   capacity = 7864320 (7.5MB)
   used     = 7855504 (7.4915924072265625MB)
   free     = 8816 (0.0084075927734375MB)
   99.88789876302083% used
To Space:
   capacity = 9437184 (9.0MB)
   used     = 0 (0.0MB)
   free     = 9437184 (9.0MB)
   0.0% used
PS Old Generation
   capacity = 35651584 (34.0MB)
   used     = 8496128 (8.1025390625MB)
   free     = 27155456 (25.8974609375MB)
   23.830997242647058% used

15014 interned Strings occupying 1322504 bytes.
[atguigu@node1 ~]$ cd /opt/module/hadoop-3.1.3/etc/hadoop
[atguigu@node1 hadoop]$ xsync hadoop-env.sh
==================== node1 ====================
sending incremental file list

sent 62 bytes  received 12 bytes  148.00 bytes/sec
total size is 16,052  speedup is 216.92
==================== node2 ====================
sending incremental file list
hadoop-env.sh

sent 849 bytes  received 173 bytes  2,044.00 bytes/sec
total size is 16,052  speedup is 15.71
==================== node3 ====================
sending incremental file list
hadoop-env.sh

sent 849 bytes  received 173 bytes  2,044.00 bytes/sec
total size is 16,052  speedup is 15.71
[atguigu@node1 hadoop]$ /home/atguigu/bin/myhadoop.sh stop
 =================== 关闭 hadoop集群 ===================
 --------------- 关闭 historyserver ---------------
 --------------- 关闭 yarn ---------------
Stopping nodemanagers
Stopping resourcemanager
 --------------- 关闭 hdfs ---------------
Stopping namenodes on [node1]
Stopping datanodes
Stopping secondary namenodes [node3]
[atguigu@node1 hadoop]$ /home/atguigu/bin/myhadoop.sh start
 =================== 启动 hadoop集群 ===================
 --------------- 启动 hdfs ---------------
Starting namenodes on [node1]
Starting datanodes
Starting secondary namenodes [node3]
 --------------- 启动 yarn ---------------
Starting resourcemanager
Starting nodemanagers
 --------------- 启动 historyserver ---------------
[atguigu@node1 hadoop]$ jpsall
bash: jpsall: 未找到命令...
[atguigu@node1 hadoop]$ /home/atguigu/bin/jpsall
=============== node1 ===============
53157 DataNode
54517 Jps
53815 NodeManager
54087 JobHistoryServer
52959 NameNode
=============== node2 ===============
43362 NodeManager
43194 ResourceManager
42764 DataNode
44141 Jps
=============== node3 ===============
42120 DataNode
42619 NodeManager
42285 SecondaryNameNode
43229 Jps
[atguigu@node1 hadoop]$ jmap -heap 52959
Attaching to process ID 52959, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.241-b07

using thread-local object allocation.
Parallel GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 1073741824 (1024.0MB)
   NewSize                  = 21495808 (20.5MB)
   MaxNewSize               = 357564416 (341.0MB)
   OldSize                  = 43515904 (41.5MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 150470656 (143.5MB)
   used     = 94290264 (89.92220306396484MB)
   free     = 56180392 (53.577796936035156MB)
   62.66355614213578% used
From Space:
   capacity = 17825792 (17.0MB)
   used     = 0 (0.0MB)
   free     = 17825792 (17.0MB)
   0.0% used
To Space:
   capacity = 16777216 (16.0MB)
   used     = 0 (0.0MB)
   free     = 16777216 (16.0MB)
   0.0% used
PS Old Generation
   capacity = 66584576 (63.5MB)
   used     = 30315816 (28.911415100097656MB)
   free     = 36268760 (34.588584899902344MB)
   45.529787559208906% used

15017 interned Strings occupying 1471800 bytes.
[atguigu@node1 hadoop]$ jmap -heap 53815
Attaching to process ID 53815, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.241-b07

using thread-local object allocation.
Parallel GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 1031798784 (984.0MB)
   NewSize                  = 21495808 (20.5MB)
   MaxNewSize               = 343932928 (328.0MB)
   OldSize                  = 43515904 (41.5MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 127926272 (122.0MB)
   used     = 40883224 (38.989280700683594MB)
   free     = 87043048 (83.0107192993164MB)
   31.95842680383901% used
From Space:
   capacity = 8388608 (8.0MB)
   used     = 8371472 (7.9836578369140625MB)
   free     = 17136 (0.0163421630859375MB)
   99.79572296142578% used
To Space:
   capacity = 10485760 (10.0MB)
   used     = 0 (0.0MB)
   free     = 10485760 (10.0MB)
   0.0% used
PS Old Generation
   capacity = 37748736 (36.0MB)
   used     = 11004056 (10.494285583496094MB)
   free     = 26744680 (25.505714416503906MB)
   29.15079328748915% used

14761 interned Strings occupying 1313128 bytes.
[atguigu@node1 hadoop]$ 
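In the session above, the NameNode first reports MaxHeapSize = 984 MB because Hadoop 3.x auto-sizes the JVM heap from the host's physical memory; after hadoop-env.sh is edited and distributed with xsync, the restarted NameNode shows an explicit 1024 MB heap. A minimal sketch of the hadoop-env.sh change, assuming the logger flags the course uses (treat the exact -D options as an assumption):

export HDFS_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,RFAS -Xmx1024m"
export HDFS_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS -Xmx1024m"

In production, size the heaps to the metadata load rather than a fixed 1 GB: a common rule of thumb is at least 1 GB for the NameNode plus roughly 1 GB per million blocks, and at least 4 GB for the DataNode, growing with its replica count.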

P144【144_尚硅谷_Hadoop_生产调优手册_核心参数_NN心跳并发配置】03:12

Linux: how to make user-defined scripts executable from anywhere (configuration guide)

1.2 NameNode Heartbeat Concurrency Configuration

The NameNode maintains a pool of worker threads that handles concurrent heartbeats from DataNodes as well as concurrent metadata operations from clients.

For a large cluster, or one with many clients, this parameter usually needs to be increased; the default is 10. It is set in hdfs-site.xml:

<property>
    <name>dfs.namenode.handler.count</name>
    <value>21</value>
</property>
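The value 21 is not arbitrary: the course sizes the pool as dfs.namenode.handler.count = 20 × ln(cluster size). A quick way to compute it, shown here for the 3-node cluster (a sketch; any Python works):

python -c 'import math; print(int(20 * math.log(3)))'
# prints 21; for an 8-node cluster, math.log(8) gives 41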

P145【145_尚硅谷_Hadoop_生产调优手册_核心参数_开启回收站】07:16

1.3 Enabling the Trash

To enable the trash, edit core-site.xml and set the trash retention interval to 1 minute:

<property>
    <name>fs.trash.interval</name>
    <value>1</value>
</property>
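With the trash enabled, a file removed through the fs shell is moved into the current user's trash directory instead of being destroyed, and it can be recovered with a plain move until the interval (1 minute here) expires. A sketch, assuming user atguigu and a hypothetical file path:

hadoop fs -rm /user/atguigu/input/word.txt
# the file now sits under the trash:
#   /user/atguigu/.Trash/Current/user/atguigu/input/word.txt
hadoop fs -mv /user/atguigu/.Trash/Current/user/atguigu/input/word.txt /user/atguigu/input/

Only deletions made through the fs shell (or through the Trash API in code) pass through the trash; files deleted from the web UI are removed directly.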

P146【146_尚硅谷_Hadoop_生产调优手册_HDFS压测环境准备】05:55

Chapter 2: HDFS Cluster Benchmarking

 

cd /opt/module/software/ and run python -m SimpleHTTPServer, which serves the directory over HTTP so that files can be downloaded externally via "hostname + port" (port 8000 by default).

P147【147_尚硅谷_Hadoop_生产调优手册_HDFS读写压测】18:55

2.1 Testing HDFS Write Performance

2.2 Testing HDFS Read Performance
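Both tests use the TestDFSIO benchmark shipped in the MapReduce client test jar. A sketch of the invocations under the 3.1.3 install layout (10 files of 128 MB each; nrFiles is usually set near the cluster's total CPU core count minus a couple):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 128MB
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 128MB
# remove the benchmark data when finished
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -clean

The report prints Throughput mb/sec and Average IO rate mb/sec per file; multiply by the file count to estimate aggregate bandwidth.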

P148【148_尚硅谷_Hadoop_生产调优手册_NN多目录配置】08:25

Chapter 3: HDFS Multiple Directories

3.1 NameNode Multi-Directory Configuration
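The NameNode can write its metadata to several local directories, each holding an identical copy, for extra redundancy. A sketch of the hdfs-site.xml entry (the name1/name2 layout follows the course's convention; after changing it, stop the cluster, clear the data and logs directories on every node, and reformat the NameNode):

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name1,file://${hadoop.tmp.dir}/dfs/name2</value>
</property>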

P149【149_尚硅谷_Hadoop_生产调优手册_DN多目录及磁盘间数据均衡】08:42

3.2 DataNode Multi-Directory Configuration
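A sketch of the matching hdfs-site.xml entry for the DataNode. Unlike the NameNode case, these directories are not mirrors: each block is written to exactly one of the configured locations, which is how additional disks are attached:

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data1,file://${hadoop.tmp.dir}/dfs/data2</value>
</property>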

3.3 Balancing Data Across Disks Within a Node

In production, a disk is often added when space runs low. A freshly mounted disk holds no data, so you can run the disk-balancer commands to even out data across the node's disks (a Hadoop 3.x feature).

(1) Generate a balancing plan (with only a single disk, no plan is generated)

hdfs diskbalancer -plan hadoop103

(2) Execute the balancing plan

hdfs diskbalancer -execute hadoop103.plan.json

(3) Check the status of the current balancing task

hdfs diskbalancer -query hadoop103

(4) Cancel the balancing task

hdfs diskbalancer -cancel hadoop103.plan.json

P150【150_尚硅谷_Hadoop_生产调优手册_添加白名单】10:02

Chapter 4: HDFS Cluster Scale-Out and Scale-In

4.1 Adding a Whitelist

Whitelist: only hosts whose IP addresses are listed in the whitelist may be used to store data.

In practice, a whitelist helps keep malicious hosts from accessing or joining the cluster.
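Setting it up takes three steps: list the allowed hosts in a whitelist file, point hdfs-site.xml at that file, and refresh the NameNode. A sketch under the course's directory layout (host names are placeholders):

# 1. create etc/hadoop/whitelist with one allowed host per line,
#    e.g. hadoop102 / hadoop103 / hadoop104
# 2. reference it from hdfs-site.xml:
<property>
    <name>dfs.hosts</name>
    <value>/opt/module/hadoop-3.1.3/etc/hadoop/whitelist</value>
</property>
# 3. distribute the files; the first activation needs a cluster restart,
#    later edits to the whitelist only need:
hdfs dfsadmin -refreshNodes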

P151【151_尚硅谷_Hadoop_生产调优手册_服役新服务器】13:07

4.2 Commissioning New Servers

Requirement: as the business grows and data volume increases, the existing DataNodes can no longer meet the storage demand, so new data nodes must be added to the running cluster dynamically.
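Once the new machine has Java, Hadoop, and the environment files copied over (and is added to the whitelist if one is configured), its daemons can simply be started; the DataNode registers itself with the NameNode. A sketch, run on the new node:

hdfs --daemon start datanode
yarn --daemon start nodemanager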

P152【152_尚硅谷_Hadoop_生产调优手册_服务器间数据均衡】03:16

4.3 Balancing Data Across Servers
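After commissioning a node the data is skewed toward the old machines, so run the HDFS balancer. A sketch (threshold 10 means each DataNode's disk utilization may deviate from the cluster average by at most 10 percentage points; the balancer starts its own Rebalance server, so prefer a relatively idle node rather than the NameNode host):

sbin/start-balancer.sh -threshold 10
# stop it once the cluster is even enough
sbin/stop-balancer.sh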

P153【153_尚硅谷_Hadoop_生产调优手册_黑名单退役服务器】07:46

4.4 Decommissioning Servers via a Blacklist

Blacklist: hosts whose IP addresses are listed in the blacklist may not be used to store data.

In practice, the blacklist is how servers are decommissioned.
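The mechanics mirror the whitelist: list the hosts to retire in a blacklist file, reference it from hdfs-site.xml, and refresh; HDFS then re-replicates the node's blocks elsewhere before marking it decommissioned. A sketch under the course's layout:

# etc/hadoop/blacklist contains the hosts to retire, one per line
<property>
    <name>dfs.hosts.exclude</name>
    <value>/opt/module/hadoop-3.1.3/etc/hadoop/blacklist</value>
</property>

hdfs dfsadmin -refreshNodes
# wait until the web UI shows the node as "decommissioned" rather than
# "decommission in progress", then stop its daemons:
hdfs --daemon stop datanode
yarn --daemon stop nodemanager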

P154【154_尚硅谷_Hadoop_生产调优手册_存储优化_5台服务器准备】11:21

Clone the virtual machines, delete the data and logs directories under hadoop-3.1.3, and update the xsync, myhadoop.sh, and jpsall scripts.

P155【155_尚硅谷_Hadoop_生产调优手册_存储优化_纠删码原理】08:16

Chapter 5: HDFS Storage Optimization

Note: demonstrating erasure coding and heterogeneous storage requires 5 virtual machines in total. If possible, use a separate cluster; prepare a 5-server cluster in advance.

5.1 Erasure Coding

5.1.1 How Erasure Coding Works

Erasure coding trades CPU for storage: instead of keeping 3 full replicas, a policy such as RS-3-2-1024k splits data into 3 data units plus 2 parity units, so any 2 of the 5 units can be lost and the data is still recoverable, at roughly 1.67x storage instead of 3x.

P156【156_尚硅谷_Hadoop_生产调优手册_存储优化_纠删码案例】10:42

5.1.2 Erasure Coding in Practice

An erasure-coding policy is set on a specific path; every file stored under that path follows the policy.

Only the RS-6-3-1024k policy is enabled by default; any other policy must be enabled before use, as in the sketch below.
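A sketch of the shell workflow, using a hypothetical /input directory and the RS-3-2-1024k policy (3 data units + 2 parity units, so at least 5 DataNodes are needed):

hdfs ec -listPolicies
hdfs ec -enablePolicy -policy RS-3-2-1024k
hdfs ec -setPolicy -path /input -policy RS-3-2-1024k
hdfs ec -getPolicy -path /input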

P157【157_尚硅谷_Hadoop_生产调优手册_存储优化_异构存储概述】08:36

5.2 Heterogeneous Storage (Hot/Cold Data Separation)

Heterogeneous storage addresses the problem of placing different kinds of data on different types of disks so as to get the best overall performance.

5.2.1 Heterogeneous Storage Shell Commands

(1) List the available storage policies

[atguigu@hadoop102 hadoop-3.1.3]$ hdfs storagepolicies -listPolicies

(2) Set a storage policy on a given path (a data directory)

hdfs storagepolicies -setStoragePolicy -path xxx -policy xxx

(3) Get the storage policy of a given path (directory or file)

hdfs storagepolicies -getStoragePolicy -path xxx

(4) Unset the storage policy; after running this command the directory or file inherits its parent directory's policy, and for the root directory that means HOT

hdfs storagepolicies -unsetStoragePolicy -path xxx

(5) View the distribution of a file's blocks

bin/hdfs fsck xxx -files -blocks -locations

(6) View the cluster's nodes

hadoop dfsadmin -report

P158【158_尚硅谷_Hadoop_生产调优手册_存储优化_异构存储案例实操】17:40

5.2.2 Test Environment Setup

1) Test environment

Cluster size: 5 servers

Cluster configuration: replication factor 2; create the directories tagged with storage types in advance (see the sketch below).
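A sketch of the per-node hdfs-site.xml for this test. The [SSD]/[DISK] prefixes tag each directory with a storage type; the paths are placeholders (on real hardware each would be a mount point of the corresponding medium, and the mix of types differs per node):

<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>[SSD]file:///opt/module/hadoop-3.1.3/hdfsdata/ssd,[DISK]file:///opt/module/hadoop-3.1.3/hdfsdata/disk</value>
</property>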

P159【159_尚硅谷_Hadoop_生产调优手册_NameNode故障处理】09:09

Chapter 6: HDFS Troubleshooting

Note: three servers are enough here; restore to the server snapshots taken before the Yarn chapter.

6.1 NameNode Failure Handling: copy the SecondaryNameNode's data into the original NameNode's storage directory; in other words, when the NameNode dies, recover it from the SecondaryNameNode's data, as sketched below.
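A sketch of the recovery, assuming the NameNode runs on hadoop102 and the SecondaryNameNode on hadoop104 with the course's directory layout:

# on hadoop102, after the NameNode process has died and its metadata is lost
scp -r atguigu@hadoop104:/opt/module/hadoop-3.1.3/data/dfs/namesecondary/* /opt/module/hadoop-3.1.3/data/dfs/name/
hdfs --daemon start namenode

Note that the 2NN checkpoint lags the live NameNode, so edits made after the last checkpoint are lost; this is a stopgap, not a substitute for NameNode HA.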

P160【160_尚硅谷_Hadoop_生产调优手册_集群安全模式&磁盘修复】18:32

6.2 Safe Mode & Disk Repair

1) Safe mode: the file system accepts only read requests; deletes, modifications, and other mutating requests are rejected.

2) Situations that put the cluster into safe mode

  • while the NameNode is loading the image file and edit log;
  • while the NameNode is receiving DataNode registrations.
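Safe mode is inspected and controlled with hdfs dfsadmin:

hdfs dfsadmin -safemode get     # check whether safe mode is on
hdfs dfsadmin -safemode enter   # enter safe mode
hdfs dfsadmin -safemode leave   # leave safe mode
hdfs dfsadmin -safemode wait    # block until safe mode ends (useful in scripts)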

P161【161_尚硅谷_Hadoop_生产调优手册_慢磁盘监控】09:19

6.3 Monitoring Slow Disks

A "slow disk" is one that writes data abnormally slowly. Slow disks are not rare: as a machine runs longer and carries more jobs, disk read/write performance naturally degrades, and in severe cases writes become visibly delayed.

How do you spot a slow disk?

Creating a directory on HDFS normally takes well under a second. If creating a directory occasionally takes a minute or more, and the symptom is intermittent rather than constant, a slow disk is a likely cause.

Two ways to find which disk is slow:

1) The "last contact" time in DataNode heartbeats (a DataNode normally heartbeats every 3 seconds; a much larger gap points to a problem).

2) The fio command, to benchmark the disk's read/write performance.
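A sketch of a fio sequential-read test (install fio first; the file path, size, and job count are placeholders to adjust for the disk under test):

sudo yum install -y fio
sudo fio -filename=/home/atguigu/test.log -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=test_r

Swap -rw=read for write, randread, or randrw to cover the other access patterns, and compare the reported bandwidth across disks to spot the slow one.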

P162【162_尚硅谷_Hadoop_生产调优手册_小文件归档】08:11

6.4 Archiving Small Files

1) Why small files hurt HDFS

Every file is stored in blocks, and each block's metadata lives in the NameNode's memory, so storing masses of small files is very inefficient: they consume most of the NameNode's memory. Note, though, that the disk space a small file takes is unrelated to the block size: a 1 MB file stored with a 128 MB block size occupies 1 MB of disk, not 128 MB.

2) One remedy

An HDFS archive (HAR file) is a more efficient packing tool: it stores files into HDFS blocks, reducing NameNode memory use while still allowing transparent access to the files. Concretely, the files inside an archive remain individually accessible, but to the NameNode the archive counts as a single unit, which shrinks its memory footprint.
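A sketch of archiving and using a HAR, with hypothetical /input and /output paths (the archive is built by a MapReduce job, so Yarn must be running):

hadoop archive -archiveName input.har -p /input /output
# the archived files remain transparently accessible via the har:// scheme
hadoop fs -ls har:///output/input.har
# unarchive by copying back out
hadoop fs -cp har:///output/input.har/* /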

P163【163_尚硅谷_Hadoop_生产调优手册_集群数据迁移】03:18

Chapter 7: HDFS Cluster Migration

7.1 Copying Data Between Two Apache Clusters

7.2 Copying Data Between Apache and CDH Clusters
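Both cases use distcp, which copies between clusters as a MapReduce job. A sketch with hypothetical hosts and file:

bin/hadoop distcp hdfs://hadoop102:8020/user/atguigu/hello.txt hdfs://hadoop105:8020/user/atguigu/hello.txt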

P164【164_尚硅谷_Hadoop_生产调优手册_MR跑的慢的原因】02:43

Chapter 8: MapReduce Production Experience

8.1 Why MapReduce Runs Slowly

MapReduce performance bottlenecks come down to two things:

1) Machine performance

        CPU, memory, disk, network

2) I/O issues

        (1) Data skew

        (2) Map tasks running too long, keeping the Reduce tasks waiting

        (3) Too many small files

P165【165_尚硅谷_Hadoop_生产调优手册_MR常用调优参数】12:27

8.2 Common MapReduce Tuning Parameters
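The parameters worth knowing cluster around the shuffle. A sketch of the most commonly tuned ones, with stock defaults in parentheses (verify against your mapred-default.xml):

# map side
mapreduce.task.io.sort.mb                      # ring buffer for map output (100 MB)
mapreduce.map.sort.spill.percent               # buffer fill ratio that triggers a spill (0.80)
mapreduce.task.io.sort.factor                  # spill files merged at once (10)
# reduce side
mapreduce.reduce.shuffle.parallelcopies        # parallel fetchers pulling map output (5)
mapreduce.reduce.shuffle.input.buffer.percent  # share of reduce heap buffering shuffle data (0.70)
mapreduce.reduce.shuffle.merge.percent         # buffer fill ratio that triggers merging (0.66)

Bigger sort buffers and higher spill thresholds mean fewer disk spills; more merge streams and fetchers speed up the shuffle at the cost of memory and network.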

P166【166_尚硅谷_Hadoop_生产调优手册_MR数据倾斜问题】05:26

8.3 MapReduce Data Skew

P167【167_尚硅谷_Hadoop_生产调优手册_Yarn生产经验】01:18

Chapter 9: Hadoop Yarn Production Experience

9.1 Common Tuning Parameters

9.2 Using the Capacity Scheduler

9.3 Using the Fair Scheduler

P168【168_尚硅谷_Hadoop_生产调优手册_HDFS小文件优化方法】10:15

Chapter 10: Hadoop Comprehensive Tuning

10.1 Optimizing for Small Files in Hadoop

10.1.1 Why Small Files Hurt Hadoop

10.1.2 Small File Remedies

4) Enable uber mode for JVM reuse (compute side)

By default, every Task starts its own JVM. When Tasks process only small amounts of data, the multiple Tasks of one Job can be allowed to run in a single JVM instead of paying the JVM startup cost per Task; the relevant switches are sketched below.
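Uber mode is off by default and is gated by thresholds the job must stay under. A sketch of the mapred-site.xml switches (the max values can only be lowered from their defaults, not raised; maxbytes defaults to the block size):

<property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>   <!-- default false -->
</property>
<property>
    <name>mapreduce.job.ubertask.maxmaps</name>
    <value>9</value>      <!-- default 9 -->
</property>
<property>
    <name>mapreduce.job.ubertask.maxreduces</name>
    <value>1</value>      <!-- default 1 -->
</property>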

P169【169_尚硅谷_Hadoop_生产调优手册_MapReduce集群压测】02:54

10.2 Benchmarking MapReduce Compute Performance
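The course benchmarks compute with the examples jar: generate random data, sort it, then verify the sort. A sketch (randomwriter writes roughly 10 GB of random data per node by default, so this creates a meaningful load; the data paths are placeholders):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar randomwriter random-data
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar sort random-data sorted-data
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar testmapredsort -sortInput random-data -sortOutput sorted-data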

P170【170_尚硅谷_Hadoop_生产调优手册_企业开发场景案例】15:00

10.3 An Enterprise Development Scenario

10.3.1 Requirements

10.3.2 HDFS Parameter Tuning

10.3.3 MapReduce Parameter Tuning

10.3.4 Yarn Parameter Tuning

10.3.5 Running the Job

尚硅谷大数据Hadoop教程 Notes 01: Getting Started

Video: 尚硅谷大数据Hadoop教程 (Hadoop 3.x, from installation through cluster tuning)

  1. 尚硅谷大数据Hadoop教程 Notes 01 [Getting Started]
  2. 尚硅谷大数据Hadoop教程 Notes 02 [HDFS]
  3. 尚硅谷大数据Hadoop教程 Notes 03 [MapReduce]
  4. 尚硅谷大数据Hadoop教程 Notes 04 [Yarn]
  5. 尚硅谷大数据Hadoop教程 Notes 05 [Production Tuning Manual]
  6. 尚硅谷大数据Hadoop教程 Notes 06 [Source Code Analysis]

Table of Contents

00_尚硅谷大数据Hadoop课程整体介绍

P001【001_尚硅谷_Hadoop_开篇_课程整体介绍】08:38

01_尚硅谷大数据技术之大数据概论

P002【002_尚硅谷_Hadoop_概论_大数据的概念】04:34

P003【003_尚硅谷_Hadoop_概论_大数据的特点】07:23

P004【004_尚硅谷_Hadoop_概论_大数据的应用场景】09:58

P005【005_尚硅谷_Hadoop_概论_大数据的发展场景】08:17

P006【006_尚硅谷_Hadoop_概论_未来工作内容】06:25

02_尚硅谷大数据技术之Hadoop(入门)V3.3

P007【007_尚硅谷_Hadoop_入门_课程介绍】07:29

P008【008_尚硅谷_Hadoop_入门_Hadoop是什么】03:00

P009【009_尚硅谷_Hadoop_入门_Hadoop发展历史】05:52

P010【010_尚硅谷_Hadoop_入门_Hadoop三大发行版本】05:59

P011【011_尚硅谷_Hadoop_入门_Hadoop优势】03:52

P012【012_尚硅谷_Hadoop_入门_Hadoop1.x2.x3.x区别】03:00

P013【013_尚硅谷_Hadoop_入门_HDFS概述】06:26

P014【014_尚硅谷_Hadoop_入门_YARN概述】06:35

P015【015_尚硅谷_Hadoop_入门_MapReduce概述】01:55

P016【016_尚硅谷_Hadoop_入门_HDFS&YARN&MR关系】03:22

P017【017_尚硅谷_Hadoop_入门_大数据技术生态体系】09:17

P018【018_尚硅谷_Hadoop_入门_VMware安装】04:41

P019【019_尚硅谷_Hadoop_入门_Centos7.5软硬件安装】15:56

P020【020_尚硅谷_Hadoop_入门_IP和主机名称配置】10:50

P021【021_尚硅谷_Hadoop_入门_Xshell远程访问工具】09:05

P022【022_尚硅谷_Hadoop_入门_模板虚拟机准备完成】12:25

P023【023_尚硅谷_Hadoop_入门_克隆三台虚拟机】15:01

P024【024_尚硅谷_Hadoop_入门_JDK安装】07:02

P025【025_尚硅谷_Hadoop_入门_Hadoop安装】07:20

P026【026_尚硅谷_Hadoop_入门_本地运行模式】11:56

P027【027_尚硅谷_Hadoop_入门_scp&rsync命令讲解】15:01

P028【028_尚硅谷_Hadoop_入门_xsync分发脚本】18:14

P029【029_尚硅谷_Hadoop_入门_ssh免密登录】11:25

P030【030_尚硅谷_Hadoop_入门_集群配置】13:24

P031【031_尚硅谷_Hadoop_入门_群起集群并测试】16:52

P032【032_尚硅谷_Hadoop_入门_集群崩溃处理办法】08:10

P033【033_尚硅谷_Hadoop_入门_历史服务器配置】05:26

P034【034_尚硅谷_Hadoop_入门_日志聚集功能配置】05:42

P035【035_尚硅谷_Hadoop_入门_两个常用脚本】09:18

P036【036_尚硅谷_Hadoop_入门_两道面试题】04:15

P037【037_尚硅谷_Hadoop_入门_集群时间同步】11:27

P038【038_尚硅谷_Hadoop_入门_常见问题总结】10:57


00_尚硅谷大数据Hadoop课程整体介绍

P001【001_尚硅谷_Hadoop_开篇_课程整体介绍】08:38

Hadoop 3.x: From Beginner to Expert

I. Key upgrades in this edition of the course
    1. Yarn
    2. The production tuning manual
    3. Source code
II. Course highlights
    1. New: Hadoop 3.1.3
    2. Detailed: starting from cluster setup, every configuration and every line of code is annotated; material solid enough to publish as a book
    3. Real: 20+ enterprise cases, 30+ enterprise tuning items, reading source code out of millions of lines
    4. Complete: a full set of materials
III. How to get the materials
    1. Follow the 尚硅谷教育 WeChat official account and reply "大数据"
    2. 谷粒学院
    3. Bilibili
IV. Prerequisites
    Java SE, Maven + IDEA + common Linux commands

01_尚硅谷大数据技术之大数据概论

P002【002_尚硅谷_Hadoop_概论_大数据的概念】04:34

Chapter 1. The concept: Big Data refers to data sets that cannot be captured, managed, and processed within a tolerable time frame using conventional software tools; they are massive, fast-growing, and diverse information assets that require new processing models to deliver stronger decision-making power, insight, and process-optimization ability.

Big data is mainly about the collection, storage, and analysis/computation of massive data.

P003【003_尚硅谷_Hadoop_概论_大数据的特点】07:23

Chapter 2. The characteristics of big data (the 4 Vs)

  1. Volume (massive scale)
  2. Velocity (high speed)
  3. Variety (many forms)
  4. Value (low value density)

P004【004_尚硅谷_Hadoop_概论_大数据的应用场景】09:58

Chapter 3. Big data application scenarios

  1. Douyin: recommending exactly the videos you like.
  2. On-site e-commerce ads: recommending products a user is likely to want.
  3. Retail: analyzing consumption habits to make purchases convenient and lift sales.
  4. Logistics and warehousing: JD Logistics, with same-afternoon delivery for morning orders and next-morning delivery for afternoon orders.
  5. Insurance: mining massive data for risk prediction, enabling precision marketing and finer-grained pricing.
  6. Finance: profiling users across many dimensions to help institutions find quality customers and guard against fraud.
  7. Real estate: data-driven investment and marketing, picking better land, building better projects, and selling to better-matched buyers.
  8. AI + 5G + IoT + virtual and augmented reality.

P005【005_尚硅谷_Hadoop_概论_大数据的发展场景】08:17

Chapter 4. Big data's growth prospects: good!

P006【006_尚硅谷_Hadoop_概论_未来工作内容】06:25

Chapter 5. Analysis of inter-department big data business workflow

Chapter 6. Organizational structure within a big data department

02_尚硅谷大数据技术之Hadoop(入门)V3.3

P007【007_尚硅谷_Hadoop_入门_课程介绍】07:29

P008【008_尚硅谷_Hadoop_入门_Hadoop是什么】03:00

P009【009_尚硅谷_Hadoop_入门_Hadoop发展历史】05:52

P010【010_尚硅谷_Hadoop_入门_Hadoop三大发行版本】05:59

Hadoop has three major distributions: Apache, Cloudera, and Hortonworks.

1) Apache Hadoop

Official site: http://hadoop.apache.org

Downloads: https://hadoop.apache.org/releases.html

2) Cloudera Hadoop

Official site: https://www.cloudera.com/downloads/cdh

Downloads: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_6_download.html

(1) Founded in 2008, Cloudera was the first company to commercialize Hadoop, providing commercial Hadoop solutions to partners, mainly support, consulting services, and training.

(2) In 2009, Hadoop's creator Doug Cutting joined Cloudera. Cloudera's main products are CDH, Cloudera Manager, and Cloudera Support.

(3) CDH is Cloudera's Hadoop distribution, fully open source, with improvements over Apache Hadoop in compatibility, security, and stability. Cloudera's list price is US$10,000 per node per year.

(4) Cloudera Manager is a platform for software distribution, management, and monitoring of clusters; it can deploy a Hadoop cluster within a few hours and monitor the cluster's nodes and services in real time.

3) Hortonworks Hadoop

Official site: https://hortonworks.com/products/data-center/hdp/

Downloads: https://hortonworks.com/downloads/#data-platform

(1) Hortonworks, founded in 2011, was a joint venture between Yahoo and the Silicon Valley venture firm Benchmark Capital.

(2) At its founding the company took on roughly 25 to 30 Yahoo engineers dedicated to Hadoop; these engineers had been helping Yahoo develop Hadoop since 2005 and contributed 80% of Hadoop's code.

(3) Hortonworks' flagship product is the Hortonworks Data Platform (HDP), also 100% open source; besides the common projects, HDP includes Ambari, an open-source installation and management system.

(4) In 2018, Hortonworks was acquired by Cloudera.

P011【011_尚硅谷_Hadoop_入门_Hadoop优势】03:52

Hadoop's advantages (the four "highs")

  1. High reliability
  2. High scalability
  3. High efficiency
  4. High fault tolerance

P012【012_尚硅谷_Hadoop_入门_Hadoop1.x2.x3.x区别】03:00

P013【013_尚硅谷_Hadoop_入门_HDFS概述】06:26

The Hadoop Distributed File System (HDFS) is a distributed file system.

  • 1) NameNode (nn): stores file metadata, such as file names, directory structure, and file attributes (creation time, replica count, permissions), plus each file's block list and which DataNodes hold each block.
  • 2) DataNode (dn): stores the file block data on the local file system, along with checksums for the block data.
  • 3) Secondary NameNode (2nn): takes periodic backups of the NameNode's metadata.

P014【014_尚硅谷_Hadoop_入门_YARN概述】06:35

Yet Another Resource Negotiator (YARN) is Hadoop's resource manager.

P015【015_尚硅谷_Hadoop_入门_MapReduce概述】01:55

MapReduce divides a computation into two phases: Map and Reduce.

  • 1) The Map phase processes the input data in parallel.
  • 2) The Reduce phase aggregates the Map results.

P016【016_尚硅谷_Hadoop_入门_HDFS&YARN&MR关系】03:22

  1. HDFS
    1. NameNode:负责数据存储。
    2. DataNode:数据存储在哪个节点上。
    3. SecondaryNameNode:秘书,备份NameNode数据恢复NameNode部分工作。
  2. YARN:整个集群的资源管理。
    1. ResourceManager:资源管理,map阶段。
    2. NodeManager
  3. MapReduce

P017【017_尚硅谷_Hadoop_入门_大数据技术生态体系】09:17

The big data technology ecosystem

Architecture of a recommendation-system project

P018【018_尚硅谷_Hadoop_入门_VMware安装】04:41

 

P019【019_尚硅谷_Hadoop_入门_Centos7.5软硬件安装】15:56

P020【020_尚硅谷_Hadoop_入门_IP和主机名称配置】10:50

[root@hadoop100 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoop100 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.88.133  netmask 255.255.255.0  broadcast 192.168.88.255
        inet6 fe80::363b:8659:c323:345d  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:0f:0a:6d  txqueuelen 1000  (Ethernet)
        RX packets 684561  bytes 1003221355 (956.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 53538  bytes 3445292 (3.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 84  bytes 9492 (9.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 84  bytes 9492 (9.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:1c:3c:a9  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@hadoop100 ~]# systemctl restart network
[root@hadoop100 ~]# cat /etc/host
cat: /etc/host: 没有那个文件或目录
[root@hadoop100 ~]# cat /etc/hostname
hadoop100
[root@hadoop100 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
[root@hadoop100 ~]# vim /etc/hosts
[root@hadoop100 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.88.100  netmask 255.255.255.0  broadcast 192.168.88.255
        inet6 fe80::363b:8659:c323:345d  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:0f:0a:6d  txqueuelen 1000  (Ethernet)
        RX packets 684830  bytes 1003244575 (956.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 53597  bytes 3452600 (3.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 132  bytes 14436 (14.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 132  bytes 14436 (14.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:1c:3c:a9  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@hadoop100 ~]# ll
总用量 40
-rw-------. 1 root root 1973 3月  14 10:19 anaconda-ks.cfg
-rw-r--r--. 1 root root 2021 3月  14 10:26 initial-setup-ks.cfg
drwxr-xr-x. 2 root root 4096 3月  14 10:27 公共
drwxr-xr-x. 2 root root 4096 3月  14 10:27 模板
drwxr-xr-x. 2 root root 4096 3月  14 10:27 视频
drwxr-xr-x. 2 root root 4096 3月  14 10:27 图片
drwxr-xr-x. 2 root root 4096 3月  14 10:27 文档
drwxr-xr-x. 2 root root 4096 3月  14 10:27 下载
drwxr-xr-x. 2 root root 4096 3月  14 10:27 音乐
drwxr-xr-x. 2 root root 4096 3月  14 10:27 桌面
[root@hadoop100 ~]# 

vim /etc/sysconfig/network-scripts/ifcfg-ens33

TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="3241b48d-3234-4c23-8a03-b9b393a99a65"
DEVICE="ens33"
ONBOOT="yes"

IPADDR=192.168.88.100
GATEWAY=192.168.88.2
DNS1=192.168.88.2

vim /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.88.100 hadoop100
192.168.88.101 hadoop101
192.168.88.102 hadoop102
192.168.88.103 hadoop103
192.168.88.104 hadoop104
192.168.88.105 hadoop105
192.168.88.106 hadoop106
192.168.88.107 hadoop107
192.168.88.108 hadoop108

192.168.88.151 node1 node1.itcast.cn
192.168.88.152 node2 node2.itcast.cn
192.168.88.153 node3 node3.itcast.cn

P021【021_尚硅谷_Hadoop_入门_Xshell远程访问工具】09:05

P022【022_尚硅谷_Hadoop_入门_模板虚拟机准备完成】12:25

yum install -y epel-release

systemctl stop firewalld

systemctl disable firewalld.service

P023【023_尚硅谷_Hadoop_入门_克隆三台虚拟机】15:01

vim /etc/sysconfig/network-scripts/ifcfg-ens33

vim /etc/hostname

reboot

P024【024_尚硅谷_Hadoop_入门_JDK安装】07:02

Install the JDK on hadoop102, then copy it to hadoop103 and hadoop104.

P025【025_尚硅谷_Hadoop_入门_Hadoop安装】07:20

(Same diagram as P024.)

P026【026_尚硅谷_Hadoop_入门_本地运行模式】11:56

Apache Hadoop

http://node1:9870/explorer.html#/

[root@node1 ~]# cd /export/server/hadoop-3.3.0/share/hadoop/mapreduce/
[root@node1 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.3.0.jar wordcount /wordcount/input /wordcount/output
2023-03-20 14:43:07,516 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at node1/192.168.88.151:8032
2023-03-20 14:43:09,291 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1679293699463_0001
2023-03-20 14:43:11,916 INFO input.FileInputFormat: Total input files to process : 1
2023-03-20 14:43:12,313 INFO mapreduce.JobSubmitter: number of splits:1
2023-03-20 14:43:13,173 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1679293699463_0001
2023-03-20 14:43:13,173 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-03-20 14:43:14,684 INFO conf.Configuration: resource-types.xml not found
2023-03-20 14:43:14,684 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-03-20 14:43:17,054 INFO impl.YarnClientImpl: Submitted application application_1679293699463_0001
2023-03-20 14:43:17,123 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1679293699463_0001/
2023-03-20 14:43:17,124 INFO mapreduce.Job: Running job: job_1679293699463_0001
2023-03-20 14:43:52,340 INFO mapreduce.Job: Job job_1679293699463_0001 running in uber mode : false
2023-03-20 14:43:52,360 INFO mapreduce.Job:  map 0% reduce 0%
2023-03-20 14:44:08,011 INFO mapreduce.Job:  map 100% reduce 0%
2023-03-20 14:44:16,986 INFO mapreduce.Job:  map 100% reduce 100%
2023-03-20 14:44:18,020 INFO mapreduce.Job: Job job_1679293699463_0001 completed successfully
2023-03-20 14:44:18,579 INFO mapreduce.Job: Counters: 54
        File System Counters
                FILE: Number of bytes read=31
                FILE: Number of bytes written=529345
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=142
                HDFS: Number of bytes written=17
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
                HDFS: Number of bytes read erasure-coded=0
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=11303
                Total time spent by all reduces in occupied slots (ms)=6220
                Total time spent by all map tasks (ms)=11303
                Total time spent by all reduce tasks (ms)=6220
                Total vcore-milliseconds taken by all map tasks=11303
                Total vcore-milliseconds taken by all reduce tasks=6220
                Total megabyte-milliseconds taken by all map tasks=11574272
                Total megabyte-milliseconds taken by all reduce tasks=6369280
        Map-Reduce Framework
                Map input records=2
                Map output records=5
                Map output bytes=53
                Map output materialized bytes=31
                Input split bytes=108
                Combine input records=5
                Combine output records=2
                Reduce input groups=2
                Reduce shuffle bytes=31
                Reduce input records=2
                Reduce output records=2
                Spilled Records=4
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=546
                CPU time spent (ms)=3680
                Physical memory (bytes) snapshot=499236864
                Virtual memory (bytes) snapshot=5568684032
                Total committed heap usage (bytes)=365953024
                Peak Map Physical memory (bytes)=301096960
                Peak Map Virtual memory (bytes)=2779201536
                Peak Reduce Physical memory (bytes)=198139904
                Peak Reduce Virtual memory (bytes)=2789482496
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=34
        File Output Format Counters 
                Bytes Written=17
[root@node1 mapreduce]#

[root@node1 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.3.0.jar wordcount /wc_input /wc_output
2023-03-20 15:01:48,007 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at node1/192.168.88.151:8032
2023-03-20 15:01:49,475 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1679293699463_0002
2023-03-20 15:01:50,522 INFO input.FileInputFormat: Total input files to process : 1
2023-03-20 15:01:51,010 INFO mapreduce.JobSubmitter: number of splits:1
2023-03-20 15:01:51,894 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1679293699463_0002
2023-03-20 15:01:51,894 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-03-20 15:01:52,684 INFO conf.Configuration: resource-types.xml not found
2023-03-20 15:01:52,687 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-03-20 15:01:53,237 INFO impl.YarnClientImpl: Submitted application application_1679293699463_0002
2023-03-20 15:01:53,487 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1679293699463_0002/
2023-03-20 15:01:53,492 INFO mapreduce.Job: Running job: job_1679293699463_0002
2023-03-20 15:02:15,329 INFO mapreduce.Job: Job job_1679293699463_0002 running in uber mode : false
2023-03-20 15:02:15,342 INFO mapreduce.Job:  map 0% reduce 0%
2023-03-20 15:02:26,652 INFO mapreduce.Job:  map 100% reduce 0%
2023-03-20 15:02:40,297 INFO mapreduce.Job:  map 100% reduce 100%
2023-03-20 15:02:41,350 INFO mapreduce.Job: Job job_1679293699463_0002 completed successfully
2023-03-20 15:02:41,557 INFO mapreduce.Job: Counters: 54
        File System Counters
                FILE: Number of bytes read=60
                FILE: Number of bytes written=529375
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=149
                HDFS: Number of bytes written=38
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
                HDFS: Number of bytes read erasure-coded=0
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=8398
                Total time spent by all reduces in occupied slots (ms)=9720
                Total time spent by all map tasks (ms)=8398
                Total time spent by all reduce tasks (ms)=9720
                Total vcore-milliseconds taken by all map tasks=8398
                Total vcore-milliseconds taken by all reduce tasks=9720
                Total megabyte-milliseconds taken by all map tasks=8599552
                Total megabyte-milliseconds taken by all reduce tasks=9953280
        Map-Reduce Framework
                Map input records=4
                Map output records=6
                Map output bytes=69
                Map output materialized bytes=60
                Input split bytes=100
                Combine input records=6
                Combine output records=4
                Reduce input groups=4
                Reduce shuffle bytes=60
                Reduce input records=4
                Reduce output records=4
                Spilled Records=8
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=1000
                CPU time spent (ms)=3880
                Physical memory (bytes) snapshot=503771136
                Virtual memory (bytes) snapshot=5568987136
                Total committed heap usage (bytes)=428343296
                Peak Map Physical memory (bytes)=303013888
                Peak Map Virtual memory (bytes)=2782048256
                Peak Reduce Physical memory (bytes)=200757248
                Peak Reduce Virtual memory (bytes)=2786938880
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=49
        File Output Format Counters 
                Bytes Written=38
[root@node1 mapreduce]# pwd
/export/server/hadoop-3.3.0/share/hadoop/mapreduce
[root@node1 mapreduce]# 

P027【027_尚硅谷_Hadoop_入门_scp&rsync命令讲解】15:01

Use scp for the first sync and rsync for subsequent syncs.

rsync is mainly used for backup and mirroring; it is fast, avoids re-copying identical content, and supports symbolic links.

Difference between rsync and scp: rsync copies faster than scp because it only transfers the files that differ, whereas scp copies everything.
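A sketch of both, copying from hadoop102 to hadoop103 with the course's paths:

# scp: full copy; -r recurses into directories
scp -r /opt/module/jdk1.8.0_212 atguigu@hadoop103:/opt/module
# rsync: differential copy; -a archive mode, -v verbose
rsync -av /opt/module/hadoop-3.1.3/ atguigu@hadoop103:/opt/module/hadoop-3.1.3/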

P028【028_尚硅谷_Hadoop_入门_xsync分发脚本】18:14

Copy and sync commands

  1. scp (secure copy)
  2. rsync, a remote synchronization tool
  3. xsync, the cluster distribution script

The dirname command strips the trailing file-name component from a path and prints only the directory part.

[root@node1 ~]# dirname /home/atguigu/a.txt
/home/atguigu
[root@node1 ~]#

The basename command prints the final file-name component of a path.

[root@node1 atguigu]# basename /home/atguigu/a.txt
a.txt
[root@node1 atguigu]#

#!/bin/bash

#1. Check the number of arguments
if [ $# -lt 1 ]
then
    echo Not Enough Arguments!
    exit;
fi

#2. Loop over every host in the cluster
for host in hadoop102 hadoop103 hadoop104
do
    echo ====================  $host  ====================
    #3. Loop over all the files/directories given, sending each in turn

    for file in $@
    do
        #4. Check that the file exists
        if [ -e $file ]
            then
                #5. Get the parent directory (resolving symlinks with -P)
                pdir=$(cd -P $(dirname $file); pwd)

                #6. Get the file's own name
                fname=$(basename $file)
                ssh $host "mkdir -p $pdir"
                rsync -av $pdir/$fname $host:$pdir
            else
                echo $file does not exist!
        fi
    done
done
[root@node1 bin]# chmod 777 xsync 
[root@node1 bin]# ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月  20 16:00 xsync
[root@node1 bin]# cd ..
[root@node1 atguigu]# xsync bin/
==================== node1 ====================
sending incremental file list

sent 94 bytes  received 17 bytes  222.00 bytes/sec
total size is 727  speedup is 6.55
==================== node2 ====================
sending incremental file list
bin/
bin/xsync

sent 871 bytes  received 39 bytes  606.67 bytes/sec
total size is 727  speedup is 0.80
==================== node3 ====================
sending incremental file list
bin/
bin/xsync

sent 871 bytes  received 39 bytes  1,820.00 bytes/sec
total size is 727  speedup is 0.80
[root@node1 atguigu]# pwd
/home/atguigu
[root@node1 atguigu]# ls -al
总用量 20
drwx------  6 atguigu atguigu  168 3月  20 15:56 .
drwxr-xr-x. 6 root    root      56 3月  20 10:08 ..
-rw-r--r--  1 root    root       0 3月  20 15:44 a.txt
-rw-------  1 atguigu atguigu   21 3月  20 11:48 .bash_history
-rw-r--r--  1 atguigu atguigu   18 8月   8 2019 .bash_logout
-rw-r--r--  1 atguigu atguigu  193 8月   8 2019 .bash_profile
-rw-r--r--  1 atguigu atguigu  231 8月   8 2019 .bashrc
drwxrwxr-x  2 atguigu atguigu   19 3月  20 15:56 bin
drwxrwxr-x  3 atguigu atguigu   18 3月  20 10:17 .cache
drwxrwxr-x  3 atguigu atguigu   18 3月  20 10:17 .config
drwxr-xr-x  4 atguigu atguigu   39 3月  10 20:04 .mozilla
-rw-------  1 atguigu atguigu 1261 3月  20 15:56 .viminfo
[root@node1 atguigu]# 
连接成功
Last login: Mon Mar 20 16:01:40 2023
[root@node1 ~]# su atguigu
[atguigu@node1 root]$ cd /home/atguigu/
[atguigu@node1 ~]$ pwd
/home/atguigu
[atguigu@node1 ~]$ xsync bin/
==================== node1 ====================
The authenticity of host 'node1 (192.168.88.151)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.88.151' (ECDSA) to the list of known hosts.
atguigu@node1's password: 
atguigu@node1's password: 
sending incremental file list

sent 98 bytes  received 17 bytes  17.69 bytes/sec
total size is 727  speedup is 6.32
==================== node2 ====================
The authenticity of host 'node2 (192.168.88.152)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2,192.168.88.152' (ECDSA) to the list of known hosts.
atguigu@node2's password: 
atguigu@node2's password: 
sending incremental file list

sent 94 bytes  received 17 bytes  44.40 bytes/sec
total size is 727  speedup is 6.55
==================== node3 ====================
The authenticity of host 'node3 (192.168.88.153)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node3,192.168.88.153' (ECDSA) to the list of known hosts.
atguigu@node3's password: 
atguigu@node3's password: 
sending incremental file list

sent 94 bytes  received 17 bytes  44.40 bytes/sec
total size is 727  speedup is 6.55
[atguigu@node1 ~]$ 
----------------------------------------------------------------------------------------
连接成功
Last login: Mon Mar 20 17:22:20 2023 from 192.168.88.151
[root@node2 ~]# su atguigu
[atguigu@node2 root]$ vim /etc/sudoers
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 root]$ su root
密码:
[root@node2 ~]# vim /etc/sudoers
[root@node2 ~]# cd /opt/
[root@node2 opt]# ll
总用量 0
drwxr-xr-x  4 atguigu atguigu 46 3月  20 11:32 module
drwxr-xr-x. 2 root    root     6 10月 31 2018 rh
drwxr-xr-x  2 atguigu atguigu 67 3月  20 10:47 software
[root@node2 opt]# su atguigu
[atguigu@node2 opt]$ cd /home/atguigu/
[atguigu@node2 ~]$ llk
bash: llk: 未找到命令
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月  20 15:56 bin
[atguigu@node2 ~]$ cd ~
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月  20 15:56 bin
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月  20 15:56 bin
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 ~]$ cd bin
[atguigu@node2 bin]$ ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月  20 16:00 xsync
[atguigu@node2 bin]$ 
----------------------------------------------------------------------------------------
连接成功
Last login: Mon Mar 20 17:22:26 2023 from 192.168.88.152
[root@node3 ~]# vim /etc/sudoers
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# cd /opt/
[root@node3 opt]# ll
总用量 0
drwxr-xr-x  4 atguigu atguigu 46 3月  20 11:32 module
drwxr-xr-x. 2 root    root     6 10月 31 2018 rh
drwxr-xr-x  2 atguigu atguigu 67 3月  20 10:47 software
[root@node3 opt]# cd ~
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月  11 2020 anaconda-ks.cfg
-rw-------  1 root root    0 2月  23 16:20 nohup.out
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月  11 2020 anaconda-ks.cfg
-rw-------  1 root root    0 2月  23 16:20 nohup.out
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# cd ~
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月  11 2020 anaconda-ks.cfg
-rw-------  1 root root    0 2月  23 16:20 nohup.out
[root@node3 ~]# su atguigu
[atguigu@node3 root]$ cd ~
[atguigu@node3 ~]$ ls
bin
[atguigu@node3 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月  20 15:56 bin
[atguigu@node3 ~]$ cd bin
[atguigu@node3 bin]$ ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月  20 16:00 xsync
[atguigu@node3 bin]$ 
----------------------------------------------------------------------------------------
连接成功
Last login: Mon Mar 20 16:01:40 2023
[root@node1 ~]# su atguigu
[atguigu@node1 root]$ cd /home/atguigu/
[atguigu@node1 ~]$ pwd
/home/atguigu
[atguigu@node1 ~]$ xsync bin/
==================== node1 ====================
The authenticity of host 'node1 (192.168.88.151)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.88.151' (ECDSA) to the list of known hosts.
atguigu@node1's password: 
atguigu@node1's password: 
sending incremental file list

sent 98 bytes  received 17 bytes  17.69 bytes/sec
total size is 727  speedup is 6.32
==================== node2 ====================
The authenticity of host 'node2 (192.168.88.152)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2,192.168.88.152' (ECDSA) to the list of known hosts.
atguigu@node2's password: 
atguigu@node2's password: 
sending incremental file list

sent 94 bytes  received 17 bytes  44.40 bytes/sec
total size is 727  speedup is 6.55
==================== node3 ====================
The authenticity of host 'node3 (192.168.88.153)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node3,192.168.88.153' (ECDSA) to the list of known hosts.
atguigu@node3's password: 
atguigu@node3's password: 
sending incremental file list

sent 94 bytes  received 17 bytes  44.40 bytes/sec
total size is 727  speedup is 6.55
[atguigu@node1 ~]$ xsync /etc/profile.d/my_env.sh
==================== node1 ====================
atguigu@node1's password: 
atguigu@node1's password: 
.sending incremental file list

sent 48 bytes  received 12 bytes  13.33 bytes/sec
total size is 223  speedup is 3.72
==================== node2 ====================
atguigu@node2's password: 
atguigu@node2's password: 
sending incremental file list
my_env.sh
rsync: mkstemp "/etc/profile.d/.my_env.sh.guTzvB" failed: Permission denied (13)

sent 95 bytes  received 126 bytes  88.40 bytes/sec
total size is 223  speedup is 1.01
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]
==================== node3 ====================
