尚硅谷 Big Data Hadoop Tutorial, Notes 06: Hadoop Production Tuning Manual
Posted by 延锋L
Table of Contents
06_尚硅谷大数据技术之Hadoop(生产调优手册)V3.3
P143【143_尚硅谷_Hadoop_生产调优手册_核心参数_NN内存配置】14:15
P144【144_尚硅谷_Hadoop_生产调优手册_核心参数_NN心跳并发配置】03:12
P145【145_尚硅谷_Hadoop_生产调优手册_核心参数_开启回收站】07:16
P146【146_尚硅谷_Hadoop_生产调优手册_HDFS压测环境准备】05:55
P147【147_尚硅谷_Hadoop_生产调优手册_HDFS读写压测】18:55
P148【148_尚硅谷_Hadoop_生产调优手册_NN多目录配置】08:25
P149【149_尚硅谷_Hadoop_生产调优手册_DN多目录及磁盘间数据均衡】08:42
P150【150_尚硅谷_Hadoop_生产调优手册_添加白名单】10:02
P151【151_尚硅谷_Hadoop_生产调优手册_服役新服务器】13:07
P152【152_尚硅谷_Hadoop_生产调优手册_服务器间数据均衡】03:16
P153【153_尚硅谷_Hadoop_生产调优手册_黑名单退役服务器】07:46
P154【154_尚硅谷_Hadoop_生产调优手册_存储优化_5台服务器准备】11:21
P155【155_尚硅谷_Hadoop_生产调优手册_存储优化_纠删码原理】08:16
P156【156_尚硅谷_Hadoop_生产调优手册_存储优化_纠删码案例】10:42
P157【157_尚硅谷_Hadoop_生产调优手册_存储优化_异构存储概述】08:36
P158【158_尚硅谷_Hadoop_生产调优手册_存储优化_异构存储案例实操】17:40
P159【159_尚硅谷_Hadoop_生产调优手册_NameNode故障处理】09:09
P160【160_尚硅谷_Hadoop_生产调优手册_集群安全模式&磁盘修复】18:32
P161【161_尚硅谷_Hadoop_生产调优手册_慢磁盘监控】09:19
P162【162_尚硅谷_Hadoop_生产调优手册_小文件归档】08:11
P163【163_尚硅谷_Hadoop_生产调优手册_集群数据迁移】03:18
P164【164_尚硅谷_Hadoop_生产调优手册_MR跑的慢的原因】02:43
P165【165_尚硅谷_Hadoop_生产调优手册_MR常用调优参数】12:27
P166【166_尚硅谷_Hadoop_生产调优手册_MR数据倾斜问题】05:26
P167【167_尚硅谷_Hadoop_生产调优手册_Yarn生产经验】01:18
P168【168_尚硅谷_Hadoop_生产调优手册_HDFS小文件优化方法】10:15
P169【169_尚硅谷_Hadoop_生产调优手册_MapReduce集群压测】02:54
P170【170_尚硅谷_Hadoop_生产调优手册_企业开发场景案例】15:00
06_尚硅谷大数据技术之Hadoop(生产调优手册)V3.3
P143【143_尚硅谷_Hadoop_生产调优手册_核心参数_NN内存配置】14:15
Chapter 1 HDFS: Core Parameters
连接成功
Last login: Wed Mar 29 10:21:43 2023 from 192.168.88.1
[root@node1 ~]# cd ../../
[root@node1 /]# cd /opt/module/hadoop-3.1.3/
[root@node1 hadoop-3.1.3]# cd /home/atguigu/
[root@node1 atguigu]# su atguigu
[atguigu@node1 ~]$ bin/myhadoop.sh start
=================== 启动 hadoop集群 ===================
--------------- 启动 hdfs ---------------
Starting namenodes on [node1]
Starting datanodes
Starting secondary namenodes [node3]
--------------- 启动 yarn ---------------
Starting resourcemanager
Starting nodemanagers
--------------- 启动 historyserver ---------------
[atguigu@node1 ~]$ jpsall
bash: jpsall: 未找到命令...
[atguigu@node1 ~]$ bin/jpsall
=============== node1 ===============
28416 NameNode
29426 NodeManager
29797 JobHistoryServer
28589 DataNode
36542 Jps
=============== node2 ===============
19441 DataNode
20097 ResourceManager
20263 NodeManager
27227 Jps
=============== node3 ===============
19920 NodeManager
26738 Jps
19499 SecondaryNameNode
19197 DataNode
[atguigu@node1 ~]$ jmap -heap 28416
Attaching to process ID 28416, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.241-b07
using thread-local object allocation.
Parallel GC with 4 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 1031798784 (984.0MB)
NewSize = 21495808 (20.5MB)
MaxNewSize = 343932928 (328.0MB)
OldSize = 43515904 (41.5MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 153616384 (146.5MB)
used = 20087912 (19.157325744628906MB)
free = 133528472 (127.3426742553711MB)
13.076672863227923% used
From Space:
capacity = 11534336 (11.0MB)
used = 7559336 (7.209144592285156MB)
free = 3975000 (3.7908554077148438MB)
65.53767811168323% used
To Space:
capacity = 16777216 (16.0MB)
used = 0 (0.0MB)
free = 16777216 (16.0MB)
0.0% used
PS Old Generation
capacity = 68681728 (65.5MB)
used = 29762328 (28.383567810058594MB)
free = 38919400 (37.116432189941406MB)
43.33369131306655% used
16190 interned Strings occupying 1557424 bytes.
[atguigu@node1 ~]$ jmap -heap 19441
Attaching to process ID 19441, please wait...
Error attaching to process: sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.execute(LinuxDebuggerLocal.java:163)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach(LinuxDebuggerLocal.java:278)
at sun.jvm.hotspot.HotSpotAgent.attachDebugger(HotSpotAgent.java:671)
at sun.jvm.hotspot.HotSpotAgent.setupDebuggerLinux(HotSpotAgent.java:611)
at sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:337)
at sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:304)
at sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:140)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:185)
at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.tools.jmap.JMap.runTool(JMap.java:201)
at sun.tools.jmap.JMap.main(JMap.java:130)
Caused by: sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach0(Native Method)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.access$100(LinuxDebuggerLocal.java:62)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$1AttachTask.doit(LinuxDebuggerLocal.java:269)
at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.run(LinuxDebuggerLocal.java:138)
[atguigu@node1 ~]$ jmap -heap 28589
Attaching to process ID 28589, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.241-b07
using thread-local object allocation.
Parallel GC with 4 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 1031798784 (984.0MB)
NewSize = 21495808 (20.5MB)
MaxNewSize = 343932928 (328.0MB)
OldSize = 43515904 (41.5MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 102236160 (97.5MB)
used = 89542296 (85.3941879272461MB)
free = 12693864 (12.105812072753906MB)
87.58378248948317% used
From Space:
capacity = 7864320 (7.5MB)
used = 7855504 (7.4915924072265625MB)
free = 8816 (0.0084075927734375MB)
99.88789876302083% used
To Space:
capacity = 9437184 (9.0MB)
used = 0 (0.0MB)
free = 9437184 (9.0MB)
0.0% used
PS Old Generation
capacity = 35651584 (34.0MB)
used = 8496128 (8.1025390625MB)
free = 27155456 (25.8974609375MB)
23.830997242647058% used
15014 interned Strings occupying 1322504 bytes.
[atguigu@node1 ~]$ cd /opt/module/hadoop-3.1.3/etc/hadoop
[atguigu@node1 hadoop]$ xsync hadoop-env.sh
==================== node1 ====================
sending incremental file list
sent 62 bytes received 12 bytes 148.00 bytes/sec
total size is 16,052 speedup is 216.92
==================== node2 ====================
sending incremental file list
hadoop-env.sh
sent 849 bytes received 173 bytes 2,044.00 bytes/sec
total size is 16,052 speedup is 15.71
==================== node3 ====================
sending incremental file list
hadoop-env.sh
sent 849 bytes received 173 bytes 2,044.00 bytes/sec
total size is 16,052 speedup is 15.71
[atguigu@node1 hadoop]$ /home/atguigu/bin/myhadoop.sh stop
=================== 关闭 hadoop集群 ===================
--------------- 关闭 historyserver ---------------
--------------- 关闭 yarn ---------------
Stopping nodemanagers
Stopping resourcemanager
--------------- 关闭 hdfs ---------------
Stopping namenodes on [node1]
Stopping datanodes
Stopping secondary namenodes [node3]
[atguigu@node1 hadoop]$ /home/atguigu/bin/myhadoop.sh start
=================== 启动 hadoop集群 ===================
--------------- 启动 hdfs ---------------
Starting namenodes on [node1]
Starting datanodes
Starting secondary namenodes [node3]
--------------- 启动 yarn ---------------
Starting resourcemanager
Starting nodemanagers
--------------- 启动 historyserver ---------------
[atguigu@node1 hadoop]$ jpsall
bash: jpsall: 未找到命令...
[atguigu@node1 hadoop]$ /home/atguigu/bin/jpsall
=============== node1 ===============
53157 DataNode
54517 Jps
53815 NodeManager
54087 JobHistoryServer
52959 NameNode
=============== node2 ===============
43362 NodeManager
43194 ResourceManager
42764 DataNode
44141 Jps
=============== node3 ===============
42120 DataNode
42619 NodeManager
42285 SecondaryNameNode
43229 Jps
[atguigu@node1 hadoop]$ jmap -heap 52959
Attaching to process ID 52959, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.241-b07
using thread-local object allocation.
Parallel GC with 4 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 1073741824 (1024.0MB)
NewSize = 21495808 (20.5MB)
MaxNewSize = 357564416 (341.0MB)
OldSize = 43515904 (41.5MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 150470656 (143.5MB)
used = 94290264 (89.92220306396484MB)
free = 56180392 (53.577796936035156MB)
62.66355614213578% used
From Space:
capacity = 17825792 (17.0MB)
used = 0 (0.0MB)
free = 17825792 (17.0MB)
0.0% used
To Space:
capacity = 16777216 (16.0MB)
used = 0 (0.0MB)
free = 16777216 (16.0MB)
0.0% used
PS Old Generation
capacity = 66584576 (63.5MB)
used = 30315816 (28.911415100097656MB)
free = 36268760 (34.588584899902344MB)
45.529787559208906% used
15017 interned Strings occupying 1471800 bytes.
[atguigu@node1 hadoop]$ jmap -heap 53815
Attaching to process ID 53815, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.241-b07
using thread-local object allocation.
Parallel GC with 4 thread(s)
Heap Configuration:
MinHeapFreeRatio = 0
MaxHeapFreeRatio = 100
MaxHeapSize = 1031798784 (984.0MB)
NewSize = 21495808 (20.5MB)
MaxNewSize = 343932928 (328.0MB)
OldSize = 43515904 (41.5MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 127926272 (122.0MB)
used = 40883224 (38.989280700683594MB)
free = 87043048 (83.0107192993164MB)
31.95842680383901% used
From Space:
capacity = 8388608 (8.0MB)
used = 8371472 (7.9836578369140625MB)
free = 17136 (0.0163421630859375MB)
99.79572296142578% used
To Space:
capacity = 10485760 (10.0MB)
used = 0 (0.0MB)
free = 10485760 (10.0MB)
0.0% used
PS Old Generation
capacity = 37748736 (36.0MB)
used = 11004056 (10.494285583496094MB)
free = 26744680 (25.505714416503906MB)
29.15079328748915% used
14761 interned Strings occupying 1313128 bytes.
[atguigu@node1 hadoop]$
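In the session above, the NameNode and DataNode heaps default to an auto-computed maximum of 984MB (sized from the VM's RAM, roughly a quarter of it); after hadoop-env.sh is edited and distributed with xsync, jmap shows the restarted NameNode running with MaxHeapSize = 1024MB. A minimal sketch of that change, using the values the course recommends (in Hadoop 3.x, daemon heaps are set through the *_OPTS variables in hadoop-env.sh):

# $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export HDFS_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,RFAS -Xmx1024m"
export HDFS_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS -Xmx1024m"

After distributing the file, restart the cluster and verify with jmap -heap <pid> as in the transcript.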
P144【144_尚硅谷_Hadoop_生产调优手册_核心参数_NN心跳并发配置】03:12
1.2 NameNode Heartbeat Concurrency Configuration
The NameNode keeps a pool of worker threads that handles concurrent heartbeats from the DataNodes as well as concurrent metadata operations from clients.
For large clusters, or clusters with many clients, this parameter (dfs.namenode.handler.count in hdfs-site.xml) usually needs to be raised. The default is 10.
<property>
<name>dfs.namenode.handler.count</name>
<value>21</value>
</property>
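The course sizes this with the rule of thumb dfs.namenode.handler.count = 20 × ln(cluster size): for a 3-node cluster that gives 21, the value set above. A quick way to compute it (Python 2, as installed on the course's CentOS VMs):

python -c 'import math; print int(20*math.log(3))'    # prints 21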
P145【145_尚硅谷_Hadoop_生产调优手册_核心参数_开启回收站】07:16
1.3 Enabling the Trash
To enable the trash, modify core-site.xml and set the trash retention time to 1 minute (fs.trash.interval is in minutes; the default 0 disables the trash).
<property>
<name>fs.trash.interval</name>
<value>1</value>
</property>
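With the trash on, hadoop fs -rm moves a file under /user/<user>/.Trash/Current instead of deleting it, and within the retention window it can simply be moved back out. A sketch with a hypothetical file name (note that -rm -skipTrash, and deletions made through the web UI, bypass the trash):

hadoop fs -rm /user/atguigu/input/word.txt
hadoop fs -mv /user/atguigu/.Trash/Current/user/atguigu/input/word.txt /user/atguigu/input/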
P146【146_尚硅谷_Hadoop_生产调优手册_HDFS压测环境准备】05:55
Chapter 2 HDFS: Cluster Benchmarking
cd /opt/module/software/ and run python -m SimpleHTTPServer, which lets outside machines download the files there via "hostname + port".
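A sketch of that preparation (SimpleHTTPServer is the Python 2 module and serves the current directory on port 8000 by default; on Python 3 the equivalent is python3 -m http.server):

cd /opt/module/software/
python -m SimpleHTTPServer
# now reachable from outside at http://hadoop102:8000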
P147【147_尚硅谷_Hadoop_生产调优手册_HDFS读写压测】18:55
2.1 Testing HDFS Write Performance
2.2 Testing HDFS Read Performance
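Both tests use the TestDFSIO tool from the jobclient tests jar. A sketch assuming the course's hadoop-3.1.3 layout, writing and then reading 10 files of 128MB each:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 128MB
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 128MB
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -clean

The throughput figures in the report are then compared against what the network and disks should sustain.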
P148【148_尚硅谷_Hadoop_生产调优手册_NN多目录配置】08:25
Chapter 3 HDFS: Multiple Directories
3.1 NameNode Multi-Directory Configuration
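The NameNode can mirror its local metadata across several directories, each holding an identical copy, for extra safety. A sketch under the course's paths; the property goes inside <configuration> in hdfs-site.xml, and because it changes where metadata lives, stop the cluster, wipe data/ and logs/ on every node, and re-format:

# hdfs-site.xml:
# <property>
#     <name>dfs.namenode.name.dir</name>
#     <value>file://${hadoop.tmp.dir}/dfs/name1,file://${hadoop.tmp.dir}/dfs/name2</value>
# </property>
myhadoop.sh stop
rm -rf /opt/module/hadoop-3.1.3/data /opt/module/hadoop-3.1.3/logs    # on every node
hdfs namenode -format
myhadoop.sh start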
P149【149_尚硅谷_Hadoop_生产调优手册_DN多目录及磁盘间数据均衡】08:42
3.2 DataNode Multi-Directory Configuration
3.3 Cluster Data Balancing Across Disks
In production, a disk is often added when space runs short. The newly mounted disk holds no data yet, so you can run the disk-balancer commands to even things out. (New in Hadoop 3.x)
(1) Generate a balancing plan (with only one disk, no plan will be generated)
hdfs diskbalancer -plan hadoop103
(2) Execute the balancing plan
hdfs diskbalancer -execute hadoop103.plan.json
(3) Check the progress of the current balancing task
hdfs diskbalancer -query hadoop103
(4) Cancel the balancing task
hdfs diskbalancer -cancel hadoop103.plan.json
P150【150_尚硅谷_Hadoop_生产调优手册_添加白名单】10:02
Chapter 4 HDFS: Scaling the Cluster Out and In
4.1 Adding a Whitelist
Whitelist: only hosts whose IP addresses are configured in the whitelist may be used to store data.
In practice, configuring a whitelist helps guard against malicious access by attackers.
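A sketch of the whitelist procedure under the course's layout: list the allowed hosts in a whitelist file, reference it from dfs.hosts in hdfs-site.xml, distribute both files, then refresh (the very first time the property is added a full restart is required; later edits only need -refreshNodes):

cd /opt/module/hadoop-3.1.3/etc/hadoop
vim whitelist                  # one host per line: hadoop102, hadoop103, hadoop104
# hdfs-site.xml:
# <property>
#     <name>dfs.hosts</name>
#     <value>/opt/module/hadoop-3.1.3/etc/hadoop/whitelist</value>
# </property>
xsync whitelist hdfs-site.xml
hdfs dfsadmin -refreshNodes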
P151【151_尚硅谷_Hadoop_生产调优手册_服役新服务器】13:07
4.2 Commissioning New Servers
Requirement: as the business grows and data volume increases, the existing data nodes can no longer hold all the data, so new data nodes must be added dynamically on top of the existing cluster.
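Once the new node (hadoop105 in the course) has the JDK, Hadoop, environment variables, and cluster configs copied over, and has been added to the whitelist if one is in force, its daemons can be started directly and it registers itself with the cluster:

hdfs --daemon start datanode
yarn --daemon start nodemanager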
P152【152_尚硅谷_Hadoop_生产调优手册_服务器间数据均衡】03:16
4.3 Data Balancing Across Servers
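This is done with the HDFS balancer; a sketch (threshold 10 means each DataNode's disk utilization may deviate from the cluster average by at most 10%). The course runs it on a relatively idle node rather than on the NameNode, since balancing is itself resource-hungry:

sbin/start-balancer.sh -threshold 10
sbin/stop-balancer.sh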
P153【153_尚硅谷_Hadoop_生产调优手册_黑名单退役服务器】07:46
4.4 Decommissioning Servers via Blacklist
Blacklist: hosts whose IP addresses are listed in the blacklist may not be used to store data.
In practice, the blacklist is used to decommission servers.
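A sketch mirroring the whitelist setup: put the host to retire in a blacklist file referenced by dfs.hosts.exclude, distribute, refresh, then wait for the node's web-UI state to go from "decommission in progress" (its blocks are being re-replicated) to "decommissioned" before stopping its daemons:

cd /opt/module/hadoop-3.1.3/etc/hadoop
vim blacklist                  # e.g. hadoop105
# hdfs-site.xml:
# <property>
#     <name>dfs.hosts.exclude</name>
#     <value>/opt/module/hadoop-3.1.3/etc/hadoop/blacklist</value>
# </property>
xsync blacklist hdfs-site.xml
hdfs dfsadmin -refreshNodes
# after the node shows decommissioned:
hdfs --daemon stop datanode
yarn --daemon stop nodemanager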
P154【154_尚硅谷_Hadoop_生产调优手册_存储优化_5台服务器准备】11:21
Clone the VMs, delete the data and logs directories under hadoop-3.1.3, and adapt the xsync, myhadoop.sh, and jpsall scripts.
P155【155_尚硅谷_Hadoop_生产调优手册_存储优化_纠删码原理】08:16
Chapter 5 HDFS: Storage Optimization
Note: demonstrating erasure coding and heterogeneous storage takes 5 VMs in total. Use a separate cluster if you can, and prepare the 5-server cluster in advance.
5.1 Erasure Coding
5.1.1 How Erasure Coding Works
P156【156_尚硅谷_Hadoop_生产调优手册_存储优化_纠删码案例】10:42
5.1.2 Erasure Coding in Practice
An erasure-coding policy is set on a specific path; every file stored under that path follows the policy.
Out of the box only the RS-6-3-1024k policy is enabled; to use any other policy you must enable it first.
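A sketch of the demo flow with the RS-3-2-1024k policy (3 data units + 2 parity units per stripe, so any 2 of the 5 involved nodes can fail without data loss):

hdfs ec -listPolicies                                   # see which policies exist and which are enabled
hdfs ec -enablePolicy -policy RS-3-2-1024k              # non-default policies must be enabled first
hdfs dfs -mkdir /input
hdfs ec -setPolicy -path /input -policy RS-3-2-1024k    # files stored under /input are now erasure coded
hdfs ec -getPolicy -path /input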
P157【157_尚硅谷_Hadoop_生产调优手册_存储优化_异构存储概述】08:36
5.2 Heterogeneous Storage (Hot/Cold Data Separation)
Heterogeneous storage is about placing different kinds of data on different types of disks so as to get the best overall performance.
5.2.1 Heterogeneous Storage Shell Operations
(1) List the available storage policies
[atguigu@hadoop102 hadoop-3.1.3]$ hdfs storagepolicies -listPolicies
(2) Set a storage policy on a given path (a data directory)
hdfs storagepolicies -setStoragePolicy -path xxx -policy xxx
(3) Get the storage policy of a given path (a data directory or file)
hdfs storagepolicies -getStoragePolicy -path xxx
(4) Unset the storage policy; after this command the directory or file follows its parent directory's policy, and for the root directory that means HOT
hdfs storagepolicies -unsetStoragePolicy -path xxx
(5) View the distribution of a file's blocks
bin/hdfs fsck xxx -files -blocks -locations
(6) View the cluster's nodes
hadoop dfsadmin -report
P158【158_尚硅谷_Hadoop_生产调优手册_存储优化_异构存储案例实操】17:40
5.2.2 Test Environment Setup
1) Test environment
Cluster size: 5 servers
Cluster configuration: replication factor 2; the directories tagged with storage types are created in advance.
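Once the typed directories exist, a policy is applied per path; blocks written before the change do not move on their own, so the course pairs -setStoragePolicy with the mover tool. A sketch using the demo path /hdfsdata:

hdfs storagepolicies -setStoragePolicy -path /hdfsdata -policy ONE_SSD
hdfs mover -p /hdfsdata                         # migrate existing replicas to match the new policy
hdfs fsck /hdfsdata -files -blocks -locations   # check where the replicas now live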
P159【159_尚硅谷_Hadoop_生产调优手册_NameNode故障处理】09:09
Chapter 6 HDFS: Troubleshooting
Note: three servers are enough here; roll them back to the snapshot taken at the start of the Yarn chapter.
Copy the SecondaryNameNode's data into the original NameNode's data directory: when the NameNode dies, bring over the SecondaryNameNode's copy.
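A sketch of that recovery under the course layout, with the NameNode on hadoop102 and the SecondaryNameNode on hadoop104; since the 2NN checkpoint lags the edit log, the most recent edits may be lost:

cd /opt/module/hadoop-3.1.3/data/dfs
scp -r atguigu@hadoop104:/opt/module/hadoop-3.1.3/data/dfs/namesecondary/* ./name/
hdfs --daemon start namenode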
P160【160_尚硅谷_Hadoop_生产调优手册_集群安全模式&磁盘修复】18:32
6.2 Cluster Safe Mode & Disk Repair
1) Safe mode: the file system accepts only read requests and rejects deletes, modifications, and other write requests (see the command reference below)
2) When the cluster enters safe mode
- while the NameNode is loading the image file and the edit log;
- while the NameNode is receiving DataNode registrations
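Safe mode is inspected and controlled through hdfs dfsadmin; a quick reference (wait blocks until safe mode ends, which is useful in scripts):

hdfs dfsadmin -safemode get      # query status
hdfs dfsadmin -safemode enter    # enter safe mode
hdfs dfsadmin -safemode leave    # leave safe mode
hdfs dfsadmin -safemode wait     # wait until safe mode is off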
P161【161_尚硅谷_Hadoop_生产调优手册_慢磁盘监控】09:19
6.3 Slow Disk Monitoring
A "slow disk" is a disk that writes data very slowly. Slow disks are actually not rare: as a machine runs longer and carries more jobs, disk read/write performance naturally degrades, and in severe cases writes become visibly delayed.
How do you notice a slow disk?
Creating a directory on HDFS normally takes well under 1 second. If creating a directory occasionally takes a minute or more, and the symptom is intermittent rather than constant, a slow disk is very likely.
Two ways to find out which disk is slow:
1) Through the heartbeat's "time since last contact".
2) With the fio command, by benchmarking the disk's read/write performance (see the sketch below).
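A sketch of a fio run in the style the course uses, here a sequential read test against a throwaway file (swap -rw=read for write, randread, randwrite, or randrw to cover the other cases; the file path and sizes are example values):

sudo fio -filename=/home/atguigu/test.log -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=test_r

The summary's read/write bandwidth line is then compared across disks to spot the slow one.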
P162【162_尚硅谷_Hadoop_生产调优手册_小文件归档】08:11
6.4 Archiving Small Files
1) Why small files hurt HDFS
Every file is stored in blocks, and every block's metadata is kept in the NameNode's memory, so storing many small files in HDFS is very inefficient: masses of small files eat up most of the NameNode's memory. Note, though, that the disk space a small file takes is unrelated to the block size: a 1MB file stored with a 128MB block size uses 1MB of disk, not 128MB.
2) One way to deal with small files
An HDFS archive (HAR file) is a more efficient file-archiving tool: it packs files into HDFS blocks, reducing the NameNode's memory usage while still allowing transparent access to the files. Concretely, the files inside an HDFS archive remain independent files to clients, but to the NameNode the archive is a single unit, which shrinks the NameNode's memory footprint.
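A sketch of building and using an archive, assuming a /input directory full of small files (the archiving itself runs as a MapReduce job, so YARN must be up):

hadoop archive -archiveName input.har -p /input /output   # pack /input into /output/input.har
hadoop fs -ls har:///output/input.har                     # files inside stay individually visible
hadoop fs -cp har:///output/input.har/* /                 # and can be copied back out transparently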
P163【163_尚硅谷_Hadoop_生产调优手册_集群数据迁移】03:18
Chapter 7 HDFS: Cluster Migration
7.1 Copying Data Between Apache Clusters
7.2 Copying Data Between an Apache Cluster and a CDH Cluster
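Both cases rely on distcp, which copies data between clusters with a MapReduce job; a sketch with hypothetical cluster addresses:

hadoop distcp hdfs://hadoop102:8020/user/atguigu/hello.txt hdfs://hadoop105:8020/user/atguigu/hello.txt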
P164【164_尚硅谷_Hadoop_生产调优手册_MR跑的慢的原因】02:43
Chapter 8 MapReduce Production Experience
8.1 Why MapReduce Runs Slowly
MapReduce performance bottlenecks come down to two things:
1) Machine performance
CPU, memory, disk, network
2) I/O handling
(1) Data skew
(2) Maps running too long, leaving the Reduces waiting
(3) Too many small files
P165【165_尚硅谷_Hadoop_生产调优手册_MR常用调优参数】12:27
8.2 Common MapReduce Tuning Parameters
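The parameters the course walks through cover map-side buffering, reduce-side shuffle copying and merging, and container memory/vcores. As an illustration only, with made-up values, such settings can be tried per job at submit time through the -D generic options:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount \
  -Dmapreduce.task.io.sort.mb=200 \
  -Dmapreduce.map.sort.spill.percent=0.90 \
  -Dmapreduce.reduce.shuffle.parallelcopies=10 \
  -Dmapreduce.map.memory.mb=2048 \
  -Dmapreduce.reduce.memory.mb=2048 \
  /input /output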
P166【166_尚硅谷_Hadoop_生产调优手册_MR数据倾斜问题】05:26
8.3 MapReduce Data Skew
P167【167_尚硅谷_Hadoop_生产调优手册_Yarn生产经验】01:18
Chapter 9 Hadoop-Yarn Production Experience
9.1 Common Tuning Parameters (see the reminder sketch after 9.3)
9.2 Using the Capacity Scheduler
9.3 Using the Fair Scheduler
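As a pointer back to the Yarn chapter, the parameters that matter most describe what each NodeManager may hand out and the container size bounds; the names below are the real properties, the values are examples only:

# yarn-site.xml (example values):
#   yarn.nodemanager.resource.memory-mb   = 4096   # memory a NodeManager may allocate
#   yarn.nodemanager.resource.cpu-vcores  = 4      # vcores a NodeManager may allocate
#   yarn.scheduler.minimum-allocation-mb  = 1024   # smallest container
#   yarn.scheduler.maximum-allocation-mb  = 4096   # largest container
yarn node -list -all    # confirm the NodeManagers that registered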
P168【168_尚硅谷_Hadoop_生产调优手册_HDFS小文件优化方法】10:15
Chapter 10 Comprehensive Hadoop Tuning
10.1 Optimizing for Small Files in Hadoop
10.1.1 Drawbacks of Small Files in Hadoop
10.1.2 Solutions for Small Files in Hadoop
4) Enable uber mode for JVM reuse (compute side), as sketched below
By default every Task starts its own JVM. When the Tasks only process tiny amounts of data, we can let the multiple Tasks of one Job run inside a single JVM rather than paying the JVM startup cost per Task.
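Uber mode is off by default and a job only qualifies if it stays within three caps: mapreduce.job.ubertask.maxmaps (default 9), mapreduce.job.ubertask.maxreduces (default 1), and mapreduce.job.ubertask.maxbytes (default: one block). As an illustration, it can be switched on for a single job at submit time:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount \
  -Dmapreduce.job.ubertask.enable=true \
  /input /output

The job log line "running in uber mode : true" (compare the false in the wordcount transcripts above) confirms it took effect.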
P169【169_尚硅谷_Hadoop_生产调优手册_MapReduce集群压测】02:54
10.2 Testing MapReduce Compute Performance
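The course measures compute with the examples jar: generate random data with randomwriter, sort it, then validate the result with testmapredsort from the tests jar; each step is a MapReduce job in its own right:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar randomwriter random-data
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar sort random-data sorted-data
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar testmapredsort -sortInput random-data -sortOutput sorted-data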
P170【170_尚硅谷_Hadoop_生产调优手册_企业开发场景案例】15:00
10.3 An Enterprise Development Scenario
10.3.1 Requirements
10.3.2 HDFS Parameter Tuning
10.3.3 MapReduce Parameter Tuning
10.3.4 Yarn Parameter Tuning
10.3.5 Running the Job
尚硅谷 Big Data Hadoop Tutorial, Notes 01: Introduction
Video address: 尚硅谷大数据Hadoop教程(Hadoop 3.x安装搭建到集群调优)
- 尚硅谷大数据Hadoop教程-笔记01【入门】
- 尚硅谷大数据Hadoop教程-笔记02【HDFS】
- 尚硅谷大数据Hadoop教程-笔记03【MapReduce】
- 尚硅谷大数据Hadoop教程-笔记04【Yarn】
- 尚硅谷大数据Hadoop教程-笔记04【生产调优手册】
- 尚硅谷大数据Hadoop教程-笔记04【源码解析】
Table of Contents
P001【001_尚硅谷_Hadoop_开篇_课程整体介绍】08:38
P002【002_尚硅谷_Hadoop_概论_大数据的概念】04:34
P003【003_尚硅谷_Hadoop_概论_大数据的特点】07:23
P004【004_尚硅谷_Hadoop_概论_大数据的应用场景】09:58
P005【005_尚硅谷_Hadoop_概论_大数据的发展场景】08:17
P006【006_尚硅谷_Hadoop_概论_未来工作内容】06:25
P007【007_尚硅谷_Hadoop_入门_课程介绍】07:29
P008【008_尚硅谷_Hadoop_入门_Hadoop是什么】03:00
P009【009_尚硅谷_Hadoop_入门_Hadoop发展历史】05:52
P010【010_尚硅谷_Hadoop_入门_Hadoop三大发行版本】05:59
P011【011_尚硅谷_Hadoop_入门_Hadoop优势】03:52
P012【012_尚硅谷_Hadoop_入门_Hadoop1.x2.x3.x区别】03:00
P013【013_尚硅谷_Hadoop_入门_HDFS概述】06:26
P014【014_尚硅谷_Hadoop_入门_YARN概述】06:35
P015【015_尚硅谷_Hadoop_入门_MapReduce概述】01:55
P016【016_尚硅谷_Hadoop_入门_HDFS&YARN&MR关系】03:22
P017【017_尚硅谷_Hadoop_入门_大数据技术生态体系】09:17
P018【018_尚硅谷_Hadoop_入门_VMware安装】04:41
P019【019_尚硅谷_Hadoop_入门_Centos7.5软硬件安装】15:56
P020【020_尚硅谷_Hadoop_入门_IP和主机名称配置】10:50
P021【021_尚硅谷_Hadoop_入门_Xshell远程访问工具】09:05
P022【022_尚硅谷_Hadoop_入门_模板虚拟机准备完成】12:25
P023【023_尚硅谷_Hadoop_入门_克隆三台虚拟机】15:01
P024【024_尚硅谷_Hadoop_入门_JDK安装】07:02
P025【025_尚硅谷_Hadoop_入门_Hadoop安装】07:20
P026【026_尚硅谷_Hadoop_入门_本地运行模式】11:56
P027【027_尚硅谷_Hadoop_入门_scp&rsync命令讲解】15:01
P028【028_尚硅谷_Hadoop_入门_xsync分发脚本】18:14
P029【029_尚硅谷_Hadoop_入门_ssh免密登录】11:25
P030【030_尚硅谷_Hadoop_入门_集群配置】13:24
P031【031_尚硅谷_Hadoop_入门_群起集群并测试】16:52
P032【032_尚硅谷_Hadoop_入门_集群崩溃处理办法】08:10
P033【033_尚硅谷_Hadoop_入门_历史服务器配置】05:26
P034【034_尚硅谷_Hadoop_入门_日志聚集功能配置】05:42
P035【035_尚硅谷_Hadoop_入门_两个常用脚本】09:18
P036【036_尚硅谷_Hadoop_入门_两道面试题】04:15
P037【037_尚硅谷_Hadoop_入门_集群时间同步】11:27
P038【038_尚硅谷_Hadoop_入门_常见问题总结】10:57
00_尚硅谷大数据Hadoop课程整体介绍
P001【001_尚硅谷_Hadoop_开篇_课程整体介绍】08:38
I. Key upgrades in this course
1. Yarn
2. The production tuning manual
3. Source code
II. What makes the course distinctive
1. New: Hadoop 3.1.3
2. Detailed: starting from cluster setup, every configuration item and every line of code is annotated, at book-publishing quality
3. Real: 20+ enterprise cases, 30+ tuning items, source reading across a million-line codebase
4. Complete: a full set of materials
III. How to get the materials
1. Follow the 尚硅谷教育 official account and reply: 大数据
2. 谷粒学院
3. Bilibili
IV. Technical prerequisites
JavaSE, Maven + IDEA + common Linux commands
01_尚硅谷大数据技术之大数据概论
P002【002_尚硅谷_Hadoop_概论_大数据的概念】04:34
Chapter 1, the concept of big data. Big Data: data sets that cannot be captured, managed, and processed by conventional software tools within a tolerable time frame; information assets so massive, fast-growing, and diverse that new processing models are needed to extract stronger decision-making power, insight, and process-optimization ability from them.
Big data mainly addresses the collection, storage, and analysis/computation of massive data.
P003【003_尚硅谷_Hadoop_概论_大数据的特点】07:23
Chapter 2, the characteristics of big data (the 4 Vs)
- Volume (massive scale)
- Velocity (high speed)
- Variety (many forms)
- Value (low value density)
P004【004_尚硅谷_Hadoop_概论_大数据的应用场景】09:58
Chapter 3, big data application scenarios
- Douyin: every recommended video is one you will like.
- On-site e-commerce ads: recommending products a user is likely to want.
- Retail: analyzing shopping habits to make purchases easier and lift sales.
- Logistics and warehousing: JD Logistics; order in the morning, delivered that afternoon, or order in the afternoon, delivered the next morning.
- Insurance: mining massive data for risk prediction, enabling precision marketing and finer-grained pricing.
- Finance: profiling users along many dimensions to help institutions find quality customers and guard against fraud.
- Real estate: big data powering the whole industry, from precise investment decisions and marketing to picking better land, building better buildings, and selling them to the right buyers.
- AI + 5G + IoT + virtual/augmented reality.
P005【005_尚硅谷_Hadoop_概论_大数据的发展场景】08:17
Chapter 4, prospects for big data development: promising!
P006【006_尚硅谷_Hadoop_概论_未来工作内容】06:25
Chapter 5, business workflow between big data departments
Chapter 6, organizational structure inside a big data department
02_尚硅谷大数据技术之Hadoop(入门)V3.3
P007【007_尚硅谷_Hadoop_入门_课程介绍】07:29
P008【008_尚硅谷_Hadoop_入门_Hadoop是什么】03:00
P009【009_尚硅谷_Hadoop_入门_Hadoop发展历史】05:52
P010【010_尚硅谷_Hadoop_入门_Hadoop三大发行版本】05:59
The three major Hadoop distributions: Apache, Cloudera, and Hortonworks.
1) Apache Hadoop
Official site: http://hadoop.apache.org
Downloads: https://hadoop.apache.org/releases.html
2) Cloudera Hadoop
Official site: https://www.cloudera.com/downloads/cdh
Downloads: https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_6_download.html
(1) Founded in 2008, Cloudera was the first company to commercialize Hadoop, providing partners with commercial Hadoop solutions, mainly support, consulting, and training.
(2) In 2009 Doug Cutting, Hadoop's creator, joined Cloudera. Cloudera's main products are CDH, Cloudera Manager, and Cloudera Support.
(3) CDH is Cloudera's Hadoop distribution, fully open source, with improvements over Apache Hadoop in compatibility, security, and stability. Cloudera's list price is US$10,000 per node per year.
(4) Cloudera Manager is a platform for distributing, managing, and monitoring the cluster's software; it can stand up a Hadoop cluster within a few hours and monitor the cluster's nodes and services in real time.
3) Hortonworks Hadoop
Official site: https://hortonworks.com/products/data-center/hdp/
Downloads: https://hortonworks.com/downloads/#data-platform
(1) Founded in 2011, Hortonworks was a joint venture between Yahoo and the Silicon Valley venture firm Benchmark Capital.
(2) At its founding it took on roughly 25 to 30 Yahoo engineers dedicated to Hadoop; these engineers had been helping Yahoo develop Hadoop since 2005 and contributed 80% of Hadoop's code.
(3) Hortonworks' flagship product is the Hortonworks Data Platform (HDP), likewise 100% open source; beyond the usual projects, HDP also includes Ambari, an open-source installation and management system.
(4) In 2018 Hortonworks was acquired by Cloudera.
P011【011_尚硅谷_Hadoop_入门_Hadoop优势】03:52
Hadoop's advantages (the 4 highs)
- High reliability
- High scalability
- High efficiency
- High fault tolerance
P012【012_尚硅谷_Hadoop_入门_Hadoop1.x2.x3.x区别】03:00
P013【013_尚硅谷_Hadoop_入门_HDFS概述】06:26
The Hadoop Distributed File System, HDFS for short, is a distributed file system.
- 1) NameNode (nn): stores file metadata, such as file names, the directory tree, file attributes (creation time, replica count, permissions), plus each file's block list and which DataNodes hold each block.
- 2) DataNode (dn): stores the file block data, and the blocks' checksums, on the local file system.
- 3) Secondary NameNode (2nn): takes a backup of the NameNode's metadata at regular intervals.
P014【014_尚硅谷_Hadoop_入门_YARN概述】06:35
Yet Another Resource Negotiator, YARN for short, is Hadoop's resource manager.
P015【015_尚硅谷_Hadoop_入门_MapReduce概述】01:55
MapReduce divides computation into two phases: Map and Reduce
- 1) The Map phase processes the input data in parallel
- 2) The Reduce phase aggregates the Map results
P016【016_尚硅谷_Hadoop_入门_HDFS&YARN&MR关系】03:22
- HDFS
- NameNode: manages the metadata and records which DataNodes each file's blocks live on.
- DataNode: stores the actual block data on its node.
- SecondaryNameNode: the secretary; backs up NameNode data and can recover part of the NameNode's work.
- YARN: resource management for the whole cluster.
- ResourceManager: cluster-wide resource management.
- NodeManager: resource management on a single node.
- MapReduce
P017【017_尚硅谷_Hadoop_入门_大数据技术生态体系】09:17
The big data technology ecosystem
Recommendation-system project architecture
P018【018_尚硅谷_Hadoop_入门_VMware安装】04:41
P019【019_尚硅谷_Hadoop_入门_Centos7.5软硬件安装】15:56
P020【020_尚硅谷_Hadoop_入门_IP和主机名称配置】10:50
[root@hadoop100 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoop100 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.88.133 netmask 255.255.255.0 broadcast 192.168.88.255
inet6 fe80::363b:8659:c323:345d prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:0f:0a:6d txqueuelen 1000 (Ethernet)
RX packets 684561 bytes 1003221355 (956.7 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 53538 bytes 3445292 (3.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 84 bytes 9492 (9.2 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 84 bytes 9492 (9.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
ether 52:54:00:1c:3c:a9 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@hadoop100 ~]# systemctl restart network
[root@hadoop100 ~]# cat /etc/host
cat: /etc/host: 没有那个文件或目录
[root@hadoop100 ~]# cat /etc/hostname
hadoop100
[root@hadoop100 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
[root@hadoop100 ~]# vim /etc/hosts
[root@hadoop100 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.88.100 netmask 255.255.255.0 broadcast 192.168.88.255
inet6 fe80::363b:8659:c323:345d prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:0f:0a:6d txqueuelen 1000 (Ethernet)
RX packets 684830 bytes 1003244575 (956.7 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 53597 bytes 3452600 (3.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 132 bytes 14436 (14.0 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 132 bytes 14436 (14.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
virbr0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
ether 52:54:00:1c:3c:a9 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@hadoop100 ~]# ll
总用量 40
-rw-------. 1 root root 1973 3月 14 10:19 anaconda-ks.cfg
-rw-r--r--. 1 root root 2021 3月 14 10:26 initial-setup-ks.cfg
drwxr-xr-x. 2 root root 4096 3月 14 10:27 公共
drwxr-xr-x. 2 root root 4096 3月 14 10:27 模板
drwxr-xr-x. 2 root root 4096 3月 14 10:27 视频
drwxr-xr-x. 2 root root 4096 3月 14 10:27 图片
drwxr-xr-x. 2 root root 4096 3月 14 10:27 文档
drwxr-xr-x. 2 root root 4096 3月 14 10:27 下载
drwxr-xr-x. 2 root root 4096 3月 14 10:27 音乐
drwxr-xr-x. 2 root root 4096 3月 14 10:27 桌面
[root@hadoop100 ~]#
vim /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="3241b48d-3234-4c23-8a03-b9b393a99a65"
DEVICE="ens33"
ONBOOT="yes"
IPADDR=192.168.88.100
GATEWAY=192.168.88.2
DNS1=192.168.88.2

vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.88.100 hadoop100
192.168.88.101 hadoop101
192.168.88.102 hadoop102
192.168.88.103 hadoop103
192.168.88.104 hadoop104
192.168.88.105 hadoop105
192.168.88.106 hadoop106
192.168.88.107 hadoop107
192.168.88.108 hadoop108

192.168.88.151 node1 node1.itcast.cn
192.168.88.152 node2 node2.itcast.cn
192.168.88.153 node3 node3.itcast.cn
P021【021_尚硅谷_Hadoop_入门_Xshell远程访问工具】09:05
P022【022_尚硅谷_Hadoop_入门_模板虚拟机准备完成】12:25
yum install -y epel-release
systemctl stop firewalld
systemctl disable firewalld.service
P023【023_尚硅谷_Hadoop_入门_克隆三台虚拟机】15:01
vim /etc/sysconfig/network-scripts/ifcfg-ens33
vim /etc/hostname
reboot
P024【024_尚硅谷_Hadoop_入门_JDK安装】07:02
Install the JDK on hadoop102, then copy it to hadoop103 and hadoop104.
P025【025_尚硅谷_Hadoop_入门_Hadoop安装】07:20
Same figure as P024!
P026【026_尚硅谷_Hadoop_入门_本地运行模式】11:56
[root@node1 ~]# cd /export/server/hadoop-3.3.0/share/hadoop/mapreduce/
[root@node1 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.3.0.jar wordcount /wordcount/input /wordcount/output
2023-03-20 14:43:07,516 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at node1/192.168.88.151:8032
2023-03-20 14:43:09,291 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1679293699463_0001
2023-03-20 14:43:11,916 INFO input.FileInputFormat: Total input files to process : 1
2023-03-20 14:43:12,313 INFO mapreduce.JobSubmitter: number of splits:1
2023-03-20 14:43:13,173 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1679293699463_0001
2023-03-20 14:43:13,173 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-03-20 14:43:14,684 INFO conf.Configuration: resource-types.xml not found
2023-03-20 14:43:14,684 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-03-20 14:43:17,054 INFO impl.YarnClientImpl: Submitted application application_1679293699463_0001
2023-03-20 14:43:17,123 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1679293699463_0001/
2023-03-20 14:43:17,124 INFO mapreduce.Job: Running job: job_1679293699463_0001
2023-03-20 14:43:52,340 INFO mapreduce.Job: Job job_1679293699463_0001 running in uber mode : false
2023-03-20 14:43:52,360 INFO mapreduce.Job: map 0% reduce 0%
2023-03-20 14:44:08,011 INFO mapreduce.Job: map 100% reduce 0%
2023-03-20 14:44:16,986 INFO mapreduce.Job: map 100% reduce 100%
2023-03-20 14:44:18,020 INFO mapreduce.Job: Job job_1679293699463_0001 completed successfully
2023-03-20 14:44:18,579 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=31
FILE: Number of bytes written=529345
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=142
HDFS: Number of bytes written=17
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=11303
Total time spent by all reduces in occupied slots (ms)=6220
Total time spent by all map tasks (ms)=11303
Total time spent by all reduce tasks (ms)=6220
Total vcore-milliseconds taken by all map tasks=11303
Total vcore-milliseconds taken by all reduce tasks=6220
Total megabyte-milliseconds taken by all map tasks=11574272
Total megabyte-milliseconds taken by all reduce tasks=6369280
Map-Reduce Framework
Map input records=2
Map output records=5
Map output bytes=53
Map output materialized bytes=31
Input split bytes=108
Combine input records=5
Combine output records=2
Reduce input groups=2
Reduce shuffle bytes=31
Reduce input records=2
Reduce output records=2
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=546
CPU time spent (ms)=3680
Physical memory (bytes) snapshot=499236864
Virtual memory (bytes) snapshot=5568684032
Total committed heap usage (bytes)=365953024
Peak Map Physical memory (bytes)=301096960
Peak Map Virtual memory (bytes)=2779201536
Peak Reduce Physical memory (bytes)=198139904
Peak Reduce Virtual memory (bytes)=2789482496
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=34
File Output Format Counters
Bytes Written=17
[root@node1 mapreduce]#
[root@node1 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.3.0.jar wordcount /wc_input /wc_output
2023-03-20 15:01:48,007 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at node1/192.168.88.151:8032
2023-03-20 15:01:49,475 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1679293699463_0002
2023-03-20 15:01:50,522 INFO input.FileInputFormat: Total input files to process : 1
2023-03-20 15:01:51,010 INFO mapreduce.JobSubmitter: number of splits:1
2023-03-20 15:01:51,894 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1679293699463_0002
2023-03-20 15:01:51,894 INFO mapreduce.JobSubmitter: Executing with tokens: []
2023-03-20 15:01:52,684 INFO conf.Configuration: resource-types.xml not found
2023-03-20 15:01:52,687 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2023-03-20 15:01:53,237 INFO impl.YarnClientImpl: Submitted application application_1679293699463_0002
2023-03-20 15:01:53,487 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1679293699463_0002/
2023-03-20 15:01:53,492 INFO mapreduce.Job: Running job: job_1679293699463_0002
2023-03-20 15:02:15,329 INFO mapreduce.Job: Job job_1679293699463_0002 running in uber mode : false
2023-03-20 15:02:15,342 INFO mapreduce.Job: map 0% reduce 0%
2023-03-20 15:02:26,652 INFO mapreduce.Job: map 100% reduce 0%
2023-03-20 15:02:40,297 INFO mapreduce.Job: map 100% reduce 100%
2023-03-20 15:02:41,350 INFO mapreduce.Job: Job job_1679293699463_0002 completed successfully
2023-03-20 15:02:41,557 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=60
FILE: Number of bytes written=529375
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=149
HDFS: Number of bytes written=38
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=8398
Total time spent by all reduces in occupied slots (ms)=9720
Total time spent by all map tasks (ms)=8398
Total time spent by all reduce tasks (ms)=9720
Total vcore-milliseconds taken by all map tasks=8398
Total vcore-milliseconds taken by all reduce tasks=9720
Total megabyte-milliseconds taken by all map tasks=8599552
Total megabyte-milliseconds taken by all reduce tasks=9953280
Map-Reduce Framework
Map input records=4
Map output records=6
Map output bytes=69
Map output materialized bytes=60
Input split bytes=100
Combine input records=6
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=60
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1000
CPU time spent (ms)=3880
Physical memory (bytes) snapshot=503771136
Virtual memory (bytes) snapshot=5568987136
Total committed heap usage (bytes)=428343296
Peak Map Physical memory (bytes)=303013888
Peak Map Virtual memory (bytes)=2782048256
Peak Reduce Physical memory (bytes)=200757248
Peak Reduce Virtual memory (bytes)=2786938880
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=49
File Output Format Counters
Bytes Written=38
[root@node1 mapreduce]# pwd
/export/server/hadoop-3.3.0/share/hadoop/mapreduce
[root@node1 mapreduce]#
P027【027_尚硅谷_Hadoop_入门_scp&rsync命令讲解】15:01
Use scp for the first sync and rsync for subsequent syncs.
rsync is mainly used for backup and mirroring; it is fast, avoids copying identical content, and supports symbolic links.
Difference between rsync and scp: copying files with rsync is faster than with scp, because rsync only updates the files that differ, whereas scp copies everything.
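For example (paths as used in the course; rsync's -a keeps permissions and timestamps, -v prints progress):

scp -r /opt/module/jdk1.8.0_212 atguigu@hadoop103:/opt/module
rsync -av /opt/module/hadoop-3.1.3/ atguigu@hadoop103:/opt/module/hadoop-3.1.3/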
P028【028_尚硅谷_Hadoop_入门_xsync分发脚本】18:14
Copy and sync commands
- scp (secure copy)
- rsync (remote synchronization tool)
- xsync (the cluster distribution script)
The dirname command extracts the path portion of a file name: it removes the non-directory part and prints only what relates to the directory.
[root@node1 ~]# dirname /home/atguigu/a.txt
/home/atguigu
[root@node1 ~]#
The basename command gets the file name:
[root@node1 atguigu]# basename /home/atguigu/a.txt
a.txt
[root@node1 atguigu]#
#!/bin/bash
#1. 判断参数个数
if [ $# -lt 1 ]
then
echo Not Enough Arguments!
exit;
fi
#2. 遍历集群所有机器
for host in hadoop102 hadoop103 hadoop104
do
echo ==================== $host ====================
#3. 遍历所有目录,挨个发送
for file in $@
do
#4. 判断文件是否存在
if [ -e $file ]
then
#5. 获取父目录
pdir=$(cd -P $(dirname $file); pwd)
#6. 获取当前文件的名称
fname=$(basename $file)
ssh $host "mkdir -p $pdir"
rsync -av $pdir/$fname $host:$pdir
else
echo $file does not exist!
fi
done
done
[root@node1 bin]# chmod 777 xsync
[root@node1 bin]# ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月 20 16:00 xsync
[root@node1 bin]# cd ..
[root@node1 atguigu]# xsync bin/
==================== node1 ====================
sending incremental file list
sent 94 bytes received 17 bytes 222.00 bytes/sec
total size is 727 speedup is 6.55
==================== node2 ====================
sending incremental file list
bin/
bin/xsync
sent 871 bytes received 39 bytes 606.67 bytes/sec
total size is 727 speedup is 0.80
==================== node3 ====================
sending incremental file list
bin/
bin/xsync
sent 871 bytes received 39 bytes 1,820.00 bytes/sec
total size is 727 speedup is 0.80
[root@node1 atguigu]# pwd
/home/atguigu
[root@node1 atguigu]# ls -al
总用量 20
drwx------ 6 atguigu atguigu 168 3月 20 15:56 .
drwxr-xr-x. 6 root root 56 3月 20 10:08 ..
-rw-r--r-- 1 root root 0 3月 20 15:44 a.txt
-rw------- 1 atguigu atguigu 21 3月 20 11:48 .bash_history
-rw-r--r-- 1 atguigu atguigu 18 8月 8 2019 .bash_logout
-rw-r--r-- 1 atguigu atguigu 193 8月 8 2019 .bash_profile
-rw-r--r-- 1 atguigu atguigu 231 8月 8 2019 .bashrc
drwxrwxr-x 2 atguigu atguigu 19 3月 20 15:56 bin
drwxrwxr-x 3 atguigu atguigu 18 3月 20 10:17 .cache
drwxrwxr-x 3 atguigu atguigu 18 3月 20 10:17 .config
drwxr-xr-x 4 atguigu atguigu 39 3月 10 20:04 .mozilla
-rw------- 1 atguigu atguigu 1261 3月 20 15:56 .viminfo
[root@node1 atguigu]#
连接成功
Last login: Mon Mar 20 16:01:40 2023
[root@node1 ~]# su atguigu
[atguigu@node1 root]$ cd /home/atguigu/
[atguigu@node1 ~]$ pwd
/home/atguigu
[atguigu@node1 ~]$ xsync bin/
==================== node1 ====================
The authenticity of host 'node1 (192.168.88.151)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.88.151' (ECDSA) to the list of known hosts.
atguigu@node1's password:
atguigu@node1's password:
sending incremental file list
sent 98 bytes received 17 bytes 17.69 bytes/sec
total size is 727 speedup is 6.32
==================== node2 ====================
The authenticity of host 'node2 (192.168.88.152)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2,192.168.88.152' (ECDSA) to the list of known hosts.
atguigu@node2's password:
atguigu@node2's password:
sending incremental file list
sent 94 bytes received 17 bytes 44.40 bytes/sec
total size is 727 speedup is 6.55
==================== node3 ====================
The authenticity of host 'node3 (192.168.88.153)' can't be established.
ECDSA key fingerprint is SHA256:+eLT3FrOEuEsxBxjOd89raPi/ChJz26WGAfqBpz/KEk.
ECDSA key fingerprint is MD5:18:42:ad:0f:2b:97:d8:b5:68:14:6a:98:e9:72:db:bb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node3,192.168.88.153' (ECDSA) to the list of known hosts.
atguigu@node3's password:
atguigu@node3's password:
sending incremental file list
sent 94 bytes received 17 bytes 44.40 bytes/sec
total size is 727 speedup is 6.55
[atguigu@node1 ~]$
----------------------------------------------------------------------------------------
连接成功
Last login: Mon Mar 20 17:22:20 2023 from 192.168.88.151
[root@node2 ~]# su atguigu
[atguigu@node2 root]$ vim /etc/sudoers
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 root]$ su root
密码:
[root@node2 ~]# vim /etc/sudoers
[root@node2 ~]# cd /opt/
[root@node2 opt]# ll
总用量 0
drwxr-xr-x 4 atguigu atguigu 46 3月 20 11:32 module
drwxr-xr-x. 2 root root 6 10月 31 2018 rh
drwxr-xr-x 2 atguigu atguigu 67 3月 20 10:47 software
[root@node2 opt]# su atguigu
[atguigu@node2 opt]$ cd /home/atguigu/
[atguigu@node2 ~]$ llk
bash: llk: 未找到命令
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月 20 15:56 bin
[atguigu@node2 ~]$ cd ~
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月 20 15:56 bin
[atguigu@node2 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月 20 15:56 bin
您在 /var/spool/mail/root 中有新邮件
[atguigu@node2 ~]$ cd bin
[atguigu@node2 bin]$ ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月 20 16:00 xsync
[atguigu@node2 bin]$
----------------------------------------------------------------------------------------
连接成功
Last login: Mon Mar 20 17:22:26 2023 from 192.168.88.152
[root@node3 ~]# vim /etc/sudoers
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# cd /opt/
[root@node3 opt]# ll
总用量 0
drwxr-xr-x 4 atguigu atguigu 46 3月 20 11:32 module
drwxr-xr-x. 2 root root 6 10月 31 2018 rh
drwxr-xr-x 2 atguigu atguigu 67 3月 20 10:47 software
[root@node3 opt]# cd ~
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月 11 2020 anaconda-ks.cfg
-rw------- 1 root root 0 2月 23 16:20 nohup.out
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月 11 2020 anaconda-ks.cfg
-rw------- 1 root root 0 2月 23 16:20 nohup.out
您在 /var/spool/mail/root 中有新邮件
[root@node3 ~]# cd ~
[root@node3 ~]# ll
总用量 4
-rw-------. 1 root root 1340 9月 11 2020 anaconda-ks.cfg
-rw------- 1 root root 0 2月 23 16:20 nohup.out
[root@node3 ~]# su atguigu
[atguigu@node3 root]$ cd ~
[atguigu@node3 ~]$ ls
bin
[atguigu@node3 ~]$ ll
总用量 0
drwxrwxr-x 2 atguigu atguigu 19 3月 20 15:56 bin
[atguigu@node3 ~]$ cd bin
[atguigu@node3 bin]$ ll
总用量 4
-rwxrwxrwx 1 atguigu atguigu 727 3月 20 16:00 xsync
[atguigu@node3 bin]$
----------------------------------------------------------------------------------------
[atguigu@node1 ~]$ xsync /etc/profile.d/my_env.sh
==================== node1 ====================
atguigu@node1's password:
atguigu@node1's password:
.sending incremental file list
sent 48 bytes received 12 bytes 13.33 bytes/sec
total size is 223 speedup is 3.72
==================== node2 ====================
atguigu@node2's password:
atguigu@node2's password:
sending incremental file list
my_env.sh
rsync: mkstemp "/etc/profile.d/.my_env.sh.guTzvB" failed: Permission denied (13)
sent 95 bytes received 126 bytes 88.40 bytes/sec
total size is 223 speedup is 1.01
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1178) [sender=3.1.2]
==================== node3 =========