Common YARN (including MR2) configuration settings

Posted by 小基基o_O

YARN architecture diagram

How YARN works

Scheduler

Resource scheduler class

yarn.resourcemanager.scheduler.class

Original:
The class to use as the resource scheduler.
Notes:
The fully qualified class name of the resource scheduler.
The Capacity Scheduler is org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
The Fair Scheduler is org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler

Maximum application priority

yarn.cluster.max-application-priority

Original:
Defines maximum application priority in a cluster.
If an application is submitted with a priority higher than this value, it will be reset to this maximum value.
Notes:
Sets the highest priority an application can have in the cluster.
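
The reset rule quoted above amounts to a simple cap. A minimal sketch, assuming integer priorities; the function name `effective_priority` and the sample values are hypothetical and only illustrate the documented behavior, not YARN's actual code path:

```python
def effective_priority(requested: int, cluster_max: int) -> int:
    """Return the priority an application ends up with.

    Follows the documented rule: a requested priority above the cluster
    maximum is reset to that maximum; anything else is left unchanged.
    """
    return min(requested, cluster_max)


# cluster_max stands in for yarn.cluster.max-application-priority (sample value).
print(effective_priority(requested=3, cluster_max=5))  # within the limit, unchanged
print(effective_priority(requested=9, cluster_max=5))  # above the limit, reset
```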

Number of threads handling scheduler requests

yarn.resourcemanager.scheduler.client.thread-count

Original:
Number of threads to handle scheduler interface.
Notes:
The number of threads that serve the scheduler interface.

NodeManager

Physical memory allocatable on a single NM node

yarn.nodemanager.resource.memory-mb

Original:
Amount of physical memory, in MB, that can be allocated for containers.
If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated(in case of Windows and Linux).
In other cases, the default is 8192MB.
Notes:
The amount of physical memory, in MB, that the NodeManager on this node can allocate to containers.
If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, the value is calculated automatically.
Otherwise the default is 8192 MB.

For example, with 10 NMs each configured with 50 GB of memory, the cluster total is 500 GB.

Virtual cores allocatable on a single NM node

yarn.nodemanager.resource.cpu-vcores

Original:
Number of vcores that can be allocated for containers.
This is used by the RM scheduler when allocating resources for containers.
This is not used to limit the number of CPUs used by YARN containers.
If it is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically determined from the hardware in case of Windows and Linux.
In other cases, number of vcores is 8 by default.
Notes:
The number of virtual cores that the NodeManager on this node can allocate to containers.
If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, the number of vcores is determined automatically from the hardware.
Otherwise the default is 8.
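
The per-node settings add up to the cluster totals the scheduler works with. A small sketch of that arithmetic, reusing the 10 NMs with 50 GB each from the example above and the documented default of 8 vcores; the `NodeResources` class and `cluster_capacity` function are hypothetical helpers, not part of any Hadoop API:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeResources:
    memory_mb: int  # value of yarn.nodemanager.resource.memory-mb on the node
    vcores: int     # value of yarn.nodemanager.resource.cpu-vcores on the node


def cluster_capacity(nodes: List[NodeResources]) -> NodeResources:
    """Sum the allocatable memory and vcores over all NodeManagers."""
    return NodeResources(
        memory_mb=sum(n.memory_mb for n in nodes),
        vcores=sum(n.vcores for n in nodes),
    )


# 10 NMs, 50 GB each, 8 vcores each (sample cluster).
nodes = [NodeResources(memory_mb=50 * 1024, vcores=8) for _ in range(10)]
total = cluster_capacity(nodes)
print(total.memory_mb // 1024, "GB,", total.vcores, "vcores")
```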

Physical memory reserved on a single node for non-YARN processes

yarn.nodemanager.resource.system-reserved-memory-mb

Original:
Amount of physical memory, in MB, that is reserved for non-YARN processes.
This configuration is only used if yarn.nodemanager.resource.detect-hardware-capabilities is set to true and yarn.nodemanager.resource.memory-mb is -1.
If set to -1, this amount is calculated as 20% of (system memory - 2* HADOOP_HEAPSIZE)
Notes:
The total physical memory, in MB, reserved for non-YARN processes.
Takes effect only when yarn.nodemanager.resource.detect-hardware-capabilities is true and yarn.nodemanager.resource.memory-mb is -1.
If set to -1: $\text{computed value} = (\text{system memory} - 2 \times \text{HADOOP\_HEAPSIZE}) \times 20\%$
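
As a worked example of the 20% formula, here is a minimal sketch. The function name and the sample node (64 GB of RAM, a HADOOP_HEAPSIZE of 1024 MB) are assumptions for illustration, and rounding down to whole MB is also an assumption rather than confirmed NodeManager behavior:

```python
def system_reserved_memory_mb(system_memory_mb: int, hadoop_heapsize_mb: int) -> int:
    """20% of (system memory - 2 * HADOOP_HEAPSIZE), rounded down to whole MB."""
    return (system_memory_mb - 2 * hadoop_heapsize_mb) * 20 // 100


# Hypothetical node: 64 GB of RAM, HADOOP_HEAPSIZE of 1024 MB.
reserved = system_reserved_memory_mb(64 * 1024, 1024)
print(reserved)  # (65536 - 2048) * 20 // 100 = 12697
```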

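Minimum memory allocatable per container

yarn.scheduler.minimum-allocation-mb

Original:
The minimum allocation for every container request at the RM in MBs.
Memory requests lower than this will be set to the value of this property.
Additionally, a node manager that is configured to have less memory than this value will be shut down by the resource manager.
Notes:
The minimum memory, in MB, that the RM allocates for a container request.
Memory requests lower than this value are raised to this value.
In addition, an NM node configured with less memory than this value is shut down by the RM.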

Maximum memory allocatable per container

yarn.scheduler.maximum-allocation-mb

Original:
The maximum allocation for every container request at the RM in MBs.
Memory requests higher than this will throw an InvalidResourceRequestException.
Notes:
The maximum memory, in MB, that the RM allocates for a container request.
Memory requests higher than this value throw an InvalidResourceRequestException.

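Minimum virtual cores allocatable per container

yarn.scheduler.minimum-allocation-vcores

Original:
The minimum allocation for every container request at the RM in terms of virtual CPU cores.
Requests lower than this will be set to the value of this property.
Additionally, a node manager that is configured to have fewer virtual cores than this value will be shut down by the resource manager.
Notes:
The minimum number of virtual cores that the RM allocates for a container request.
Requests lower than this value are raised to this value.
In addition, an NM node configured with fewer virtual cores than this value is shut down by the RM.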

Maximum virtual cores allocatable per container

yarn.scheduler.maximum-allocation-vcores

Original:
The maximum allocation for every container request at the RM in terms of virtual CPU cores.
Requests higher than this will throw an InvalidResourceRequestException.
Notes:
The maximum number of virtual cores that the RM allocates for a container request.
Requests higher than this value throw an InvalidResourceRequestException.
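
The minimum and maximum rules for memory and vcores share one shape: raise small requests to the minimum, reject requests above the maximum, and refuse nodes that offer less than the minimum. The sketch below models only those three documented rules; the Python exception class is a stand-in for the YARN exception of the same name, the function names and limits are hypothetical, and any further rounding a real scheduler may apply is deliberately left out:

```python
class InvalidResourceRequestException(Exception):
    """Python stand-in for the YARN exception of the same name."""


def normalize_request(requested: int, minimum: int, maximum: int) -> int:
    """Apply the documented min/max rules to one resource of a container request."""
    if requested > maximum:
        raise InvalidResourceRequestException(
            f"requested {requested} exceeds maximum allocation {maximum}"
        )
    # Requests below the minimum are set to the minimum.
    return max(requested, minimum)


def node_is_accepted(node_capacity: int, minimum: int) -> bool:
    """A node configured with less than the minimum allocation is shut down."""
    return node_capacity >= minimum


# Sample limits for memory in MB (stand-ins for the two yarn.scheduler.*-allocation-mb values).
MIN_MB, MAX_MB = 1024, 8192
print(normalize_request(512, MIN_MB, MAX_MB))   # raised to the minimum
print(normalize_request(2048, MIN_MB, MAX_MB))  # unchanged
try:
    normalize_request(16384, MIN_MB, MAX_MB)
except InvalidResourceRequestException as exc:
    print("rejected:", exc)
```

The same two functions apply to vcores by passing the vcore limits instead of the memory limits.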

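Automatic detection of node resources

yarn.nodemanager.resource.detect-hardware-capabilities

Original:
Enable auto-detection of node capabilities such as memory and CPU.
Notes:
Whether to enable automatic detection of node resources (memory, CPU, and so on).
false disables it; true enables it.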

Multiplier for converting physical cores to virtual cores

yarn.nodemanager.resource.pcores-vcores-multiplier

Original:
Multiplier to determine how to convert physical cores to vcores.
This value is used if yarn.nodemanager.resource.cpu-vcores is set to -1(which implies auto-calculate vcores) and yarn.nodemanager.resource.detect-hardware-capabilities is set to true.
The number of vcores will be calculated as number of CPUs * multiplier.
Notes:
The multiplier used to convert physical cores to virtual cores.
Takes effect when yarn.nodemanager.resource.cpu-vcores is -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true.
$\text{total vcores} = \text{total physical cores} \times \text{multiplier}$

For example, for a CPU with 4 cores and 8 threads, set this parameter to 2.
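
The conversion is a single multiplication. A minimal sketch using the 4-core example above; the function name is hypothetical, and truncating a fractional result to an integer is an assumption rather than confirmed NodeManager behavior:

```python
def auto_vcores(physical_cores: int, multiplier: float) -> int:
    """Number of vcores = number of CPUs * multiplier (fraction truncated)."""
    return int(physical_cores * multiplier)


# 4 physical cores with a multiplier of 2.0 (the 4-core, 8-thread example).
print(auto_vcores(4, 2.0))  # 4 * 2.0 = 8
```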

MapReduce

Virtual cores per map task

mapreduce.map.cpu.vcores

Original:
The number of virtual cores to request from the scheduler for each map task.

Memory per map task

mapreduce.map.memory.mb

Original:
The amount of memory to request from the scheduler for each map task.
If this is not specified or is non-positive, it is inferred from mapreduce.map.java.opts and mapreduce.job.heap.memory-mb.ratio.
If java-opts are also not specified, we set it to 1024.
Notes:
The amount of memory, in MB, requested from the scheduler for each map task.
If it is not specified or is non-positive, it is inferred from mapreduce.map.java.opts and mapreduce.job.heap.memory-mb.ratio.
If mapreduce.map.java.opts is not specified either, it is set to 1024 MB.

Virtual cores per reduce task

mapreduce.reduce.cpu.vcores

Original:
The number of virtual cores to request from the scheduler for each reduce task.

Memory per reduce task

mapreduce.reduce.memory.mb

Original:
The amount of memory to request from the scheduler for each reduce task.
If this is not specified or is non-positive, it is inferred from mapreduce.reduce.java.opts and mapreduce.job.heap.memory-mb.ratio.
If java-opts are also not specified, we set it to 1024.
Notes:
The amount of memory, in MB, requested from the scheduler for each reduce task.
If it is not specified or is non-positive, it is inferred from mapreduce.reduce.java.opts and mapreduce.job.heap.memory-mb.ratio.
If mapreduce.reduce.java.opts is not specified either, it is set to 1024 MB.
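
The fallback order for the map and reduce memory settings can be written as a three-step decision. This is a sketch under stated assumptions: `xmx_mb` stands for an -Xmx value already parsed out of the java.opts string (parsing is omitted), the function name is hypothetical, the ratio of 0.75 is a sample value, and rounding up is an assumption:

```python
import math
from typing import Optional

DEFAULT_TASK_MEMORY_MB = 1024  # documented fallback when nothing is specified


def resolve_task_memory_mb(
    memory_mb: Optional[int], xmx_mb: Optional[int], heap_ratio: float
) -> int:
    """Pick the container memory for a map or reduce task."""
    # 1. An explicit, positive memory.mb setting wins.
    if memory_mb is not None and memory_mb > 0:
        return memory_mb
    # 2. Otherwise infer it from the heap size and the heap-to-container ratio.
    if xmx_mb is not None:
        return math.ceil(xmx_mb / heap_ratio)
    # 3. With no java.opts heap size either, fall back to 1024 MB.
    return DEFAULT_TASK_MEMORY_MB


print(resolve_task_memory_mb(2048, None, 0.75))  # explicit setting
print(resolve_task_memory_mb(None, 1536, 0.75))  # inferred: 1536 / 0.75
print(resolve_task_memory_mb(-1, None, 0.75))    # default
```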

Ratio of heap size to container size

mapreduce.job.heap.memory-mb.ratio

Original:
The ratio of heap-size to container-size.
If no -Xmx is specified, it is calculated as ( mapreduce.map|reduce.memory.mb * mapreduce.heap.memory-mb.ratio).
If -Xmx is specified but not mapreduce.map|reduce.memory.mb, it is calculated as (heapSize / mapreduce.heap.memory-mb.ratio).
Notes:
The ratio of heap size to container size.
When -Xmx is not specified: heap size = mapreduce.map|reduce.memory.mb × mapreduce.job.heap.memory-mb.ratio
When -Xmx is specified but mapreduce.map|reduce.memory.mb is not: mapreduce.map|reduce.memory.mb = heap size / mapreduce.job.heap.memory-mb.ratio
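
For the case where no -Xmx is given, the heap size follows from the container size. A minimal sketch; the function names, the sample ratio of 0.75, and rounding up to whole MB are assumptions for illustration:

```python
import math


def heap_size_mb(container_mb: int, heap_ratio: float) -> int:
    """Heap size = container size * ratio, rounded up to whole MB."""
    return math.ceil(container_mb * heap_ratio)


def xmx_option(container_mb: int, heap_ratio: float) -> str:
    """Format the derived heap size as a JVM -Xmx flag."""
    return f"-Xmx{heap_size_mb(container_mb, heap_ratio)}m"


# A 4096 MB container with a sample ratio of 0.75.
print(heap_size_mb(4096, 0.75))  # 4096 * 0.75 = 3072
print(xmx_option(4096, 0.75))    # -Xmx3072m
```

Keeping the ratio below 1 leaves room in the container for non-heap JVM memory, which is why the heap is derived from the container size rather than set equal to it.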

Appendix

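Term        Pronunciation   Meaning
invalid     ɪnˈvælɪd        void; unrecognized
infer       ɪnˈfɜːr         v. to deduce
specify     ˈspesɪfaɪ       v. to state explicitly; to spell out
convert     kənˈvɜːrt       v. to change (something) into another form
multiplier  ˈmʌltɪplaɪər    n. (math) a number by which another is multiplied; (electronics) a multiplier; something that increases
imply       ɪmˈplaɪ         v. to suggest; to mean; to necessarily involve
-Xms                        initial Java heap size
-Xmx                        maximum Java heap size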
