flink-1.11.0 + hadoop3.2.2: Deploying Flink on YARN

Posted by 逃跑的沙丁鱼


Contents

1 Download, upload, and extract

2 Configure JobManager high availability

2.1 Modify yarn-site.xml

2.2 Distribute yarn-site.xml to node2 and node3

2.3 Restart the YARN services

3 Flink configuration files

3.1 High-availability settings

3.2 JobManager and TaskManager configuration

3.3 Copy the Flink directory to node2 and node3 with scp

3.4 Add environment variables

4 Test Flink

4.1 Request resources

4.2 Check the allocated resources

4.3 Submit the WordCount job


1 Download, upload, and extract

Download flink-1.11.0-bin-scala_2.12.tgz from https://flink.apache.org/downloads.html#flink, upload it to node1, and extract it:

tar -xvf flink-1.11.0-bin-scala_2.12.tgz -C ../soft
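Alternatively, the tarball can be fetched directly on the server; a sketch, assuming the standard Apache archive layout for the 1.11.0 release:

# Fetch the release from the Apache archive (URL assumed) and extract it
wget https://archive.apache.org/dist/flink/flink-1.11.0/flink-1.11.0-bin-scala_2.12.tgz
tar -xvf flink-1.11.0-bin-scala_2.12.tgz -C ../soft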

2 Configure JobManager high availability

2.1 Modify yarn-site.xml

This requires modifying Hadoop's YARN configuration file, yarn-site.xml:

<property>
    <name>yarn.resourcemanager.am.max-attempts</name>
    <value>4</value>
    <description>
        The maximum number of application master execution attempts.
    </description>
</property>

The maximum number of ApplicationMaster execution attempts. It is set to 4 here mainly for testing; the default is 2, which is usually sufficient.

yarn.resourcemanager.am.max-attempts is a global setting on the YARN cluster: it applies to all Flink jobs running on the cluster and is configured in yarn-site.xml.

yarn.application-attempts

This applies only to the current Flink job and must not exceed yarn.resourcemanager.am.max-attempts; a larger value is overridden by the yarn.resourcemanager.am.max-attempts value.

It is configured in flink-conf.yaml (each job has its own ApplicationMaster, and each ApplicationMaster reads its own flink-conf.yaml), as in the sketch below.
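The startup log in section 4.1 shows this cluster actually loading yarn.application-attempts: 10 from flink-conf.yaml; by the rule above, that value is effectively limited to the yarn.resourcemanager.am.max-attempts of 4 set earlier:

# flink-conf.yaml (value taken from the section 4.1 log; capped at 4 by YARN)
yarn.application-attempts: 10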

2.2 Distribute yarn-site.xml to node2 and node3

[liucf@node1 flink-1.11.0]$ scp /home/liucf/soft/hadoop-3.2.2/etc/hadoop/yarn-site.xml liucf@node2:/home/liucf/soft/hadoop-3.2.2/etc/hadoop/
yarn-site.xml                                                                                       100% 1534     1.5MB/s   00:00    
[liucf@node1 flink-1.11.0]$ scp /home/liucf/soft/hadoop-3.2.2/etc/hadoop/yarn-site.xml liucf@node3:/home/liucf/soft/hadoop-3.2.2/etc/hadoop/
yarn-site.xml  

2.3 Restart the YARN services

"restartyarn"){
	ssh node1 "/home/liucf/soft/hadoop-3.2.2/sbin/yarn-daemon.sh stop resourcemanager"
	for i in node2 node3
	do
		ssh $i "/home/liucf/soft/hadoop-3.2.2/sbin/yarn-daemon.sh stop nodemanager"
	done

	ssh node1 "/home/liucf/soft/hadoop-3.2.2/sbin/yarn-daemon.sh start resourcemanager"
	for i in node2 node3
	do
		ssh $i "/home/liucf/soft/hadoop-3.2.2/sbin/yarn-daemon.sh start nodemanager"
	done
};;

This is the restartyarn branch of a start/stop script I wrote for the cluster; the surrounding case statement on $1 is included above so the fragment stands on its own.

3 Flink configuration files

3.1 High-availability settings

In flink-conf.yaml:

high-availability: zookeeper
high-availability.zookeeper.path.root: /flink

high-availability.storageDir: hdfs://node1:8020/flink/ha

high-availability.zookeeper.quorum: node1:2181,node2:2181,node3:2181

high-availability (required)

Purpose: selects the high-availability mode; ZooKeeper is generally used as the coordination service for Flink HA.

high-availability.zookeeper.quorum (required)

Purpose: the ZooKeeper quorum is the replicated group of ZooKeeper servers that provides the distributed coordination service.

high-availability.zookeeper.path.root (required)

Purpose: the root znode in ZooKeeper under which Flink places its cluster coordination data.

high-availability.storageDir (required)

Purpose: the directory (here on HDFS) where JobManager metadata is persisted; ZooKeeper holds only pointers to this state.
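The HDFS directory referenced by high-availability.storageDir can be created ahead of time. A minimal sketch, assuming HDFS is already running and the path from the configuration above:

# Create the HA storage directory in HDFS and confirm it exists
hdfs dfs -mkdir -p /flink/ha
hdfs dfs -ls /flink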

3.2 JobManager and TaskManager configuration

In flink-conf.yaml:

jobmanager.rpc.port: 6123

jobmanager.memory.process.size: 1600m

taskmanager.memory.process.size: 1728m
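The session startup log in section 4.1 shows a few more entries loaded from this same flink-conf.yaml; for completeness they are:

taskmanager.numberOfTaskSlots: 1
parallelism.default: 1
state.backend: filesystem
jobmanager.execution.failover-strategy: region
rest.port: 8081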

3.3 Copy the Flink directory to node2 and node3 with scp

scp -r flink-1.11.0 liucf@node2:/home/liucf/soft
scp -r flink-1.11.0 liucf@node3:/home/liucf/soft

3.4 Add environment variables

This must be done on node1, node2, and node3; the following lines go in /etc/profile:

#FLINK_HOME
export FLINK_HOME=/home/liucf/soft/flink-1.11.0
PATH=$FLINK_HOME/bin:$KAFKA_HOME/bin:$SPARK_HOME/bin:$SCALA_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH

# Flink on YARN settings
# HADOOP_CONF_DIR must be exported and must point at the directory that
# contains yarn-site.xml; an unexported value is invisible to the Flink
# client and causes the HADOOP_CONF_DIR warning seen in section 4.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CLASSPATH=`hadoop classpath`

Apply the changes:

source /etc/profile
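A quick sanity check that the variables are visible in the shell (the expected paths are specific to this cluster's layout):

# flink should resolve from $FLINK_HOME/bin and the config dir should be set
which flink              # expected: /home/liucf/soft/flink-1.11.0/bin/flink
echo "$HADOOP_CONF_DIR"  # expected: /home/liucf/soft/hadoop-3.2.2/etc/hadoop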

4 Test Flink

Request resources from the YARN cluster and then submit a job to it.

4.1 Request resources

Start a YARN session; -nm sets the application name under which the session appears in YARN:
[liucf@node2 ~]$ yarn-session.sh -nm wordCount  -n 2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/liucf/soft/flink-1.11.0/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/liucf/soft/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2021-06-20 14:00:19,925 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.rpc.port, 6123
2021-06-20 14:00:19,928 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.memory.process.size, 1600m
2021-06-20 14:00:19,928 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2021-06-20 14:00:19,928 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2021-06-20 14:00:19,928 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: parallelism.default, 1
2021-06-20 14:00:19,928 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: yarn.application-attempts, 10
2021-06-20 14:00:19,929 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: high-availability, zookeeper
2021-06-20 14:00:19,929 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: high-availability.zookeeper.path.root, /flink
2021-06-20 14:00:19,929 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: high-availability.storageDir, hdfs://node1:8020/flink/ha
2021-06-20 14:00:19,929 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: high-availability.zookeeper.quorum, node1:2181,node2:2181,node3:2181
2021-06-20 14:00:19,929 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: state.backend, filesystem
2021-06-20 14:00:19,930 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-06-20 14:00:19,930 INFO  org.apache.flink.configuration.GlobalConfiguration           [] - Loading configuration property: rest.port, 8081
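The session above stays attached to the terminal. For day-to-day use it can be started detached instead; a sketch using yarn-session.sh's -d (detached) flag:

# Start the session in the background and return control to the shell
yarn-session.sh -nm wordCount -d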

4.2 Check the allocated resources

[liucf@node3 ~]$ yarn application --list
2021-06-20 14:02:45,505 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.109.151:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):1
                Application-Id	    Application-Name	    Application-Type	      User	     Queue	             State          Final-State	       Progress	                       Tracking-URL
application_1624159539256_0002	           wordCount	        Apache Flink	     liucf	   default	           RUNNING            UNDEFINED	             0%	                  http://node2:8081
[liucf@node3 ~]$ 
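A single application can also be inspected by id; a sketch using the id from the listing above:

# Show the detailed status of the Flink session
yarn application -status application_1624159539256_0002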

The application is also visible in the YARN web UI.

4.3 Submit the WordCount job

The -yid option submits the job to the running session identified by its YARN application id:
[liucf@node3 batch]$ flink run -yid application_1624159539256_0002  /home/liucf/soft/flink-1.11.0/examples/batch/WordCount.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/liucf/soft/flink-1.11.0/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/liucf/soft/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
2021-06-20 14:06:57,250 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/home/liucf/soft/flink-1.11.0/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2021-06-20 14:06:57,290 INFO  org.apache.hadoop.yarn.client.RMProxy                        [] - Connecting to ResourceManager at node1/192.168.109.151:8032
2021-06-20 14:06:57,507 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2021-06-20 14:06:57,509 WARN  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set.The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2021-06-20 14:06:57,603 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface node2:8081 of application 'application_1624159539256_0002'.
Job has been submitted with JobID f04563e820173a57f89b3a0a790d08dc
Program execution finished
Job with JobID f04563e820173a57f89b3a0a790d08dc has finished.
Job Runtime: 10604 ms
Accumulator Results: 
- 9c90ccc31f75a40930c208e889c8f4e8 (java.util.ArrayList) [170 elements]


(a,5)
(action,1)
(after,1)
(against,1)
(all,2)
(and,12)
(arms,1)
(arrows,1)
(awry,1)
(ay,1)
(bare,1)
(be,4)
(bear,3)
(bodkin,1)
(bourn,1)
(but,1)
(by,2)
  ...
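As the hints printed at the start of the run say, --input and --output switch the example from its built-in sample text to real files. A sketch with hypothetical HDFS paths:

# The two URIs below are illustrative placeholders, not paths from this cluster
flink run -yid application_1624159539256_0002 \
  /home/liucf/soft/flink-1.11.0/examples/batch/WordCount.jar \
  --input hdfs://node1:8020/tmp/words.txt \
  --output hdfs://node1:8020/tmp/wordcount-result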

The finished job also shows up in the Flink web UI.

This completes the installation.
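When the session is no longer needed, it can be torn down through YARN using the application id from section 4.2:

# Stop the yarn-session and release its containers
yarn application -kill application_1624159539256_0002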

 

 
