How to Kill a Spark Application using Yarn ResourceManager REST API

【Published】:2020-12-27 04:30:49 【Question】:

I am trying to kill a Spark application running on YARN through the YARN ResourceManager REST API. Below are the two PUT commands I tried:

    First command
curl -X PUT 'http://<HOSTNAME>:8088/ws/v1/cluster/apps/<APPLICATION_ID>/state' -d '{"state": "KILLED"}'

Result:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><RemoteException><exception>WebApplicationException</exception><javaClassName>javax.ws.rs.WebApplicationException</javaClassName></RemoteException>
    Second command
curl -v -X PUT -H "Content-Type: application/json" -d '{"state": "KILLED"}' 'http://<HOSTNAME>:8088/ws/v1/cluster/apps/<APPLICATION_ID>/state'

Result:

* About to connect() to <HOSTNAME> port 8088 (#0)
*   Trying <IP>...
* Connected to <HOSTNAME> (<IP>) port 8088 (#0)
> PUT /ws/v1/cluster/apps/<APPLICATION_ID>/state HTTP/1.1
> User-Agent: curl/<SOME IP>
> Host: <HOSTNAME>:8088
> Accept: */*
> Content-Type: application/json
> Content-Length: 19
>
* upload completely sent off: 19 out of 19 bytes
< HTTP/1.1 403 Forbidden
< Cache-Control: no-cache
< Expires: Mon, 07 Sep 2020 18:26:46 GMT
< Date: Mon, 07 Sep 2020 18:26:46 GMT
< Pragma: no-cache
< Expires: Mon, 07 Sep 2020 18:26:46 GMT
< Date: Mon, 07 Sep 18:26:46 GMT
< Pragma: no-cache
< Content-Type: application/json
< X-Frame-Options: SAMEORIGIN
< Transfer-Encoding: chunked
< Server: Jetty(<SOME IP>.hwx)
<
* Connection #0 to host <HOSTNAME> left intact
{"RemoteException":{"exception":"ForbiddenException","message":"java.lang.Exception: The default static user cannot carry out this operation.","javaClassName":"org.apache.hadoop.yarn.webapp.ForbiddenException"}}

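Am I missing something here, or do I need to supply a user ID? What is the correct command to kill the application? Any suggestions are appreciated.

Thanks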


【Answer 1】:

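According to the ResourceManager REST API documentation, PUT requests that change an application's state must be authenticated.

In Hadoop, authentication generally means Kerberos at the most basic level.

So first confirm that Kerberos authentication for the web consoles is enabled for HDFS and YARN. If you use Cloudera Manager to manage a CDH/CDP cluster, you can refer to the Cloudera documentation on this. If you use vanilla Hadoop or another Hadoop distribution, look up the corresponding documentation.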
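
One way to check this from a cluster node is to read the effective `hadoop.http.authentication.type` setting and interpret its value. The sketch below assumes the `hdfs` client is on the PATH and picks up the cluster configuration; `describe_http_auth` is a hypothetical helper name, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Map a value of hadoop.http.authentication.type to a short explanation.
# The wording of the messages is illustrative only.
describe_http_auth() {
  case "$1" in
    kerberos) echo "kerberos: web endpoints expect SPNEGO, e.g. curl --negotiate -u :" ;;
    simple)   echo "simple: Kerberos is not enabled on the web endpoints" ;;
    *)        echo "other: custom authentication handler ($1)" ;;
  esac
}

# Example (not executed here):
#   describe_http_auth "$(hdfs getconf -confKey hadoop.http.authentication.type)"
```

If the value is not `kerberos`, the `curl --negotiate` approach shown in this answer will not authenticate you, and the web console settings need to be changed first.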

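Once authentication is enabled for the cluster and the web consoles, you can issue the HTTP API request with any client that integrates with Kerberos. Here is an example:

    Submit a MapReduce job on c4669-node2: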
[root@c4669-node2 63-hdfs-DATANODE]# yarn jar /opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p0.6626826/jars/hadoop-mapreduce-client-jobclient-3.0.0-cdh6.3.4-tests.jar sleep -Dmapred.job.queue.name=a1 -m 1 -r 1 -rt 1200000 -mt 20
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
...
21/01/08 07:06:08 INFO client.RMProxy: Connecting to ResourceManager at c4669-node4.coelab.cloudera.com/172.25.39.199:8032
21/01/08 07:06:08 INFO hdfs.DFSClient: Created token for cloudera: HDFS_DELEGATION_TOKEN owner=cloudera@COELAB.CLOUDERA.COM, renewer=yarn, realUser=, issueDate=1610089568852, maxDate=1610694368852, sequenceNumber=4, masterKeyId=4 on 172.25.34.78:8020
21/01/08 07:06:08 INFO security.TokenCache: Got dt for hdfs://c4669-node2.coelab.cloudera.com:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 172.25.34.78:8020, Ident: (token for cloudera: HDFS_DELEGATION_TOKEN owner=cloudera@COELAB.CLOUDERA.COM, renewer=yarn, realUser=, issueDate=1610089568852, maxDate=1610694368852, sequenceNumber=4, masterKeyId=4)
21/01/08 07:06:08 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/cloudera/.staging/job_1610089441463_0001
...
21/01/08 07:06:10 INFO impl.YarnClientImpl: Submitted application application_1610089441463_0001
21/01/08 07:06:10 INFO mapreduce.Job: The url to track the job: http://c4669-node4.coelab.cloudera.com:8088/proxy/application_1610089441463_0001/
21/01/08 07:06:10 INFO mapreduce.Job: Running job: job_1610089441463_0001
21/01/08 07:06:20 INFO mapreduce.Job: Job job_1610089441463_0001 running in uber mode : false
21/01/08 07:06:20 INFO mapreduce.Job:  map 0% reduce 0%
21/01/08 07:06:25 INFO mapreduce.Job:  map 100% reduce 0%
21/01/08 07:06:42 INFO mapreduce.Job:  map 100% reduce 67%
21/01/08 07:07:06 INFO mapreduce.Job:  map 100% reduce 68%
21/01/08 07:07:11 INFO mapreduce.Job:  map 0% reduce 0%
21/01/08 07:07:11 INFO mapreduce.Job: Job job_1610089441463_0001 failed with state KILLED due to: Application application_1610089441463_0001 was killed by user cloudera
21/01/08 07:07:11 INFO mapreduce.Job: Counters: 0
[root@c4669-node2 63-hdfs-DATANODE]#

Note: the message

"Application application_1610089441463_0001 was killed by user cloudera"

is caused by the PUT request below.

    On c4669-node3, send the PUT request with curl:
[root@c4669-node3 yum.repos.d]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: cloudera@COELAB.CLOUDERA.COM

Valid starting       Expires              Service principal
01/08/2021 06:39:56  01/09/2021 06:39:56  krbtgt/COELAB.CLOUDERA.COM@COELAB.CLOUDERA.COM
01/08/2021 06:56:13  01/09/2021 06:39:56  HTTP/c4669-node4.coelab.cloudera.com@
01/08/2021 06:56:13  01/09/2021 06:39:56  HTTP/c4669-node4.coelab.cloudera.com@COELAB.CLOUDERA.COM
[root@c4669-node3 yum.repos.d]# yarn application -list -appStates 'NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING'
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
21/01/08 07:00:36 INFO client.RMProxy: Connecting to ResourceManager at c4669-node4.coelab.cloudera.com/172.25.39.199:8032
Total number of applications (application-types: [], states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING] and tags: []):1
                Application-Id      Application-Name        Application-Type          User           Queue                   State             Final-State             Progress                        Tracking-URL
application_1610088875054_0001             Sleep job               MAPREDUCE      cloudera         root.a1                 RUNNING               UNDEFINED               83.52% http://c4669-node4.coelab.cloudera.com:44759
[root@c4669-node3 yum.repos.d]# clear
[root@c4669-node3 yum.repos.d]# yarn application -list -appStates 'NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING'
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
21/01/08 07:06:32 INFO client.RMProxy: Connecting to ResourceManager at c4669-node4.coelab.cloudera.com/172.25.39.199:8032
Total number of applications (application-types: [], states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING] and tags: []):1
                Application-Id      Application-Name        Application-Type          User           Queue                   State             Final-State             Progress                        Tracking-URL
application_1610089441463_0001             Sleep job               MAPREDUCE      cloudera         root.a1                 RUNNING               UNDEFINED                  50% http://c4669-node3.coelab.cloudera.com:35559
[root@c4669-node3 yum.repos.d]# curl --negotiate -u : -b ~/cookiejar.txt -c ~/cookiejar.txt http://c4669-node4.coelab.cloudera.com:8088/ws/v1/cluster/apps/application_1610089441463_0001/state
{"state":"RUNNING"}[root@c4669-node3 yum.repos.d]# curl --negotiate -u : -b ~/cookiejar.txt -c ~/cookiejar.txt -XPUT -H "Content-type: application/json" -d '{
>   "state":"KILLED"
> }' 'http://c4669-node4.coelab.cloudera.com:8088/ws/v1/cluster/apps/application_1610089441463_0001/state'
{"state":"FINAL_SAVING"}[root@c4669-node3 yum.repos.d]#
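
The curl calls in the transcript above can be wrapped in a few small shell functions. This is only a sketch: `rm_state_url`, `kill_body` and `rm_kill_app` are hypothetical helper names, and it assumes a valid Kerberos ticket is already in the cache (check with `klist`) and that the ResourceManager web address is passed as `host:port`.

```shell
#!/usr/bin/env bash
# Build the ResourceManager state endpoint for one application.
rm_state_url() {
  local rm_host="$1" app_id="$2"
  printf 'http://%s/ws/v1/cluster/apps/%s/state' "$rm_host" "$app_id"
}

# JSON body asking the ResourceManager to move the application to KILLED.
# Note the surrounding braces: the body must be a complete JSON object.
kill_body() {
  printf '{"state":"KILLED"}'
}

# Send the SPNEGO-authenticated PUT request and print the response body.
rm_kill_app() {
  curl --silent --show-error --negotiate -u : \
    -X PUT -H 'Content-Type: application/json' \
    -d "$(kill_body)" "$(rm_state_url "$1" "$2")"
}

# Example (not executed here):
#   rm_kill_app c4669-node4.coelab.cloudera.com:8088 application_1610089441463_0001
```

As in the transcript, the response may report an intermediate state such as FINAL_SAVING; a follow-up GET on the same URL shows when the application has reached KILLED.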


【Answer 2】:

In case this helps...

yarn application -kill <application_id>
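
When only the application name is known, the ID can be pulled out of the `yarn application -list` output first. The sketch below is a loose match, not an official YARN feature: `app_ids_by_name` is a hypothetical helper that prints the first column of every listed line containing the given text, so a name that also appears in the user or queue column would match too.

```shell
#!/usr/bin/env bash
# Print the IDs of listed applications whose line contains the given text.
# Reads `yarn application -list` output on stdin.
app_ids_by_name() {
  awk -v name="$1" 'index($0, name) && $1 ~ /^application_[0-9]+_[0-9]+$/ { print $1 }'
}

# Example (not executed here): kill every running application named "Sleep job".
#   yarn application -list -appStates RUNNING 2>/dev/null \
#     | app_ids_by_name 'Sleep job' \
#     | while read -r id; do yarn application -kill "$id"; done
```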
