Running Spark Streaming on YARN

Posted by 阿凯

Editor's note: compiled by the cha138.com editors, this article covers running Spark Streaming on YARN.

  • In IDEA, build the jar: Maven -> Lifecycle -> package

  • Upload the jar to the server

  • Submit to YARN with spark-submit

spark-submit \
--class cn.ruige.data.genderalStat.gemeralStat.HistoryGenderTotal \
--master yarn \
--deploy-mode cluster \
--queue default \
--executor-memory 2g \
--executor-cores 2 \
--jars /opt/rely_jar/mysql-connector-java-5.1.38.jar ./datas_eagle-1.0-SNAPSHOT-jar-with-dependencies.jar /opt/sparkstream_jar/config.properties historyGender groupGender
# --class            main class to run
# --master           where the job is submitted:
#                      yarn
#                      spark://<host>:<port>
#                      local
# --deploy-mode      launch mode:
#                      client  - driver runs on the submitting machine
#                      cluster - driver runs inside the cluster
# --queue            YARN queue name
# --executor-memory  memory per executor (default 1g)
# --executor-cores   CPU cores per executor
# --jars             extra jars, comma-separated
#                      a local file: /opt/rely_jar/mysql-connector-java-5.1.38.jar
#                      or hdfs:, http:, https:, ftp: URLs (executors fetch the file from the URL)
# ./datas_eagle-1.0-SNAPSHOT-jar-with-dependencies.jar is the application jar you built;
#   a local path works, or it can be uploaded to HDFS, e.g.
#   hdfs://master:9000/user/spark/jars/datas_eagle-1.0-SNAPSHOT-jar-with-dependencies.jar
#   everything after the jar path is passed as arguments to main()
# Other options:
# --packages         Maven coordinates of jars to put on the driver and executor classpaths, e.g.
#                      mysql:mysql-connector-java:5.1.38
#                      org.apache.spark:spark-streaming-kafka-0-10_2.12:2.4.8
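As a variant of the command above (a sketch reusing the same class and jar names), the MySQL driver can be resolved from Maven at submit time with `--packages` instead of shipping a local jar with `--jars`. The script below only assembles and prints the command, so the flags can be inspected before actually running it:

```shell
# Sketch: same submission as above, but letting spark-submit resolve the
# MySQL driver from Maven (--packages) instead of a local jar (--jars).
PACKAGES="mysql:mysql-connector-java:5.1.38"
CMD="spark-submit \
 --class cn.ruige.data.genderalStat.gemeralStat.HistoryGenderTotal \
 --master yarn \
 --deploy-mode cluster \
 --packages $PACKAGES \
 ./datas_eagle-1.0-SNAPSHOT-jar-with-dependencies.jar \
 /opt/sparkstream_jar/config.properties historyGender groupGender"
# Print instead of executing, so the assembled command can be reviewed first.
echo "$CMD"
```

Dropping the `echo` runs the submission directly; `--packages` downloads the coordinate (and its transitive dependencies) from Maven Central on the submitting machine.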

Common errors

  • Exception in thread "main" java.lang.NoSuchMethodError: org.apa
The Scala version in pom.xml must match the Scala version on the server; a mismatch between the version the jar was compiled with and the cluster's Scala binary version causes this error at runtime.
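A quick way to see which Scala version a Spark artifact targets: the suffix after the underscore in the artifactId is the Scala binary version, and it has to match the scala.version the jar was compiled with. A small shell sketch:

```shell
# The '_2.11' suffix in a Spark artifactId names the Scala binary version;
# spark-streaming-kafka-0-10_2.11 only links against code built with Scala 2.11.x.
artifact="spark-streaming-kafka-0-10_2.11"
scala_binary="${artifact##*_}"   # strip everything up to the last underscore
echo "$scala_binary"            # prints 2.11
```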

Common YARN operations

# list running applications
yarn application -list
# list all applications (any state)
yarn application -list -appStates ALL
# show the logs of an application (e.g. to find errors)
yarn logs -applicationId [appid]
# kill an application
yarn application -kill [appid]
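These commands compose well in scripts: the application id is the first column of each data line printed by `yarn application -list`. A sketch of extracting it with awk (the id below is a made-up sample) before passing it on to `yarn logs` or `yarn application -kill`:

```shell
# Each data line of 'yarn application -list' begins with the application id;
# awk's first field extracts it for use in later yarn commands.
# (This line is a hypothetical sample of the -list output.)
sample="application_1650000000000_0042  HistoryGenderTotal  SPARK  root  default  RUNNING"
appid=$(printf '%s\n' "$sample" | awk '{print $1}')
echo "$appid"   # prints application_1650000000000_0042
# e.g.: yarn logs -applicationId "$appid"
```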

pom.xml dependencies

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>datas_eagle</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.11.12</scala.version>
        <spark.version>2.4.8</spark.version>
        <kafka.version>0.11.0.3</kafka.version>
<!--        <scala.binary.version>2.12.12</scala.binary.version>-->
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>2.4.0</version>
        </dependency>
<!--        <dependency>-->
<!--            <groupId>org.apache.kafka</groupId>-->
<!--            <artifactId>kafka-clients</artifactId>-->
<!--            <version>0.11.0.3</version>-->
<!--            <scope>provided</scope>-->
<!--        </dependency>-->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
            <version>2.2.4</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
    <pluginRepositories>
        <pluginRepository>
            <id>ali-plugin</id>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </pluginRepository>
    </pluginRepositories>
    <build>
        <plugins>
            <!-- plugin for compiling Java -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <!-- plugin for compiling Scala -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <!-- Maven Assembly Plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4.1</version>
                <configuration>
                    <!-- get all project dependencies -->
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <!-- MainClass in manifest makes an executable jar -->
                    <archive>
                        <manifest>
                            <!--<mainClass>util.Microseer</mainClass>-->
                        </manifest>
                    </archive>

                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <!-- bind to the packaging phase -->
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
<!--    <repositories>-->
<!--        <repository>-->
<!--            <id>maven-ali</id>-->
<!--            <url>http://maven.aliyun.com/nexus/content/groups/public//</url>-->
<!--            <releases>-->
<!--                <enabled>true</enabled>-->
<!--            </releases>-->
<!--            <snapshots>-->
<!--                <enabled>true</enabled>-->
<!--                <updatePolicy>always</updatePolicy>-->
<!--                <checksumPolicy>fail</checksumPolicy>-->
<!--            </snapshots>-->
<!--        </repository>-->
<!--    </repositories>-->
</project>
