Creating a Spark step in Python on Amazon Hadoop

【Title】: create step spark python, amazon hadoop 【Posted】: 2016-08-25 07:21:58 【Question】:

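I am creating a Spark step on Amazon's Hadoop service, but I am stuck. The step keeps failing and I cannot tell whether the problem is in my code or in how I configured it.

I submit this command: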

spark-submit --deploy-mode cluster --master yarn --num-executors 5 --executor-cores 5 --executor-memory 1g s3://URL-S3/scripts/test.py

Script:

import boto3

# Handle to the DynamoDB table the script writes to
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('TestSpark')

# Insert a single test item
table.put_item(
    Item={
        'app_token': "1a",
        'advertising_id': "1b",
    }
)

This is what I keep getting back:

16/08/25 07:06:22 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:23 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:24 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:25 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:26 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:27 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:28 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:29 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:30 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:31 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:32 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:33 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:34 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:35 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:36 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:37 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:38 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:39 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:40 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:41 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:42 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)

Error log:

2016-08-25T07:30:14.769Z INFO Step created jobs: 
2016-08-25T07:30:14.769Z WARN Step failed with exitCode 1 and took 1062 seconds

Thanks!

This is the actual error, even though I had already installed the module:

ImportError: No module named boto3

【Comments】:

Have you read docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/…? I have never used Spark steps, but after skimming the docs it seems that it is the service, not you, that runs spark-submit.

【Answer 1】:

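Your application is waiting for YARN resources. Go to the ResourceManager URL and check whether the cluster has enough free resources and whether you are submitting to the correct queue. The YARN ResourceManager log should tell you the reason.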
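As a rough illustration of that check, the ResourceManager exposes cluster metrics as JSON through its REST API (usually at `/ws/v1/cluster/metrics` on the ResourceManager web port). The sketch below only parses such a response and compares the free capacity with a request. The function name `fits_in_cluster` and the sample payload are made up for illustration, and the sample is trimmed to a few fields.

```python
import json


def fits_in_cluster(metrics_json, needed_mb, needed_vcores):
    """Return True if the free capacity reported by YARN covers the request."""
    metrics = json.loads(metrics_json)["clusterMetrics"]
    return (metrics["availableMB"] >= needed_mb
            and metrics["availableVirtualCores"] >= needed_vcores)


# Hypothetical, trimmed response body; a real response has many more fields.
sample = ('{"clusterMetrics": {"availableMB": 4096, '
          '"availableVirtualCores": 8, "appsPending": 1}}')

# 8448 MB and 26 vcores do not fit into 4096 MB and 8 vcores.
print(fits_in_cluster(sample, needed_mb=8448, needed_vcores=26))  # False
```

If the request does not fit, the application will most likely stay in the ACCEPTED state until resources are freed or the request is reduced.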

【Discussion】:

It shows this error: 2016-08-25T07:30:14.769Z INFO Step created jobs: 2016-08-25T07:30:14.769Z WARN Step failed with exitCode 1 and took 1062 seconds

【Answer 2】:

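I don't work with Amazon EMR, but in Hadoop this happens when your application waits too long for YARN resources.

The resource negotiator cannot allocate the resources you asked for, so try reducing the resources your job requests. Also check the logs.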
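To get a feel for how much the spark-submit command in the question asks for, here is a back-of-the-envelope sketch. It assumes the commonly documented Spark-on-YARN defaults: a memory overhead of 10% of the heap with a 384 MB minimum, and a 1 GB, 1-core driver in cluster mode. The function names are hypothetical, and YARN may round each container up further, so treat the result as a lower bound.

```python
def container_mb(heap_mb, overhead_fraction=0.10, min_overhead_mb=384):
    """Heap plus the assumed YARN memory overhead for one container."""
    overhead = max(min_overhead_mb, int(heap_mb * overhead_fraction))
    return heap_mb + overhead


def estimate_request(num_executors, executor_cores, executor_mb,
                     driver_mb=1024, driver_cores=1):
    """Approximate total memory (MB) and vcores requested from YARN."""
    total_mb = num_executors * container_mb(executor_mb) + container_mb(driver_mb)
    total_vcores = num_executors * executor_cores + driver_cores
    return total_mb, total_vcores


# --num-executors 5 --executor-cores 5 --executor-memory 1g
print(estimate_request(5, 5, 1024))  # (8448, 26)
```

On a small cluster, lowering `--num-executors` or `--executor-cores` is the quickest way to bring these numbers down.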

Read through: this

Also check the status of the YARN services:

sudo service hadoop-yarn-nodemanager status
sudo service hadoop-yarn-resourcemanager status

【Discussion】:

【Answer 3】:

I tracked down the error.

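The boto3 module was not installed. I had installed it from the console, but the step still failed, because the module has to be installed on every instance of the cluster. So I created another cluster with a bootstrap action that updates Python and installs the boto3 module.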
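A minimal sketch of what that setup could look like when the cluster is created through boto3. It only builds the request fragments and does not call AWS. The bootstrap script path `s3://URL-S3/bootstrap/install_boto3.sh` is hypothetical (it would contain something like `sudo pip install boto3`), the action and step names are made up, and `command-runner.jar` assumes a reasonably recent EMR release.

```python
def bootstrap_action(script_path):
    """Bootstrap action entry; EMR runs the script on every node at startup."""
    return {
        "Name": "install-boto3",  # hypothetical name
        "ScriptBootstrapAction": {"Path": script_path, "Args": []},
    }


def spark_step(script_path):
    """Step entry that lets EMR run spark-submit on the cluster."""
    return {
        "Name": "spark-test",  # hypothetical name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     "--master", "yarn", script_path],
        },
    }


# Hypothetical bootstrap script location; the test.py path is from the question.
actions = [bootstrap_action("s3://URL-S3/bootstrap/install_boto3.sh")]
steps = [spark_step("s3://URL-S3/scripts/test.py")]
```

These lists would be passed as `BootstrapActions=actions` and `Steps=steps` to `boto3.client("emr").run_job_flow(...)`, together with the other required arguments (cluster name, instance configuration, roles) that are not shown here. Bootstrap actions can only be specified when a cluster is created, which is why a new cluster was needed.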

【Discussion】: