python pyspark-wordcount-ide.py

Posted

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了python pyspark-wordcount-ide.py相关的知识,希望对你有一定的参考价值。

import sys
try:
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("Word Count").setMaster("local")
    sc = SparkContext(conf=conf)
    inputPath = sys.argv[1]
    outputPath = sys.argv[2]

    Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
    FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
    Configuration = sc._gateway.jvm.org.apache.hadoop.conf.Configuration

    fs = FileSystem.get(Configuration())

    if(fs.exists(Path(inputPath)) == False):
        print("Input path does not exists")
    else:
        if(fs.exists(Path(outputPath))):
            fs.delete(Path(outputPath), True)
        sc.textFile(inputPath). \
            flatMap(lambda l: l.split(" ")). \
            map(lambda w: (w, 1)). \
            reduceByKey(lambda t, e: t + e). \
            saveAsTextFile(outputPath)

    print ("Successfully imported Spark Modules")

except ImportError as e:
    print ("Can not import Spark Modules", e)
    sys.exit(1)

以上是关于python pyspark-wordcount-ide.py的主要内容,如果未能解决你的问题,请参考以下文章

Python代写,Python作业代写,代写Python,代做Python

Python开发

Python,python,python

Python 介绍

Python学习之认识python

python初识