python pyspark-wordcount-coalesce.py



# Make sure the directory used as the output path does not already exist.
# Pick one of the two input paths (local file or cluster file):
path = "/Users/itversity/Research/data/wordcount.txt"
# path = "/public/randomtextwriter/part-m-00000"

# `sc` is assumed to be an existing SparkContext (for example, the one the pyspark shell creates).
lines = sc.textFile(path)
lines_coalesce = lines.coalesce(5)  # without coalesce it will try to use 9 tasks in the first stage
words = lines_coalesce.flatMap(lambda rec: rec.split(" "))  # build on the coalesced RDD, not on `lines`
tuples = words.map(lambda rec: (rec, 1))
wordByCount = tuples.reduceByKey(lambda total, agg: total + agg)
wbcCoalesce = wordByCount.coalesce(2)  # the second stage will use only 2 tasks

for i in wbcCoalesce.take(100):
    print(i)
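
The effect of the two coalesce calls on the partition count can be sketched without a cluster. The following is a minimal pure-Python model, not Spark itself: the helper names `coalesce` and `reduce_by_key`, the sample lines, and the rule of merging neighbouring partitions are all assumptions made for illustration (Spark chooses its own grouping, taking data locality into account). It only shows that 9 input partitions become 5 in the first stage and 2 after the reduce, while the word counts stay the same.

```python
from itertools import chain


def coalesce(partitions, n):
    """Hypothetical model: merge neighbouring partitions into at most n, no shuffle."""
    if n >= len(partitions):
        return [list(part) for part in partitions]
    groups = [[] for _ in range(n)]
    for idx, part in enumerate(partitions):
        # Assign parent partition idx to a contiguous target group.
        groups[idx * n // len(partitions)].extend(part)
    return groups


def reduce_by_key(partitions, func, num_partitions):
    """Hypothetical model: hash-partition (key, value) pairs and combine per key."""
    buckets = [dict() for _ in range(num_partitions)]
    for key, value in chain.from_iterable(partitions):
        bucket = buckets[hash(key) % num_partitions]
        bucket[key] = func(bucket[key], value) if key in bucket else value
    return [list(bucket.items()) for bucket in buckets]


# Made-up sample data: 9 lines, one line per input partition.
lines = ["a b a", "b c", "a", "c c b", "a b", "c", "b b", "a c", "a"]
partitions = [[line] for line in lines]

stage1 = coalesce(partitions, 5)  # first stage: 5 partitions instead of 9
words = [[w for line in part for w in line.split(" ")] for part in stage1]
tuples = [[(w, 1) for w in part] for part in words]
counts = reduce_by_key(tuples, lambda total, agg: total + agg, num_partitions=5)
result = coalesce(counts, 2)  # second stage: only 2 partitions

word_counts = dict(chain.from_iterable(result))
print(len(partitions), len(stage1), len(result))
print(word_counts)
```

In real PySpark, the partition count of an RDD can be inspected with `getNumPartitions()`, for example `lines_coalesce.getNumPartitions()`.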
