python python-spark-daily-revenue.py


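# The base directory of retail_db and the output path are passed as arguments.
# spark-submit options such as --master must come before the script name;
# anything placed after the script is handed to the script as an argument.
# spark-submit --master local daily_revenue.py /Users/itversity/Research/data/retail_db /Users/itversity/Research/revenue_per_day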

from pyspark import SparkContext, SparkConf
import sys

# setMaster("local") set here takes precedence over any --master given to spark-submit
conf = SparkConf().setAppName("Daily Revenue").setMaster("local")
sc = SparkContext(conf=conf)

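# Keep only COMPLETE or CLOSED orders (field 3 holds the order status)
orders = sc.textFile(sys.argv[1] + "/orders")
ordersFiltered = orders.filter(lambda rec: rec.split(",")[3] in ("COMPLETE", "CLOSED"))
# Key each order by its id (field 0); the value is field 1, used as the day
ordersFilteredMap = ordersFiltered.map(lambda rec: (int(rec.split(",")[0]), rec.split(",")[1]))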

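# Key each order item by its order id (field 1); the value is field 4 as a float
orderItems = sc.textFile(sys.argv[1] + "/order_items")
orderItemsMap = orderItems.map(lambda rec: (int(rec.split(",")[1]), float(rec.split(",")[4])))
# Inner join on order id yields (order_id, (day, amount))
ordersJoin = ordersFilteredMap.join(orderItemsMap)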
ordersJoinMap = ordersJoin.map(lambda rec: rec[1])
orderRevenuePerDay = ordersJoinMap.reduceByKey(lambda agg, val: agg + val)

orderRevenuePerDay.\
map(lambda rec: rec[0] + "\t" + str(rec[1])).\
saveAsTextFile(sys.argv[2])
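
To make the data flow easier to follow, here is a minimal plain-Python sketch of the same filter, join, and sum-per-day steps, run on a few made-up records instead of the retail_db files. The sample rows, the function name daily_revenue, and the assumption that each order id appears only once in orders are illustrative and not part of the original script; it mirrors the logic only and is not a replacement for the Spark job.

```python
# Plain-Python sketch of the pipeline's logic; all sample rows are made up.
from collections import defaultdict

orders = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-26 00:00:00.0,12111,COMPLETE",
]
order_items = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,3,502,5,250.0,50.0",
    "4,3,403,1,129.99,129.99",
]


def daily_revenue(orders, order_items):
    # filter + map: order id -> day, for COMPLETE or CLOSED orders only
    # (assumes each order id occurs once in orders)
    day_by_order = {}
    for rec in orders:
        fields = rec.split(",")
        if fields[3] in ("COMPLETE", "CLOSED"):
            day_by_order[int(fields[0])] = fields[1]

    # join + reduceByKey: add each item's amount (field 4) to its order's day;
    # items whose order was filtered out are dropped, as in an inner join
    revenue = defaultdict(float)
    for rec in order_items:
        fields = rec.split(",")
        order_id = int(fields[1])
        if order_id in day_by_order:
            revenue[day_by_order[order_id]] += float(fields[4])
    return dict(revenue)


result = daily_revenue(orders, order_items)
for day, total in sorted(result.items()):
    # same tab-separated layout the Spark job writes
    print(day + "\t" + str(total))
```

Order 2 is neither COMPLETE nor CLOSED, so its item contributes nothing; the two items of order 3 are summed under the same day.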
