python python-spark-daily-revenue.py
Preface: this article was compiled by the editors of 小常识网 (cha138.com) and covers python python-spark-daily-revenue.py; we hope it is a useful reference.
# base directory of retail_db and output path are passed as arguments
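# spark-submit --master local daily_revenue.py /Users/itversity/Research/data/retail_db /Users/itversity/Research/revenue_per_day
# (spark-submit options such as --master must come before the script; anything after the script is passed to it as an argument)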
from pyspark import SparkContext, SparkConf
import sys
conf = SparkConf().setAppName("Daily Revenue").setMaster("local")
sc = SparkContext(conf=conf)
# Keep only COMPLETE or CLOSED orders (status is field 3)
orders = sc.textFile(sys.argv[1] + "/orders")
ordersFiltered = orders.filter(lambda rec: rec.split(",")[3] in ("COMPLETE", "CLOSED"))
# Key each order by order id (field 0), with the order date (field 1) as value
ordersFilteredMap = ordersFiltered.map(lambda rec: (int(rec.split(",")[0]), rec.split(",")[1]))
# Key each order item by order id (field 1), with the item subtotal (field 4) as value
orderItems = sc.textFile(sys.argv[1] + "/order_items")
orderItemsMap = orderItems.map(lambda rec: (int(rec.split(",")[1]), float(rec.split(",")[4])))
# Join on order id, keep the (date, subtotal) pairs, then sum the subtotals per date
ordersJoin = ordersFilteredMap.join(orderItemsMap)
ordersJoinMap = ordersJoin.map(lambda rec: rec[1])
orderRevenuePerDay = ordersJoinMap.reduceByKey(lambda agg, val: agg + val)
orderRevenuePerDay.\
map(lambda rec: rec[0] + "\t" + str(rec[1])).\
saveAsTextFile(sys.argv[2])
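
To check the filter, join, and per-day sum without a Spark installation, the same logic can be sketched in plain Python. This is only an illustration, not part of the original script: the sample rows and the `daily_revenue` helper are made up, and the field positions are inferred from the indices the script reads (orders: id at 0, date at 1, status at 3; order items: order id at 1, subtotal at 4).

```python
from collections import defaultdict

# Hypothetical sample rows, laid out to match the indices used in the Spark script.
orders = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-26 00:00:00.0,12111,COMPLETE",
]
order_items = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,3,502,2,100.0,50.0",
    "4,3,403,1,129.99,129.99",
]


def daily_revenue(orders, order_items):
    """Plain-Python stand-in for the filter -> join -> reduceByKey pipeline."""
    # filter + map: order id -> order date, for COMPLETE or CLOSED orders only
    date_by_order = {}
    for rec in orders:
        fields = rec.split(",")
        if fields[3] in ("COMPLETE", "CLOSED"):
            date_by_order[int(fields[0])] = fields[1]

    # join + reduceByKey: add each matching item's subtotal to its order date
    revenue = defaultdict(float)
    for rec in order_items:
        fields = rec.split(",")
        order_id = int(fields[1])
        if order_id in date_by_order:  # inner join: skip items of filtered-out orders
            revenue[date_by_order[order_id]] += float(fields[4])
    return dict(revenue)


result = daily_revenue(orders, order_items)
# Same tab-separated layout that the script writes with saveAsTextFile
for date, total in sorted(result.items()):
    print(date + "\t" + str(total))
```

The item belonging to the PENDING_PAYMENT order is dropped by the join, just as it would be in the RDD version, so only two dates appear in the result.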