PySpark:计算指数移动平均线
Posted
技术标签:
【中文标题】PySpark:计算指数移动平均线【英文标题】:PySpark: Calculate Exponential Moving Average 【发布时间】:2020-07-03 15:36:50 【问题描述】:我想计算 PySpark 中 Close 的第 9 单元的指数移动平均值
以下是我的数据
+---------+-------------------+--------+--------+--------+--------+
| Symbol| DateTime| Open| High| Low| Close|
+---------+-------------------+--------+--------+--------+--------+
|BANKNIFTY|2019-01-01 09:15:00| 27235.5|27239.55| 27087.7| 27119.8|
|BANKNIFTY|2019-01-01 09:30:00| 27120.3| 27123.3|27073.85| 27103.6|
|BANKNIFTY|2019-01-01 09:45:00| 27104.7| 27119.1|27035.25| 27052.7|
|BANKNIFTY|2019-01-01 10:00:00|27051.65|27078.35| 27038.8| 27070.7|
|BANKNIFTY|2019-01-01 10:15:00| 27071|27092.85| 27061.3| 27076.9|
|BANKNIFTY|2019-01-01 10:30:00| 27078.2| 27099.1|27064.95| 27079.1|
|BANKNIFTY|2019-01-01 10:45:00|27077.85| 27094.5|27057.45|27085.65|
|BANKNIFTY|2019-01-01 11:00:00|27082.85| 27116.5|27082.35| 27112.7|
|BANKNIFTY|2019-01-01 11:15:00|27111.75| 27119.1| 27081.6|27083.35|
|BANKNIFTY|2019-01-01 11:30:00|27084.25|27106.65|27080.65| 27099.3|
|BANKNIFTY|2019-01-01 11:45:00| 27100.5| 27114| 27095.3|27109.15|
|BANKNIFTY|2019-01-01 12:00:00|27108.95|27122.85|27105.55| 27111.9|
|BANKNIFTY|2019-01-01 12:15:00| 27114.4| 27121.1|27086.55| 27087.8|
|BANKNIFTY|2019-01-01 12:30:00|27087.95|27088.45| 27060.4| 27074.3|
|BANKNIFTY|2019-01-01 12:45:00| 27072.8| 27081.7|27054.75| 27062.2|
|BANKNIFTY|2019-01-01 13:00:00|27062.35|27094.55| 27059.5|27093.15|
|BANKNIFTY|2019-01-01 13:15:00|27093.45|27094.85|27075.65|27085.05|
|BANKNIFTY|2019-01-01 13:30:00| 27084.8|27087.15|27019.05| 27043.2|
|BANKNIFTY|2019-01-01 13:45:00|27045.75|27068.85|27035.85|27062.35|
|BANKNIFTY|2019-01-01 14:00:00|27062.45|27100.35| 27057|27080.65|
|BANKNIFTY|2019-01-01 14:15:00|27081.05|27188.25| 27076.6|27188.25|
|BANKNIFTY|2019-01-01 14:30:00|27186.85|27309.45| 27179.8|27294.15|
|BANKNIFTY|2019-01-01 14:45:00| 27294.1|27356.85| 27294.1|27346.35|
|BANKNIFTY|2019-01-01 15:00:00| 27346.1|27395.35|27326.05| 27386.8|
|BANKNIFTY|2019-01-01 15:15:00| 27385|27430.55| 27380|27411.15|
+---------+-------------------+--------+--------+--------+--------+
想要添加计算最近 9 个收盘价周期的指数移动平均线的 EMA 列
EMA 计算公式
Multiplier = 2 ÷ (number of time periods + 1) => 2 ÷ (9+ 1) => 2 ÷ 10 => 0.2
EMA: Price - EMA(previous row) x Multiplier + EMA(previous row)
期望的输出
+---------+----------------+--------+--------+--------+--------+--------+
| Symbol| DateTime| Open| High| Low| Close| EMA|
+---------+----------------+--------+--------+--------+--------+--------+
|BANKNIFTY|01-01-2019 09:15| 27235.5|27239.55| 27087.7| 27119.8| |
|BANKNIFTY|01-01-2019 09:30| 27120.3| 27123.3|27073.85| 27103.6| |
|BANKNIFTY|01-01-2019 09:45| 27104.7| 27119.1|27035.25| 27052.7| |
|BANKNIFTY|01-01-2019 10:00|27051.65|27078.35| 27038.8| 27070.7| |
|BANKNIFTY|01-01-2019 10:15| 27071|27092.85| 27061.3| 27076.9| |
|BANKNIFTY|01-01-2019 10:30| 27078.2| 27099.1|27064.95| 27079.1| |
|BANKNIFTY|01-01-2019 10:45|27077.85| 27094.5|27057.45|27085.65| |
|BANKNIFTY|01-01-2019 11:00|27082.85| 27116.5|27082.35| 27112.7| |
|BANKNIFTY|01-01-2019 11:15|27111.75| 27119.1| 27081.6|27083.35|27087.17|
|BANKNIFTY|01-01-2019 11:30|27084.25|27106.65|27080.65| 27099.3|27089.59|
|BANKNIFTY|01-01-2019 11:45| 27100.5| 27114| 27095.3|27109.15| 27093.5|
|BANKNIFTY|01-01-2019 12:00|27108.95|27122.85|27105.55| 27111.9|27097.18|
|BANKNIFTY|01-01-2019 12:15| 27114.4| 27121.1|27086.55| 27087.8|27095.31|
|BANKNIFTY|01-01-2019 12:30|27087.95|27088.45| 27060.4| 27074.3|27091.11|
|BANKNIFTY|01-01-2019 12:45| 27072.8| 27081.7|27054.75| 27062.2|27085.32|
|BANKNIFTY|01-01-2019 13:00|27062.35|27094.55| 27059.5|27093.15|27086.89|
|BANKNIFTY|01-01-2019 13:15|27093.45|27094.85|27075.65|27085.05|27086.52|
|BANKNIFTY|01-01-2019 13:30| 27084.8|27087.15|27019.05| 27043.2|27077.86|
|BANKNIFTY|01-01-2019 13:45|27045.75|27068.85|27035.85|27062.35|27074.76|
|BANKNIFTY|01-01-2019 14:00|27062.45|27100.35| 27057|27080.65|27075.93|
|BANKNIFTY|01-01-2019 14:15|27081.05|27188.25| 27076.6|27188.25| 27098.4|
|BANKNIFTY|01-01-2019 14:30|27186.85|27309.45| 27179.8|27294.15|27137.55|
|BANKNIFTY|01-01-2019 14:45| 27294.1|27356.85| 27294.1|27346.35|27179.31|
|BANKNIFTY|01-01-2019 15:00| 27346.1|27395.35|27326.05| 27386.8|27220.81|
|BANKNIFTY|01-01-2019 15:15| 27385|27430.55| 27380|27411.15|27258.88|
+---------+-------------------+--------+--------+--------+--------+------+
公式链接:https://sciencing.com/calculate-exponential-moving-averages-8221813.html
【问题讨论】:
检查这个 - ***.com/a/54489659/4758823 【参考方案1】:您的问题不是很清楚,因为您的测试数据集不包括多天,所以您的 EMA 不能使用示例中数据中的前几天。
但是,要创建您需要的数据集:
from pyspark.sql import SQLContext, Window
from pyspark.sql import functions as F
from pyspark.sql.types import FloatType, StringType, StructField, StructType, TimestampType
schema = StructType(
[
StructField("Symbol", StringType(), nullable=False),
StructField("DateTime", TimestampType(), nullable=False),
StructField("Open", FloatType(), nullable=False),
StructField("High", FloatType(), nullable=False),
StructField("Low", FloatType(), nullable=False),
StructField("Close", FloatType(), nullable=False),
StructField("Close", FloatType(), nullable=False),
]
)
data = [
("BANKNIFTY",2019-01-01 09:15:00, 27235.5,27239.55, 27087.7, 27119.8),
("BANKNIFTY",2019-01-01 09:30:00, 27120.3, 27123.3,27073.85, 27103.6),
("BANKNIFTY",2019-01-01 09:45:00, 27104.7, 27119.1,27035.25, 27052.7),
...
]
sql_context = SQLContext(sc)
df = sql_context.createDataFrame(data, schema)
从那里,您只需要创建窗口
w = Window().partitionBy("Close").orderBy(F.col("DateTime"))
然后使用窗口创建新列
df = df.withColumn(
"EMA",
F.avg("Close").over(w)
)
注意:这里我只是使用平均值,而不是您要求的 EMA。要获得 EMA,我会添加一个额外的列来计算每天的 EMA,然后在计算中使用它。
【讨论】:
抱歉造成误会。前一天表示前一行的 EMA 值。我只需要根据标准 EMA 计算找到 EMA 值。此外,如果我发现 EMA 值为 9,那么前 8 行将没有任何 EMA 值。从第 9 行开始,我们将获得 EMA 值。 感谢您的澄清。在这种情况下,您需要的是lag
函数。例如,F.lag(F.col("Flag"), 9)
其中 9 是之前的行数
能否请您更新您的 EMA 计算答案,您分享的答案是简单 MA 计算。以上是关于PySpark:计算指数移动平均线的主要内容,如果未能解决你的问题,请参考以下文章