Predicting the Stock Market with Machine Learning in R
Posted by 龙心七号
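Editor's note: this article was compiled by the editors of 小常识网 (cha138.com). It shows how to use machine learning algorithms in R to predict the stock market, and we hope it is a useful reference.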
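Introduction to quantmod
quantmod is a very powerful R package for financial analysis, with functions for downloading, cleaning and modeling market data.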
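1. Fetching data: getSymbols
The default data source is Yahoo Finance ("yahoo").
For stocks listed on the Shanghai Stock Exchange, append .ss to the stock code, e.g. getSymbols("600030.ss"); for the Shenzhen Stock Exchange, append .sz, e.g. getSymbols("000002.sz").
2. Aliasing symbols: setSymbolLookup
3. Dividends: getDividends
4. Adjusting prices for dividends and splits: adjustOHLC
5. Splits (ex-rights events): getSplits
6. Option chains: getOptionChain
7. Financial statements: getFinancials / getFin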
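adjustOHLC back-adjusts historical prices so that a dividend or split does not show up as an artificial price drop. As a rough illustration of that idea, here is a generic dividend back-adjustment in base R; it is not quantmod's code, and the helper name `back_adjust` and the toy prices are hypothetical. Each cash dividend scales every earlier price by 1 - dividend / previous close:

```r
# Hypothetical helper: back-adjust closing prices for cash dividends.
# closes: raw closes in date order; div: cash dividend going ex on each day (0 otherwise).
back_adjust <- function(closes, div) {
  ratio <- rep(1, length(closes))
  for (t in seq_along(closes)[-1]) {
    if (div[t] > 0) {
      # Scale every price before the ex-date by (1 - dividend / previous close)
      f <- 1 - div[t] / closes[t - 1]
      ratio[seq_len(t - 1)] <- ratio[seq_len(t - 1)] * f
    }
  }
  closes * ratio
}

# Toy example: a 0.5 dividend goes ex on day 3 and the raw close drops from 10 to 9.5
adj <- back_adjust(c(10, 10, 9.5, 9.6), c(0, 0, 0.5, 0))
adj  # 9.5 9.5 9.5 9.6: the ex-date drop no longer appears in the adjusted series
```

Splits work the same way with the split ratio as the scaling factor; how adjustOHLC combines the two from Yahoo's data may differ in detail.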
Example: map the name WANKE to Vanke's Shenzhen ticker 000002.sz and download its prices.
> library(quantmod)
> setSymbolLookup(WANKE=list(name="000002.sz", src="yahoo"))
> getSymbols("WANKE")
[1] "WANKE"
Warning message:
000002.sz contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
> head(WANKE)
           000002.SZ.Open 000002.SZ.High 000002.SZ.Low 000002.SZ.Close
2008-03-17         14.221         14.221        14.221           13.65
2008-03-18             NA             NA            NA              NA
2008-03-19             NA             NA            NA              NA
2008-03-20             NA             NA            NA              NA
2008-03-21             NA             NA            NA              NA
2008-03-24             NA             NA            NA              NA
           000002.SZ.Volume 000002.SZ.Adjusted
2008-03-17        123340858           13.10156
2008-03-18               NA                 NA
2008-03-19               NA                 NA
2008-03-20               NA                 NA
2008-03-21               NA                 NA
2008-03-24               NA                 NA
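The warning above suggests several ways to deal with the gaps. A minimal base-R sketch of two common options on a toy vector (the values and the helper `locf` are hypothetical): dropping missing observations, as `na.omit()` does, and carrying the last observation forward, similar in spirit to `zoo::na.locf()`:

```r
# Toy closing prices with gaps (hypothetical values)
close <- c(13.65, NA, NA, 14.10, NA, 14.30)

# Option 1: drop missing observations, as na.omit() does
dropped <- close[!is.na(close)]

# Option 2: carry the last observation forward (a leading NA would stay NA)
locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}
filled <- locf(close)

dropped  # 13.65 14.10 14.30
filled   # 13.65 13.65 13.65 14.10 14.10 14.30
```

For the up/down classifier below, dropping is the safer choice: forward filling copies the previous day's open and close, so each filled day would simply repeat the previous day's UP/DOWN label as an extra training row.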
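Machine learning: classification
First, simplify the problem: predict only whether the stock rises or falls. This turns it into a classification problem in which every historical day is labeled either UP or DOWN. As a further simplification, assume that whether the stock rises or falls depends only on historical data.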
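The labeling rule used in the R session below can be isolated as a small function. This sketch (the name `label_days` and the toy prices are hypothetical) applies the same `ifelse(change > 0, "UP", "DOWN")` test to plain numeric vectors; note that a day that closes exactly at its open counts as DOWN under this rule.

```r
# Label each day UP if it closed above its open, otherwise DOWN
label_days <- function(open, close) {
  change <- close - open
  ifelse(change > 0, "UP", "DOWN")  # change == 0 also falls into DOWN
}

labels <- label_days(open = c(10, 10, 10), close = c(10.5, 10, 9.8))
labels  # "UP" "DOWN" "DOWN"
```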
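We use a naive Bayes classifier as the learning method. Its defining ("naive") assumption is that, given the class C, all attributes A_i are conditionally independent of one another.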
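Written out, the independence assumption lets the posterior probability of class C factor into the prior times one term per attribute. The denominator is the same for every class, so the classifier simply picks the class with the largest product:

$$
P(C \mid A_1, \dots, A_n) = \frac{P(C)\,\prod_{i=1}^{n} P(A_i \mid C)}{P(A_1, \dots, A_n)},
\qquad
\hat{C} = \arg\max_{c}\; P(c) \prod_{i=1}^{n} P(A_i \mid c)
$$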
The implementation in R:
> library(lubridate)   # date helpers (wday)
> library(e1071)       # naive Bayes (naiveBayes)
> library(quantmod)
> setSymbolLookup(WANKE=list(name="000002.sz", src="yahoo"))
> getSymbols("WANKE")
[1] "WANKE"
> head(WANKE)
           000002.SZ.Open 000002.SZ.High 000002.SZ.Low 000002.SZ.Close
2008-03-17         14.221         14.221        14.221           13.65
2008-03-18             NA             NA            NA              NA
2008-03-19             NA             NA            NA              NA
2008-03-20             NA             NA            NA              NA
2008-03-21             NA             NA            NA              NA
2008-03-24             NA             NA            NA              NA
           000002.SZ.Volume 000002.SZ.Adjusted
2008-03-17        123340858           13.10156
2008-03-18               NA                 NA
2008-03-19               NA                 NA
2008-03-20               NA                 NA
2008-03-21               NA                 NA
2008-03-24               NA                 NA
> tail(WANKE)
           000002.SZ.Open 000002.SZ.High 000002.SZ.Low 000002.SZ.Close
2017-07-31          23.52          23.58         23.10           23.37
2017-08-01          23.35          23.55         23.20           23.42
2017-08-02          23.45          24.12         23.43           23.58
2017-08-03          23.58          23.58         22.79           23.11
2017-08-04          23.00          23.06         22.71           22.84
2017-08-07          22.82          23.05         22.68           22.71
           000002.SZ.Volume 000002.SZ.Adjusted
2017-07-31         30942482              23.37
2017-08-01         20952262              23.42
2017-08-02         35391017              23.58
2017-08-03         45518939              23.11
2017-08-04         29612306              22.84
2017-08-07         23409149              22.71
> startDate <- as.Date("2010-01-01")   # defined but not used below
> endDate <- as.Date("2017-01-01")     # defined but not used below
> DayofWeek <- wday(index(WANKE), label=TRUE)     # weekday of each trading date
> PriceChange <- Cl(WANKE) - Op(WANKE)            # close minus open
> Class <- ifelse(PriceChange > 0, "UP", "DOWN")  # a positive change means UP
> DataSet <- data.frame(DayofWeek, Class)
> MyModel <- naiveBayes(DataSet[,1], DataSet[,2])
> MyModel

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = DataSet[, 1], y = DataSet[, 2])

A-priori probabilities:
DataSet[, 2]
     DOWN        UP
0.5148148 0.4851852

Conditional probabilities:
            x
DataSet[, 2]       Sun       Mon      Tues       Wed     Thurs       Fri
        DOWN 0.0000000 0.2374101 0.1510791 0.2158273 0.1870504 0.2086331
        UP   0.0000000 0.1603053 0.2442748 0.1908397 0.2137405 0.1908397
            x
DataSet[, 2]       Sat
        DOWN 0.0000000
        UP   0.0000000
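To make the two printed tables concrete, here is a counting-based sketch of a naive Bayes classifier with a single categorical predictor, in base R. It is an illustration, not e1071's implementation; the helpers `nb_fit`/`nb_predict` and the toy data are hypothetical. Without Laplace smoothing (naiveBayes defaults to `laplace = 0`), the prior is the class frequency and each row of the conditional table is the predictor's frequency within that class:

```r
# Hypothetical helpers: naive Bayes with one categorical predictor, by counting
nb_fit <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)                 # ignore incomplete rows
  x <- factor(x[ok])
  y <- factor(y[ok])
  tab <- table(y)
  prior <- setNames(as.numeric(tab) / sum(tab), names(tab))  # P(Class)
  cond <- prop.table(table(y, x), 1)          # P(x | Class); each row sums to 1
  list(prior = prior, cond = cond)
}

# Pick the class with the largest P(Class) * P(x | Class) for one new value
nb_predict <- function(model, x_new) {
  scores <- sapply(names(model$prior), function(cl)
    model$prior[[cl]] * model$cond[cl, x_new])
  names(scores)[which.max(scores)]
}

# Toy data (hypothetical): weekday and UP/DOWN label of six trading days
day <- c("Mon", "Mon", "Tue", "Tue", "Tue", "Wed")
cls <- c("DOWN", "DOWN", "UP", "UP", "DOWN", "UP")
m <- nb_fit(day, cls)
m$prior                # DOWN 0.5, UP 0.5
nb_predict(m, "Tue")   # "UP": 0.5 * 2/3 beats 0.5 * 1/3
```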
Prior probabilities of DOWN and UP over the whole dataset:
DataSet[, 2]
     DOWN        UP
0.5148148 0.4851852
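Within each class, the probability of each weekday, i.e. P(DayofWeek | Class). Each row sums to 1, and Saturday and Sunday are 0 because the exchange does not trade on weekends: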
Conditional probabilities:
            x
DataSet[, 2]       Sun       Mon      Tues       Wed     Thurs       Fri
        DOWN 0.0000000 0.2374101 0.1510791 0.2158273 0.1870504 0.2086331
        UP   0.0000000 0.1603053 0.2442748 0.1908397 0.2137405 0.1908397
            x
DataSet[, 2]       Sat
        DOWN 0.0000000
        UP   0.0000000
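These two tables are all the model needs to make a prediction. As a worked example using only the printed numbers (the Tuesday column and the priors), Bayes' rule gives the probability of a rise on a Tuesday:

```r
# Values copied from the naiveBayes output above
prior  <- c(DOWN = 0.5148148, UP = 0.4851852)   # P(Class)
p_tues <- c(DOWN = 0.1510791, UP = 0.2442748)   # P(Tues | Class)

joint     <- prior * p_tues          # P(Class) * P(Tues | Class)
posterior <- joint / sum(joint)      # normalize to P(Class | Tues)
round(posterior, 3)                  # DOWN 0.396, UP 0.604
```

UP comes out at about 0.60 versus 0.40 for DOWN, so this model predicts UP for every Tuesday, and it makes the same kind of weekday-only call for every other day.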
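Improving the model
Exponential moving average (EMA)
The improved model adds a second predictor alongside the day of the week: the gap between the 5-day and 10-day EMAs of the opening price (an EMA crossover signal). The data is also split chronologically into a training set and a test set, so the model is evaluated on days it has not seen.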
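An EMA weights recent prices more heavily than older ones. Below is a minimal base-R sketch of the recursion, assumed to match the default behavior of TTR's `EMA()` (which quantmod loads): seed with the simple average of the first n values, then update with smoothing factor 2 / (n + 1). The function name `ema` and the toy prices are hypothetical. The crossover feature is the 5-day EMA minus the 10-day EMA, which is positive when the short average sits above the long one.

```r
# Exponential moving average: seeded with the mean of the first n values,
# then ema[t] = alpha * x[t] + (1 - alpha) * ema[t - 1], with alpha = 2 / (n + 1)
ema <- function(x, n) {
  out <- rep(NA_real_, length(x))
  if (length(x) < n) return(out)
  alpha <- 2 / (n + 1)
  out[n] <- mean(x[1:n])
  if (length(x) > n) {
    for (t in (n + 1):length(x)) {
      out[t] <- alpha * x[t] + (1 - alpha) * out[t - 1]
    }
  }
  out
}

# Toy opening prices rising by 1 each day (hypothetical)
opens <- 1:12
ema5  <- ema(opens, 5)
ema10 <- ema(opens, 10)
cross <- ema5 - ema10   # NA until both averages exist; positive in this uptrend
```

In a steady uptrend the shorter EMA lags less, so the crossover stays positive (here a constant 2.5 once both averages are defined).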
> W <- na.omit(WANKE)                          # drop rows with missing values
> DayofWeek <- wday(index(W), label=TRUE)
> PriceChange <- Cl(W) - Op(W)
> Class <- ifelse(PriceChange > 0, "UP", "DOWN")
> EMA5 <- EMA(Op(W), n = 5)                    # 5-day EMA of the open
> EMA10 <- EMA(Op(W), n = 10)                  # 10-day EMA of the open
> EMACross <- EMA5 - EMA10
> EMACross <- round(EMACross, 2)
> DataSet2 <- data.frame(DayofWeek, EMACross, Class)
> DataSet2 <- DataSet2[-c(1:10), ]             # drop the EMA warm-up rows at the start
> head(DataSet2)
           DayofWeek   EMA X000002.SZ.Close
2016-07-14     Thurs  0.11             DOWN
2016-07-15       Fri  0.04             DOWN
2016-07-18       Mon  0.00             DOWN
2016-07-19      Tues -0.10             DOWN
2016-07-20       Wed -0.23             DOWN
2016-07-21     Thurs -0.28             DOWN
> tail(DataSet2)
           DayofWeek   EMA X000002.SZ.Close
2017-07-31       Mon -0.34             DOWN
2017-08-01      Tues -0.31               UP
2017-08-02       Wed -0.26               UP
2017-08-03     Thurs -0.19             DOWN
2017-08-04       Fri -0.24             DOWN
2017-08-07       Mon -0.27             DOWN
> length(DayofWeek)
[1] 270
> TrainingSet <- DataSet2[1:200, ]
> TestSet <- DataSet2[201:nrow(DataSet2), ]    # DataSet2 has only 270 - 10 = 260 rows
> EMACrossModel <- naiveBayes(TrainingSet[, 1:2], TrainingSet[, 3])
> EMACrossModel

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = TrainingSet[, 1:2], y = TrainingSet[, 3])

A-priori probabilities:
TrainingSet[, 3]
DOWN   UP
 0.5  0.5

Conditional probabilities:
                DayofWeek
TrainingSet[, 3]  Sun  Mon Tues  Wed Thurs  Fri  Sat
            DOWN 0.00 0.22 0.13 0.24  0.18 0.23 0.00
            UP   0.00 0.16 0.27 0.17  0.23 0.17 0.00

                EMA
TrainingSet[, 3]    [,1]      [,2]
            DOWN  0.0333 0.4119553
            UP   -0.0177 0.4191522

> table(predict(EMACrossModel, TestSet), TestSet[, 3], dnn = list('predicted', 'actual'))
         actual
predicted DOWN UP
     DOWN   16 21
     UP     13 10
For the numeric EMA predictor, naiveBayes fits a Gaussian per class: column [,1] is the class mean and column [,2] the standard deviation.
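The confusion matrix can be summarized as an accuracy and compared with a trivial baseline. The counts below are copied from the table above; the baseline (always predicting the more frequent actual class in the test window) is an added point of comparison, not part of the original analysis:

```r
# Confusion matrix from the output above (rows: predicted, columns: actual)
cm <- matrix(c(16, 13, 21, 10), nrow = 2,
             dimnames = list(predicted = c("DOWN", "UP"),
                             actual    = c("DOWN", "UP")))

accuracy <- sum(diag(cm)) / sum(cm)       # share of correctly classified test days
baseline <- max(colSums(cm)) / sum(cm)    # always predict the most common actual class
round(c(accuracy = accuracy, baseline = baseline), 3)  # 0.433, 0.517
```

On these 60 test days the model is right 26 times (about 43%), while always predicting UP would be right 31 times (about 52%), so on this test window the EMA-crossover model does not beat the trivial baseline.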
References
quantmod: http://www.quantmod.com/, https://github.com/dengyishuo/Notes/tree/master/quantmod
Naive Bayes classifier: http://blog.csdn.net/sulliy/article/details/6629201
Introduction to Use Machine Learning by R: https://www.inovancetech.com/blogML2.html