Investment Factor: Portfolio Analysis (EAP.portfolio_analysis)

Posted by 鹦鹉螺平原

The Empirical Asset Pricing (EAP) package has been released on GitHub. The author will document its usage in detail in a series of posts on CSDN.

GitHub: whyecofiliter/EAP (empirical asset pricing)

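The investment factor has grown popular since Fama and French (2015) introduced it. It is also part of the model of Hou et al. (2015, HXZ), and Zhang (2017) builds on it in the investment CAPM. Before it became popular, Titman et al. (2004) were among the early researchers of this factor, using abnormal capital investment as the proxy. Most of the subsequent literature, including Fama and French (2015) and Hou et al. (2015), uses the asset growth rate as the proxy instead. In developed markets the investment factor is negatively related to future returns, whereas in most developing countries the relation is reported to be closer still. In the Chinese market, most studies find no significant investment effect (Guo et al., 2017; Qiao, 2019; Liu et al., 2019).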

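In this demo, the annual asset growth rate serves as the proxy for the investment factor; it is computed from financial data and derived financial ratios. The sample starts in January 2004 and is collected from the CSMAR database. Warning: do not use the dataset in this demo for any commercial purpose.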

# %% import package
import os
import sys

import pandas as pd

# make the package located one directory up importable
sys.path.append(os.path.abspath(".."))

# %% import data
# Monthly stock returns in the Chinese securities market
month_return = pd.read_hdf('.\\data\\month_return.h5', key='month_return')
# Company-level valuation data used to build the asset growth rate
company_data = pd.read_hdf('.\\data\\last_filter_pe.h5', key='data')

Preprocess the data.

# %% preprocessing data
# forward the monthly return for each stock (row t then holds the return of month t+1)
# emrwd is the forward return including dividends
month_return['emrwd'] = month_return.groupby(['Stkcd'])['Mretwd'].shift(-1)
# emrnd is the forward return excluding dividends
month_return['emrnd'] = month_return.groupby(['Stkcd'])['Mretnd'].shift(-1)
# select the A share stock
month_return = month_return[month_return['Markettype'].isin([1, 4, 16])]

# %% flag the stocks whose size is at or above the 30th percentile in each month,
# i.e. everything except the smallest 30% (the tail stocks)
def percentile(stocks):
    return stocks >= stocks.quantile(q=.3)

# group_keys=False keeps the original row index, so the flags align on assignment
month_return['cap'] = month_return.groupby(['Trdmnt'], group_keys=False)['Msmvttl'].apply(percentile)
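
To make the two preprocessing steps concrete, here is a small sketch on made-up numbers. The toy frame and its values are illustrative assumptions; only the column names follow the demo. It shows that `shift(-1)` within each stock attaches next month's return to the current row, and that the size flag is `True` for stocks at or above the 30th percentile of market cap in a given month.

```python
import pandas as pd

# Toy data: two stocks, three months (values are made up for illustration)
toy = pd.DataFrame({
    'Stkcd':   [1, 1, 1, 2, 2, 2],
    'Trdmnt':  ['2004-01', '2004-02', '2004-03'] * 2,
    'Mretwd':  [0.01, 0.02, 0.03, -0.01, 0.00, 0.05],
    'Msmvttl': [100.0, 110.0, 120.0, 50.0, 40.0, 300.0],
})

# Next month's return of the same stock; the last month of each stock has none
toy['emrwd'] = toy.groupby('Stkcd')['Mretwd'].shift(-1)

# True when the market cap is at or above the 30th percentile of that month
def percentile(stocks):
    return stocks >= stocks.quantile(q=.3)

# group_keys=False keeps the original index so the assignment aligns row by row
toy['cap'] = toy.groupby('Trdmnt', group_keys=False)['Msmvttl'].apply(percentile)

print(toy)
```

Rows without a forward return (the last month of each stock) are dropped later by the `dropna()` calls in the analysis cells.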

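The annual asset growth rate is used as the proxy for the investment factor. It is computed from financial data and derived financial ratios.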

# %% calculate the total asset
# asset = debt + equity
# debt = company_value - market_value
# equity = market_value / PB
company_data['debt'] = company_data['EV1'] - company_data['MarketValue']
company_data['equity'] = company_data['MarketValue']/company_data['PBV1A']
company_data['asset'] = company_data['debt'] + company_data['equity']

# asset growth rate: 12-row change in total assets within each stock, scaled by current assets
# (assumes one row per stock per month, sorted by date)
company_data['asset_growth_rate'] = company_data.groupby(['Symbol'])['asset'].diff(12)/company_data['asset']
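
The growth rate above divides the 12-month change by current total assets, while the more common definition divides by total assets 12 months earlier. The sketch below, on a made-up panel (firm names and asset paths are hypothetical), computes both versions and checks that they rank firms identically. That holds whenever total assets are positive, so quantile sorts on either version would then coincide.

```python
import numpy as np
import pandas as pd

# Toy panel: three firms, 13 monthly observations each (made-up asset paths)
dates = pd.date_range('2003-01-01', periods=13, freq='MS')
firms = {'A': 1.00, 'B': 1.20, 'C': 0.90}  # hypothetical 12-month asset multipliers

rows = []
for symbol, mult in firms.items():
    assets = np.linspace(100.0, 100.0 * mult, 13)  # linear path from 100 to 100 * mult
    rows += [{'Symbol': symbol, 'TradingDate': d, 'asset': a}
             for d, a in zip(dates, assets)]
panel = pd.DataFrame(rows).sort_values(['Symbol', 'TradingDate'])

change = panel.groupby('Symbol')['asset'].diff(12)   # A_t - A_{t-12}
lagged = panel.groupby('Symbol')['asset'].shift(12)  # A_{t-12}

panel['growth_demo'] = change / panel['asset']       # scaled by current assets, as in the demo
panel['growth_conventional'] = change / lagged       # scaled by lagged assets

# Only the 13th observation of each firm has a 12-month history
last = panel.dropna().set_index('Symbol')
print(last[['growth_demo', 'growth_conventional']])
```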

Further preprocessing.

# %% prepare merge data
from pandas.tseries.offsets import MonthBegin

month_return['Stkcd_merge'] = month_return['Stkcd'].astype(dtype='string')
month_return['Date_merge'] = pd.to_datetime(month_return['Trdmnt'])
#month_return['Yearmonth'] = month_return['Date_merge'].map(lambda x : 1000*x.year + x.month)
#month_return['Date_merge'] += MonthEnd()

company_data['Stkcd_merge'] = company_data['Symbol'].dropna().astype(dtype='int').astype(dtype='string')
company_data['Date_merge'] = pd.to_datetime(company_data['TradingDate'])
#company_data['Yearmonth'] = company_data['Date_merge'].map(lambda x : 1000*x.year + x.month)
# roll each trading date forward to the first day of the following month
company_data['Date_merge'] += MonthBegin()

# %% dataset starts from '2000-01'
company_data = company_data[company_data['Date_merge'] >= '2000-01']
month_return = month_return[month_return['Date_merge'] >= '2000-01']
return_company = pd.merge(company_data, month_return, on=['Stkcd_merge', 'Date_merge'])
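
The date alignment behind this merge can be checked on a toy example. The sketch assumes that `Trdmnt` is a 'YYYY-MM' string and that `TradingDate` falls near the end of a month; both are assumptions about the CSMAR files, and all values are made up. Adding `MonthBegin()` moves each trading date to the first day of the following month, which is where a parsed 'YYYY-MM' string lands.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# Return rows keyed by month (assumed 'YYYY-MM' format)
returns = pd.DataFrame({
    'Stkcd_merge': ['1', '1'],
    'Trdmnt': ['2004-01', '2004-02'],
    'Mretwd': [0.01, 0.02],
})
# A 'YYYY-MM' string parses to the first day of that month
returns['Date_merge'] = pd.to_datetime(returns['Trdmnt'], format='%Y-%m')

# Company rows keyed by a trading date late in the month (assumed)
company = pd.DataFrame({
    'Stkcd_merge': ['1', '1'],
    'TradingDate': ['2003-12-31', '2004-01-30'],
    'asset': [100.0, 105.0],
})
# Roll each date forward to the first day of the following month
company['Date_merge'] = pd.to_datetime(company['TradingDate']) + MonthBegin()

merged = pd.merge(company, returns, on=['Stkcd_merge', 'Date_merge'])
print(merged[['Stkcd_merge', 'TradingDate', 'Date_merge', 'asset', 'Mretwd']])
```

Under these assumptions, company data dated at the end of December 2003 is matched with the January 2004 return row.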

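Two datasets are constructed: one excludes the smallest 30% of stocks (the tail stocks) and the other includes them. A univariate and a bivariate analysis are run on each.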

# %% construct test_data for bivariate analysis
# dataset 1 : no tail stocks & size / asset growth Bivariate
from portfolio_analysis import Bivariate, Univariate
import numpy as np

# select stocks whose size is at or above the 30th percentile in each month and whose
# trading days are more than or equal to 10 days
test_data_1 = return_company[(return_company['cap']==True) & (return_company['Ndaytrd']>=10)]
test_data_1 = test_data_1[['emrwd', 'Msmvttl', 'asset_growth_rate', 'Date_merge']].dropna()
test_data_1 = test_data_1[(test_data_1['Date_merge'] >= '2004-01-01') & (test_data_1['Date_merge'] <= '2019-12-01')]

# Univariate analysis (number=9 yields the 10 groups shown in the output below)
uni_1 = Univariate(np.array(test_data_1[['emrwd', 'asset_growth_rate', 'Date_merge']]), number=9)
uni_1.summary_and_test()
uni_1.print_summary_by_time()
uni_1.print_summary()
====================================================================================================
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
|  Group  |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |   10  |  Diff |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Average | 0.011 | 0.012 | 0.013 | 0.013 | 0.015 | 0.014 | 0.013 | 0.015 | 0.015 | 0.016 | 0.005 |
|  T-Test | 1.393 | 1.655 | 1.783 | 1.879 | 2.054 | 1.985 | 1.955 | 2.162 | 2.064 | 2.152 | 1.907 |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
====================================================================================================
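
For readers who want to see what a table like the one above contains, here is a plain-pandas sketch of a univariate portfolio sort. It is not the EAP implementation: the equal weighting, the `pd.qcut` breakpoints and the unadjusted one-sample t-statistic are assumptions made for illustration, and the panel is simulated noise rather than CSMAR data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_months, n_stocks, n_groups = 24, 200, 10

# Simulated panel (pure noise, for illustration only)
months = pd.date_range('2004-01-01', periods=n_months, freq='MS')
panel = pd.DataFrame({
    'Date_merge': np.repeat(months, n_stocks),
    'asset_growth_rate': rng.normal(size=n_months * n_stocks),
    'emrwd': rng.normal(0.01, 0.1, size=n_months * n_stocks),
})

# Within each month, assign stocks to groups 1..n_groups by the sort variable
panel['group'] = (
    panel.groupby('Date_merge')['asset_growth_rate']
    .transform(lambda x: pd.qcut(x, n_groups, labels=False) + 1)
    .astype(int)
)

# Equal-weighted mean forward return per group and month (months x groups)
by_time = panel.groupby(['Date_merge', 'group'])['emrwd'].mean().unstack('group')
by_time['Diff'] = by_time[n_groups] - by_time[1]  # highest minus lowest group

# Time-series average and plain t-statistic against zero
average = by_time.mean()
t_stat = average / (by_time.std(ddof=1) / np.sqrt(len(by_time)))

summary = pd.DataFrame({'Average': average, 'T-Test': t_stat}).T.round(3)
print(summary)
```

Because the simulated returns are unrelated to the sort variable, the `Diff` column should usually come out insignificant here.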

# Bivariate analysis (number=4 yields the 5 x 5 grid shown in the output below)
bi_1 = Bivariate(np.array(test_data_1), number=4)
bi_1.average_by_time()
bi_1.summary_and_test()
bi_1.print_summary_by_time()
bi_1.print_summary()
==============================================================
+-------+--------+--------+--------+--------+--------+-------+
| Group |   1    |   2    |   3    |   4    |   5    |  Diff |
+-------+--------+--------+--------+--------+--------+-------+
|   1   | 0.015  | 0.017  | 0.018  | 0.018  |  0.02  | 0.005 |
|       | 1.848  | 2.119  | 2.336  | 2.404  | 2.482  | 1.985 |
|   2   | 0.012  | 0.014  | 0.017  | 0.015  | 0.019  | 0.007 |
|       | 1.509  | 1.784  | 2.301  | 1.984  | 2.434  |  2.8  |
|   3   |  0.01  | 0.012  | 0.015  | 0.014  | 0.014  | 0.004 |
|       | 1.314  | 1.695  | 2.026  | 1.884  | 1.912  | 1.862 |
|   4   | 0.009  |  0.01  | 0.011  | 0.013  | 0.015  | 0.006 |
|       | 1.194  | 1.507  | 1.579  | 1.831  | 2.009  |  2.45 |
|   5   | 0.007  |  0.01  | 0.011  | 0.014  | 0.012  | 0.005 |
|       |  1.03  | 1.517  | 1.685  | 2.106  | 1.749  |  1.7  |
|  Diff | -0.008 | -0.007 | -0.008 | -0.005 | -0.007 |  0.0  |
|       | -1.902 | -1.646 | -1.897 | -1.213 | -1.771 | 0.088 |
+-------+--------+--------+--------+--------+--------+-------+
==============================================================

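The results for dataset #1 are consistent with the literature. In the univariate analysis, the difference between the highest and lowest groups is 0.005 with a t-value of 1.907, which is below 2.3, so the difference return is not significant. In the bivariate analysis, the t-values in the Diff column range from 1.7 to 2.8 and exceed 2.3 in only two of the five groups, so the difference return is largely insignificant. This suggests that the investment factor does not provide excess returns.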
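
A 5 x 5 grid like the bivariate table can be sketched in plain pandas as well. The version below uses independent sorts on size and asset growth with equal-weighted cells. Whether EAP's `Bivariate` sorts independently or dependently is not stated in this post, so that choice is an assumption, as is the simulated data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_months, n_stocks, n = 24, 500, 5

# Simulated panel (pure noise, for illustration only)
months = pd.date_range('2004-01-01', periods=n_months, freq='MS')
panel = pd.DataFrame({
    'Date_merge': np.repeat(months, n_stocks),
    'Msmvttl': rng.lognormal(size=n_months * n_stocks),
    'asset_growth_rate': rng.normal(size=n_months * n_stocks),
    'emrwd': rng.normal(0.01, 0.1, size=n_months * n_stocks),
})

def quintile(x):
    # Group labels 1..n within one month
    return pd.qcut(x, n, labels=False) + 1

# Independent sorts: each variable is ranked on its own within the month
panel['size_group'] = panel.groupby('Date_merge')['Msmvttl'].transform(quintile).astype(int)
panel['inv_group'] = panel.groupby('Date_merge')['asset_growth_rate'].transform(quintile).astype(int)

# Mean return per month and cell, then the time-series average per cell
cell = panel.groupby(['Date_merge', 'size_group', 'inv_group'])['emrwd'].mean()
table = cell.groupby(level=['size_group', 'inv_group']).mean().unstack('inv_group')

# Highest minus lowest asset-growth group within each size group
table['Diff'] = table[n] - table[1]
print(table.round(3))
```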

# %% construct test_data for bivariate analysis
# dataset 2 : tail stocks included & size / asset growth Bivariate
from portfolio_analysis import Bivariate, Univariate
import numpy as np

# keep stocks of all sizes (tail stocks included) whose trading
# days are more than or equal to 10 days
test_data_2 = return_company[return_company['Ndaytrd']>=10]
test_data_2 = test_data_2[['emrwd', 'Msmvttl', 'asset_growth_rate', 'Date_merge']].dropna()
test_data_2 = test_data_2[(test_data_2['Date_merge'] >= '2004-01-01') & (test_data_2['Date_merge'] <= '2019-12-01')]

# Univariate analysis
uni_2 = Univariate(np.array(test_data_2[['emrwd', 'asset_growth_rate', 'Date_merge']]), number=9)
uni_2.summary_and_test()
uni_2.print_summary_by_time()
uni_2.print_summary()
====================================================================================================
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
|  Group  |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |   9   |   10  |  Diff |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Average | 0.017 | 0.017 | 0.017 | 0.017 | 0.017 | 0.017 | 0.016 | 0.017 | 0.017 | 0.018 | 0.001 |
|  T-Test | 2.052 | 2.204 | 2.301 | 2.303 | 2.323 |  2.33 | 2.249 | 2.392 | 2.283 | 2.411 | 0.313 |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
====================================================================================================

# Bivariate analysis
bi_2 = Bivariate(np.array(test_data_2), number=4)
bi_2.average_by_time()
bi_2.summary_and_test()
bi_2.print_summary_by_time()
bi_2.print_summary()
===============================================================
+-------+--------+--------+--------+--------+--------+-------+
| Group |   1    |   2    |   3    |   4    |   5    |  Diff |
+-------+--------+--------+--------+--------+--------+-------+
|   1   | 0.027  | 0.026  | 0.027  | 0.027  | 0.027  |  0.0  |
|       | 3.113  |  3.25  | 3.257  | 3.312  | 3.372  | 0.079 |
|   2   | 0.015  | 0.019  |  0.02  | 0.021  | 0.021  | 0.006 |
|       | 1.885  | 2.331  | 2.482  | 2.706  | 2.674  | 2.551 |
|   3   | 0.012  | 0.014  | 0.017  | 0.015  | 0.017  | 0.005 |
|       | 1.561  | 1.788  | 2.198  | 2.067  | 2.286  | 2.264 |
|   4   | 0.009  |  0.01  | 0.013  | 0.013  | 0.015  | 0.005 |
|       | 1.271  | 1.475  | 1.745  | 1.888  | 1.999  | 2.397 |
|   5   | 0.007  | 0.011  |  0.01  | 0.012  | 0.013  | 0.006 |
|       | 0.987  | 1.729  | 1.582  | 1.882  |  1.83  |  2.2  |
|  Diff | -0.02  | -0.015 | -0.017 | -0.014 | -0.014 | 0.006 |
|       | -4.431 | -3.522 | -3.695 | -3.205 | -3.197 | 1.813 |
+-------+--------+--------+--------+--------+--------+-------+
===============================================================

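The results for dataset #2 are also consistent with the literature. In the univariate analysis, the difference between the highest and lowest groups is 0.001 with a t-value of 0.313, far below 2.3, so the difference return is not significant. In the bivariate analysis, the t-values in the Diff column range from 0.079 to 2.551 and exceed 2.3 in only two of the five groups, so the difference return is largely insignificant. This again suggests that the investment factor does not provide excess returns.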
