Joining time series by common date in Python (dataframe & series/list question)
Posted: 2020-07-23 21:41:45
Question: Noob here. Please excuse my formatting; I'm still learning. I am trying to create a time series (a DataFrame, I think?) with three columns: a date column, an inventory column, and a price column.
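I have already pulled two separate series (date and inventory; date and price), and I want to merge them so that I see three columns instead of two sets of two columns. Here is my code.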
import json
import numpy as np
import pandas as pd
from urllib.error import URLError, HTTPError
from urllib.request import urlopen
class EIAgov(object):
    def __init__(self, token, series):
        '''
        Purpose:
        Initialise the EIAgov class by requesting:
        - EIA token
        - id code(s) of the series to be downloaded
        Parameters:
        - token: string
        - series: string or list of strings
        '''
        self.token = token
        self.series = series

    def __repr__(self):
        return str(self.series)

    def Raw(self, ser):
        # Construct url
        url = 'http://api.eia.gov/series/?api_key=' + self.token + '&series_id=' + ser.upper()
        try:
            # URL request, URL opener, read content
            response = urlopen(url)
            raw_byte = response.read()
            raw_string = str(raw_byte, 'utf-8-sig')
            jso = json.loads(raw_string)
            return jso
        except HTTPError as e:
            print('HTTP error type.')
            print('Error code: ', e.code)
        except URLError as e:
            print('URL type error.')
            print('Reason: ', e.reason)

    def GetData(self):
        # Deal with the date series
        date_ = self.Raw(self.series[0])
        date_series = date_['series'][0]['data']
        endi = len(date_series)  # or len(date_['series'][0]['data'])
        date = []
        for i in range(endi):
            date.append(date_series[i][0])
        # Create dataframe
        df = pd.DataFrame(data=date)
        df.columns = ['Date']
        # Deal with data
        lenj = len(self.series)
        for j in range(lenj):
            data_ = self.Raw(self.series[j])
            data_series = data_['series'][0]['data']
            data = []
            endk = len(date_series)
            for k in range(endk):
                data.append(data_series[k][1])
            df[self.series[j]] = data
        return df

if __name__ == '__main__':
    tok = 'mytoken'
    # Natural Gas - Weekly Storage
    #
    ngstor = ['NG.NW2_EPG0_SWO_R48_BCF.W']  # w/ several series at a time ['ELEC.REV.AL-ALL.M', 'ELEC.REV.AK-ALL.M', 'ELEC.REV.CA-ALL.M']
    stordata = EIAgov(tok, ngstor)
    print(stordata.GetData())
    # Natural Gas - Weekly Prices
    #
    ngpx = ['NG.RNGC1.W']  # w/ several series at a time ['ELEC.REV.AL-ALL.M', 'ELEC.REV.AK-ALL.M', 'ELEC.REV.CA-ALL.M']
    pxdata = EIAgov(tok, ngpx)
    print(pxdata.GetData())
Note that 'mytoken' needs to be replaced with an eia.gov API key. I can get this to print the two outputs successfully, but to merge them I tried adding this at the end:
joined_frame = pd.concat([ngstor, ngpx], axis = 1, sort=False)
print(joined_frame.GetData())
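But I get an error
("TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid")
because apparently I don't know the difference between a list and a Series.
How can I merge these lists by the date column? Thanks very much for any help. (Also, feel free to tell me why I'm so bad at formatting code properly in this post.)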
Answer 1: If you want to work with them as DataFrames in the rest of your code, you can convert ngstor and ngpx to DataFrames like this:
import pandas as pd
# I create two lists that look like yours
ngstor = [[1,2], ["2020-04-03", "2020-05-07"]]
ngpx = [[3,4] , ["2020-04-03", "2020-05-07"]]
# I transform them to DataFrames
ngstor = pd.DataFrame({"value1": ngstor[0],
                       "date_col": ngstor[1]})
ngpx = pd.DataFrame({"value2": ngpx[0],
                     "date_col": ngpx[1]})
Then you can use pandas.merge or pandas.concat:
# merge option
joined_framed = pd.merge(ngstor, ngpx, on="date_col",
how="outer")
# concat option
ngstor = ngstor.set_index("date_col")
ngpx = ngpx.set_index("date_col")
joined_framed = pd.concat([ngstor, ngpx], axis=1,
join="outer").reset_index()
The result will be:
     date_col  value1  value2
0  2020-04-03       1       3
1  2020-05-07       2       4
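Applied to the question's own setup, note that ngstor and ngpx are plain lists of series ID strings, while the DataFrames are what stordata.GetData() and pxdata.GetData() return, so those return values are what should be merged. Here is a minimal sketch under some assumptions: the two frames below are stand-ins with made-up values (not real EIA data), the "YYYYMMDD" date strings are an assumed format, and the column names Storage and Price are hypothetical. It uses how="inner" so that only dates present in both series are kept.

```python
import pandas as pd

# Stand-ins for stordata.GetData() and pxdata.GetData().
# Values and the "YYYYMMDD" date format are assumptions for illustration only.
stor_df = pd.DataFrame({
    "Date": ["20200717", "20200710", "20200703"],
    "NG.NW2_EPG0_SWO_R48_BCF.W": [3215, 3178, 3133],
})
px_df = pd.DataFrame({
    "Date": ["20200717", "20200710", "20200626"],
    "NG.RNGC1.W": [1.75, 1.80, 1.60],
})

# Rename the series-ID columns to friendlier (hypothetical) names.
stor_df = stor_df.rename(columns={"NG.NW2_EPG0_SWO_R48_BCF.W": "Storage"})
px_df = px_df.rename(columns={"NG.RNGC1.W": "Price"})

# how="inner" keeps only the dates that appear in both frames.
joined = pd.merge(stor_df, px_df, on="Date", how="inner")

# Parse the date strings and sort chronologically.
joined["Date"] = pd.to_datetime(joined["Date"], format="%Y%m%d")
joined = joined.sort_values("Date").reset_index(drop=True)

print(joined)
```

With real data, stor_df and px_df would simply be the two GetData() results. Switching to how="outer" would keep every date from either series and fill the missing side with NaN.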
Comments:
Thank you, Raphael! You're my new best friend. I adapted the concat option and it worked for me. Much appreciated!! Sorry, I can't upvote: I'm new here, so my votes don't count.