Iterate Through Nested Dictionary to Create Dataframe and Add New Column Value
Posted: 2021-05-04 23:25:21

Question:
Python newbie here, so please bear with me.
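I have a list of dictionaries of stock information, stored in a variable named `json`. I want to convert it into DataFrames and then add a column holding the ticker symbol next to the data. See below.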
json = [
    {'Meta Data': {'1. Information': 'Monthly Prices (open, high, low, close) and Volumes',
                   '2. Symbol': 'AAPL', '3. Last Refreshed': '2021-01-29',
                   '4. Time Zone': 'US/Eastern'},
     'Monthly Time Series': {
         '2021-01-29': {'1. open': '133.5200', '2. high': '145.0900', '3. low': '126.3820',
                        '4. close': '131.9600', '5. volume': '2239366098'},
         '2020-12-31': {'1. open': '121.0100', '2. high': '138.7890', '3. low': '120.0100',
                        '4. close': '132.6900', '5. volume': '2319687808'}}},
    {'Meta Data': {'1. Information': 'Monthly Prices (open, high, low, close) and Volumes',
                   '2. Symbol': 'ZM', '3. Last Refreshed': '2021-01-29',
                   '4. Time Zone': 'US/Eastern'},
     'Monthly Time Series': {
         '2021-01-29': {'1. open': '340.4000', '2. high': '404.4400', '3. low': '331.1000',
                        '4. close': '372.0700', '5. volume': '121350349'},
         '2020-12-31': {'1. open': '434.7200', '2. high': '434.9900', '3. low': '336.1000',
                        '4. close': '337.3200', '5. volume': '150168985'}}},
]
Running the following gives me the DataFrames I want, except for the ticker:
import pandas as pd

df = [pd.DataFrame.from_dict(i['Monthly Time Series'], orient='index').sort_index(axis=1) for i in json]
Output:
[            1. open   2. high    3. low  4. close   5. volume
2021-01-29  133.5200  145.0900  126.3820  131.9600  2239366098
2020-12-31  121.0100  138.7890  120.0100  132.6900  2319687808,
             1. open   2. high    3. low  4. close   5. volume
2021-01-29  340.4000  404.4400  331.1000  372.0700   121350349
2020-12-31  434.7200  434.9900  336.1000  337.3200   150168985]
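For reference, this is what the `from_dict` call does with a single record: with `orient='index'`, the outer keys (the dates) become the row labels and the inner keys become the columns, and `sort_index(axis=1)` then orders the columns by name. A minimal sketch using the AAPL series from the data above; `series` and `frame` are illustrative names. The values remain strings, because that is how they appear in the data.

```python
import pandas as pd

# 'Monthly Time Series' for AAPL, copied from the data above
series = {
    "2021-01-29": {"1. open": "133.5200", "2. high": "145.0900", "3. low": "126.3820",
                   "4. close": "131.9600", "5. volume": "2239366098"},
    "2020-12-31": {"1. open": "121.0100", "2. high": "138.7890", "3. low": "120.0100",
                   "4. close": "132.6900", "5. volume": "2319687808"},
}

# orient="index": outer keys (dates) -> row labels, inner keys -> column labels
frame = pd.DataFrame.from_dict(series, orient="index").sort_index(axis=1)

print(frame.index.tolist())
print(frame.columns.tolist())  # ['1. open', '2. high', '3. low', '4. close', '5. volume']
# The cells are still text, not numbers
print(type(frame.loc["2021-01-29", "4. close"]))
```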
What I want is to pull the value of '2. Symbol' out of `json` and attach the matching ticker to each set of data, like this:
[            1. open   2. high    3. low  4. close   5. volume  ticker
2021-01-29  133.5200  145.0900  126.3820  131.9600  2239366098    AAPL
2020-12-31  121.0100  138.7890  120.0100  132.6900  2319687808    AAPL,
             1. open   2. high    3. low  4. close   5. volume  ticker
2021-01-29  340.4000  404.4400  331.1000  372.0700   121350349      ZM
2020-12-31  434.7200  434.9900  336.1000  337.3200   150168985      ZM]
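You can get the desired result by reading the symbol from each record's 'Meta Data' and attaching it with `DataFrame.assign`. This is a minimal sketch. It assumes the data has exactly the shape shown above: a list of dicts, each with 'Meta Data' and 'Monthly Time Series'. `records` and `with_ticker` are illustrative names, and the sample is trimmed to two price fields for brevity.

```python
import pandas as pd

# The question's data, trimmed to two price fields per month
records = [
    {"Meta Data": {"2. Symbol": "AAPL", "4. Time Zone": "US/Eastern"},
     "Monthly Time Series": {
         "2021-01-29": {"1. open": "133.5200", "4. close": "131.9600"},
         "2020-12-31": {"1. open": "121.0100", "4. close": "132.6900"}}},
    {"Meta Data": {"2. Symbol": "ZM", "4. Time Zone": "US/Eastern"},
     "Monthly Time Series": {
         "2021-01-29": {"1. open": "340.4000", "4. close": "372.0700"},
         "2020-12-31": {"1. open": "434.7200", "4. close": "337.3200"}}},
]


def with_ticker(record):
    """Build one record's price table and tag every row with its ticker symbol."""
    prices = pd.DataFrame.from_dict(record["Monthly Time Series"], orient="index")
    symbol = record["Meta Data"]["2. Symbol"]  # note the space after "2."
    # assign() returns a new DataFrame; the scalar is broadcast to every row
    return prices.sort_index(axis=1).assign(ticker=symbol)


frames = [with_ticker(record) for record in records]
for frame in frames:
    print(frame, end="\n\n")
```

If one combined table is more convenient than a list, `pd.concat(frames)` stacks them, and the `ticker` column then tells the rows apart.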
Comments:
First of all, `json` is not a dictionary. Please confirm it with `type` before going any further.
Thanks. So what should I do?
It would help if you could confirm the type.
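The type check matters because the code in this thread expects a list of dicts. Below is a hedged sketch of that check. It assumes the data might also arrive as raw JSON text or as a single response dict. `normalize` is a hypothetical helper. The sketch names the variable `raw` rather than `json`, because binding `json` to the data shadows the standard-library `json` module.

```python
import json


def normalize(raw):
    """Return a list of record dicts (hypothetical helper; accepted shapes are assumptions)."""
    if isinstance(raw, (str, bytes)):
        raw = json.loads(raw)  # JSON text -> Python objects
    if isinstance(raw, dict):
        raw = [raw]  # a single API response -> one-element list
    if not isinstance(raw, list) or not all(isinstance(r, dict) for r in raw):
        raise TypeError("expected JSON text, a dict, or a list of dicts")
    return raw


text = '[{"Meta Data": {"2. Symbol": "AAPL"}, "Monthly Time Series": {}}]'
print(type(text).__name__)                    # str
records = normalize(text)
print(type(records).__name__)                 # list
print(records[0]["Meta Data"]["2. Symbol"])   # AAPL
```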
Answer 1:
Update:
The same thing with a single loop, on one line:

import pandas as pd

df = [pd.DataFrame.from_dict(i['Monthly Time Series'], orient='index').sort_index(axis=1).assign(ticker=i['Meta Data']['2. Symbol']) for i in json]

JSON data:

json = [
    {'Meta Data': {'1. Information': 'Monthly Prices (open, high, low, close) and Volumes',
                   '2. Symbol': 'AAPL', '3. Last Refreshed': '2021-01-29',
                   '4. Time Zone': 'US/Eastern'},
     'Monthly Time Series': {
         '2020-01-29': {'1. open': '133.5200', '2. high': '145.0900', '3. low': '126.3820',
                        '4. close': '131.9600', '5. volume': '2239366098'},
         '2020-01-30': {'1. open': '121.0100', '2. high': '138.7890', '3. low': '120.0100',
                        '4. close': '132.6900', '5. volume': '2319687808'}}},
    {'Meta Data': {'1. Information': 'Monthly Prices (open, high, low, close) and Volumes',
                   '2. Symbol': 'ZM', '3. Last Refreshed': '2021-01-01',
                   '4. Time Zone': 'US/Eastern'},
     'Monthly Time Series': {
         '2020-02-02': {'1. open': '133.5200', '2. high': '145.0900', '3. low': '126.3820',
                        '4. close': '131.9600', '5. volume': '2239366098'},
         '2020-02-31': {'1. open': '121.0100', '2. high': '138.7890', '3. low': '120.0100',
                        '4. close': '132.6900', '5. volume': '2319687808'}}},
]

Use assign to add the new column:

# Dates become the row index; the price/volume fields become the columns
addTimeSeries = lambda i: pd.DataFrame.from_dict(i['Monthly Time Series'], orient='index').sort_index(axis=1)
# assign() returns a copy with a constant 'ticker' column appended
addVal = lambda i: addTimeSeries(i).assign(ticker=i['Meta Data']['2. Symbol'])
df = [addVal(i) for i in json]
Output:
[            1. open   2. high    3. low  4. close   5. volume  ticker
2020-01-29  133.5200  145.0900  126.3820  131.9600  2239366098    AAPL
2020-01-30  121.0100  138.7890  120.0100  132.6900  2319687808    AAPL,
             1. open   2. high    3. low  4. close   5. volume  ticker
2020-02-02  133.5200  145.0900  126.3820  131.9600  2239366098      ZM
2020-02-31  121.0100  138.7890  120.0100  132.6900  2319687808      ZM]
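The metadata key must match the data character for character. In the question's data it is '2. Symbol', with a space, so a lookup spelled '2.Symbol' raises `KeyError` there. If the exact spelling is uncertain, one option is to match on the label that follows the numeric prefix. This is a sketch under that assumption. `find_symbol` and `add_ticker` are hypothetical helpers, not part of pandas.

```python
import pandas as pd


def find_symbol(meta):
    """Return the value whose key reads 'Symbol' after its 'N.' prefix (hypothetical helper)."""
    for key, value in meta.items():
        # "2. Symbol" and "2.Symbol" both reduce to "Symbol" once split and stripped
        if key.split(".", 1)[-1].strip() == "Symbol":
            return value
    raise KeyError("no '<n>. Symbol' key in Meta Data")


def add_ticker(record):
    """Same pipeline as the answer, but with the tolerant symbol lookup."""
    prices = pd.DataFrame.from_dict(record["Monthly Time Series"], orient="index")
    return prices.sort_index(axis=1).assign(ticker=find_symbol(record["Meta Data"]))


spaced = {"Meta Data": {"2. Symbol": "AAPL"},
          "Monthly Time Series": {"2021-01-29": {"4. close": "131.9600"}}}
unspaced = {"Meta Data": {"2.Symbol": "ZM"},
            "Monthly Time Series": {"2021-01-29": {"4. close": "372.0700"}}}

for frame in (add_ticker(spaced), add_ticker(unspaced)):
    print(frame)
```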
Comments:
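Thanks. Unfortunately it's still not working. I'm now getting: "TypeError: string indices must be integers"
Not sure how you got that error, because I don't get any such error when I run the code above.
Updated the answer to a single line without using lambda.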
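The thread never confirms what `json` actually held, so the following is only one plausible explanation. `TypeError: string indices must be integers` is what Python raises when the loop variable `i` is a string. That happens if `json` is a single response dict, because iterating a dict yields its keys. It also happens with raw JSON text, because iterating a string yields characters. In both cases the input is not the expected list of dicts. This small sketch reproduces the error; `first_error` is a hypothetical helper.

```python
response = {
    "Meta Data": {"2. Symbol": "AAPL"},
    "Monthly Time Series": {"2021-01-29": {"4. close": "131.9600"}},
}


def first_error(data):
    """Run the answer's indexing step over `data` and return any TypeError message."""
    try:
        for i in data:
            _ = i["Monthly Time Series"]
    except TypeError as exc:
        return str(exc)
    return None


print(first_error(response))      # i is the key "Meta Data", a str, so indexing fails
print(first_error('[{"a": 1}]'))  # i is the character "[", so indexing fails
print(first_error([response]))    # None: a list of dicts works
```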