Python, Normalized Json to Dataframe
Posted: 2021-05-28 10:51:48
Question: I first tried to normalize the data:
df = pd.json_normalize(balance_sheet_data_qt)
Then I tried to flatten it using this answer, How to flatten a pandas dataframe with some columns as json?, but it didn't seem to do anything.
json_struct = json.loads(df.to_json(orient="records"))
#df_flat = pd.json_normalize(json_struct)
I also tried How to read and normalize following json in pandas?, but I couldn't work out which columns to use assign on.
Sample data before normalization, i.e. balance_sheet_data_qt:
{'balanceSheetHistoryQuarterly': {'AAPL': [
 {'2020-12-26': {'totalLiab': 287830000000, 'totalStockholderEquity': 66224000000, 'otherCurrentLiab': 55899000000, 'totalAssets': 354054000000, 'commonStock': 51744000000, 'otherCurrentAssets': 13687000000, 'retainedEarnings': 14301000000, 'otherLiab': 56042000000, 'treasuryStock': 179000000, 'otherAssets': 43270000000, 'cash': 36010000000, 'totalCurrentLiabilities': 132507000000, 'shortLongTermDebt': 7762000000, 'otherStockholderEquity': 179000000, 'propertyPlantEquipment': 37933000000, 'totalCurrentAssets': 154106000000, 'longTermInvestments': 118745000000, 'netTangibleAssets': 66224000000, 'shortTermInvestments': 40816000000, 'netReceivables': 58620000000, 'longTermDebt': 99281000000, 'inventory': 4973000000, 'accountsPayable': 63846000000}},
 {'2020-09-26': {'totalLiab': 258549000000, 'totalStockholderEquity': 65339000000, 'otherCurrentLiab': 47867000000, 'totalAssets': 323888000000, 'commonStock': 50779000000, 'otherCurrentAssets': 11264000000, 'retainedEarnings': 14966000000, 'otherLiab': 46108000000, 'treasuryStock': -406000000, 'otherAssets': 33952000000, 'cash': 38016000000, 'totalCurrentLiabilities': 105392000000, 'shortLongTermDebt': 8773000000, 'otherStockholderEquity': -406000000, 'propertyPlantEquipment': 45336000000, 'totalCurrentAssets': 143713000000, 'longTermInvestments': 100887000000, 'netTangibleAssets': 65339000000, 'shortTermInvestments': 52927000000, 'netReceivables': 37445000000, 'longTermDebt': 98667000000, 'inventory': 4061000000, 'accountsPayable': 42296000000}},
 {'2020-06-27': {'totalLiab': 245062000000, 'totalStockholderEquity': 72282000000, 'otherCurrentLiab': 39945000000, 'totalAssets': 317344000000, 'commonStock': 48696000000, 'otherCurrentAssets': 10987000000, 'retainedEarnings': 24136000000, 'otherLiab': 47606000000, 'treasuryStock': -550000000, 'otherAssets': 32836000000, 'cash': 33383000000, 'totalCurrentLiabilities': 95318000000, 'shortLongTermDebt': 7509000000, 'otherStockholderEquity': -550000000, 'propertyPlantEquipment': 43851000000, 'totalCurrentAssets': 140065000000, 'longTermInvestments': 100592000000, 'netTangibleAssets': 72282000000, 'shortTermInvestments': 59642000000, 'netReceivables': 32075000000, 'longTermDebt': 94048000000, 'inventory': 3978000000, 'accountsPayable': 35325000000}},
 {'2020-03-28': {'totalLiab': 241975000000, 'totalStockholderEquity': 78425000000, 'otherCurrentLiab': 42048000000, 'totalAssets': 320400000000, 'commonStock': 48032000000, 'otherCurrentAssets': 15691000000, 'retainedEarnings': 33182000000, 'otherLiab': 48745000000, 'treasuryStock': -2789000000, 'otherAssets': 33868000000, 'cash': 40174000000, 'totalCurrentLiabilities': 96094000000, 'shortLongTermDebt': 10392000000, 'otherStockholderEquity': -2789000000, 'propertyPlantEquipment': 43986000000, 'totalCurrentAssets': 143753000000, 'longTermInvestments': 98793000000, 'netTangibleAssets': 78425000000, 'shortTermInvestments': 53877000000, 'netReceivables': 30677000000, 'longTermDebt': 89086000000, 'inventory': 3334000000, 'accountsPayable': 32421000000}}
]}}
Sample data after normalization:
,balanceSheetHistoryQuarterly.AAPL
0,"['2020-12-26': 'totalLiab': 287830000000, 'totalStockholderEquity': 66224000000, 'otherCurrentLiab': 55899000000, 'totalAssets': 354054000000,
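For reference, a minimal reproduction of this output, using a hypothetical, trimmed-down copy of balance_sheet_data_qt (two quarters, two fields). pd.json_normalize flattens nested dicts into dotted column names, but it does not expand lists, so the whole AAPL list ends up in a single cell:

```python
import pandas as pd

# Hypothetical, trimmed-down stand-in for balance_sheet_data_qt.
sample = {
    "balanceSheetHistoryQuarterly": {
        "AAPL": [
            {"2020-12-26": {"totalLiab": 287830000000, "cash": 36010000000}},
            {"2020-09-26": {"totalLiab": 258549000000, "cash": 38016000000}},
        ]
    }
}

# Nested dicts become dotted column names; the list value stays in one cell.
df = pd.json_normalize(sample)
print(df.columns.tolist())  # ['balanceSheetHistoryQuarterly.AAPL']
print(type(df.loc[0, "balanceSheetHistoryQuarterly.AAPL"]))  # <class 'list'>
```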
The columns I want:
'totalLiab'
'totalStockholderEquity'
'otherCurrentLiab'
'totalAssets'
'commonStock'
'otherCurrentAssets'
'retainedEarnings'
'otherLiab'
'treasuryStock'
'otherAssets'
'cash'
'totalCurrentLiabilities'
'shortLongTermDebt'
'otherStockholderEquity'
'propertyPlantEquipment'
'totalCurrentAssets'
'longTermInvestments'
'netTangibleAssets'
'shortTermInvestments'
'netReceivables'
'longTermDebt'
'inventory'
'accountsPayable'
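Once the data is one row per quarter, a fixed set of columns in a fixed order can be selected with DataFrame.reindex(columns=...). A sketch with hypothetical rows and a shortened column list (wanted is an illustrative name); unlike df[wanted], reindex fills any column missing from the data with NaN instead of raising KeyError:

```python
import pandas as pd

# Hypothetical flattened rows: one dict per quarter (fields trimmed for brevity).
rows = [
    {"totalLiab": 287830000000, "cash": 36010000000, "inventory": 4973000000},
    {"totalLiab": 258549000000, "cash": 38016000000},  # 'inventory' missing here
]
# Shortened, hypothetical subset of the wanted column list.
wanted = ["totalLiab", "cash", "inventory", "accountsPayable"]

df = pd.DataFrame(rows)
# Keep only the wanted columns, in this order; columns absent from the
# data (here 'accountsPayable') are created and filled with NaN.
df = df.reindex(columns=wanted)
print(df)
```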
I'm trying to convert this into a DataFrame/table format. I think the first column, balanceSheetHistoryQuarterly.AAPL, or the date keys may be what's throwing it off.
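The date keys are indeed what gets in the way. A sketch on hypothetical trimmed data: normalizing the inner AAPL list directly does give one row per quarter, but because each list item is keyed by its date, the dates become prefixes of the column names and each row is half NaN:

```python
import pandas as pd

# Hypothetical trimmed copy of balance_sheet_data_qt["balanceSheetHistoryQuarterly"]["AAPL"].
quarters = [
    {"2020-12-26": {"totalLiab": 287830000000, "cash": 36010000000}},
    {"2020-09-26": {"totalLiab": 258549000000, "cash": 38016000000}},
]

# Each item is flattened to '<date>.<field>' columns, so the two quarters
# share no columns and the missing combinations are NaN.
df = pd.json_normalize(quarters)
print(sorted(df.columns))
print(df.shape)  # (2, 4)
```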
Any help is appreciated.
Comments:
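Could you please post the sample data before normalization?
OK, I've added it.
I think the sample data in this question is broken; it's not valid JSON. Could you provide balance_sheet_data_qt and the columns you want?
OK, I've added the full balance_sheet_data_qt and the columns I want.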
Answer 1:
Since the data is structured like this:
{
    'balanceSheetHistoryQuarterly': {
        'AAPL': [
            {'2020-12-26': {
                'totalLiab': 287830000000,
                'totalStockholderEquity': 66224000000,
                'otherCurrentLiab': 55899000000
            }},
            {'2020-06-27': {
                'totalLiab': 245062000000,
                'totalStockholderEquity': 72282000000,
                'otherCurrentLiab': 39945000000
            }}
        ]
    }
}
you have to create a new list, iterate over the AAPL list, and add each entry's values to your list:
listInput = []
for js in balance_sheet_data_qt["balanceSheetHistoryQuarterly"]["AAPL"]:
value = js.values()
listInput += value
df = pd.json_normalize(listInput)
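A self-contained check of the loop above, run on a hypothetical trimmed-down sample; listInput.extend(js.values()) does the same thing as listInput += value. Each quarter becomes one row, but the date key itself is dropped:

```python
import pandas as pd

# Hypothetical, trimmed-down stand-in for balance_sheet_data_qt.
balance_sheet_data_qt = {
    "balanceSheetHistoryQuarterly": {
        "AAPL": [
            {"2020-12-26": {"totalLiab": 287830000000, "cash": 36010000000}},
            {"2020-09-26": {"totalLiab": 258549000000, "cash": 38016000000}},
        ]
    }
}

listInput = []
for js in balance_sheet_data_qt["balanceSheetHistoryQuarterly"]["AAPL"]:
    # Each js is a one-key dict {date: {field: value, ...}};
    # extend() appends the inner field dict and discards the date key.
    listInput.extend(js.values())

df = pd.json_normalize(listInput)
print(df.shape)  # (2, 2)
```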
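If you want to pass the date into the DataFrame as well, you have to build a new dict for each entry while iterating over its key/value pairs (the date and its fields), copying the fields and adding the date under a "date" key: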
listInput = []
for js in balance_sheet_data_qt["balanceSheetHistoryQuarterly"]["AAPL"]:
    for key, value in js.items():
new_json = value.copy()
new_json["date"] = key
listInput.append(new_json)
df = pd.json_normalize(listInput)
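An alternative sketch, not from the answer above, that uses the date as the row index instead of a regular column: merge the one-key dicts into a single {date: fields} mapping and let pd.DataFrame.from_dict(..., orient="index") turn the outer keys into rows. The sample data and the names data and by_date are hypothetical, and it assumes each date appears only once (a repeated date would overwrite the earlier entry):

```python
import pandas as pd

# Hypothetical, trimmed-down stand-in for balance_sheet_data_qt.
data = {
    "balanceSheetHistoryQuarterly": {
        "AAPL": [
            {"2020-12-26": {"totalLiab": 287830000000, "cash": 36010000000}},
            {"2020-09-26": {"totalLiab": 258549000000, "cash": 38016000000}},
        ]
    }
}

# Merge the list of one-key dicts into one {date: {field: value}} dict.
by_date = {}
for entry in data["balanceSheetHistoryQuarterly"]["AAPL"]:
    by_date.update(entry)

# orient="index": outer keys (dates) become the row index, inner keys the columns.
df = pd.DataFrame.from_dict(by_date, orient="index")
df.index = pd.to_datetime(df.index)
df.index.name = "date"
df = df.sort_index()  # oldest quarter first
print(df)
```

If a plain date column is preferred, df.reset_index() moves the index back into a column named date.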
Comments:
Thanks, you're awesome! How do I get the date into a new column?
@excelguy I've edited my answer; you can check it again.