Convert CSV Data to Nested JSON
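【Title】: Convert CSV Data to Nested JSON 【Posted】: 2022-01-07 00:44:00 【Question】: My task is to use Python to convert data from a CSV file into a nested JSON file for web use. I tried the Python code in this article. In the desired output, each member_id appears only once in the JSON file, and likewise each tag_name appears only once under a given member_id. The problem is that when I groupby on member_id alone, tag_name 'm1' shows up multiple times under 'abc123'. If I groupby on both member_id and tag_name, 'abc123' appears twice, once for tag 'm1' and once for 'm2'. I have been googling for a while, but most solutions only handle a single level of nesting (I'm not sure I'm using the right term). Please let me know if there is any way to do this.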
Sample code:
import json
import pandas as pd

df = pd.read_csv('../detail.csv', sep=',', header=0,
                 index_col=False,
                 dtype={'member_id': str, 'tag_name': str, 'detail_name': str, 'detail_value': str})
group = df.groupby(['member_id', 'tag_name'])
finalList, finalDict = [], {}
for key, value in group:
    dictionary, dictionary1, dictList, dictList1 = {}, {}, [], []
    j = group.get_group(key).reset_index(drop=True)
    dictionary['member_id'] = j.at[0, 'member_id']
    dictionary1['tag_name'] = j.at[0, 'tag_name']
    for i in j.index:
        anotherDict = {}
        anotherDict['detail_name'] = j.at[i, 'detail_name']
        anotherDict['detail_value'] = j.at[i, 'detail_value']
        dictList1.append(anotherDict.copy())
    dictionary1['detail'] = dictList1
    dictList.append(dictionary1)
    dictionary['tag'] = dictList
    finalList.append(dictionary)
json.dumps(finalList, ensure_ascii=False)
detail.csv:
member_id, tag_name, detail_name, detail_value
-------------------------------------------------------
abc123, m1, Service_A, 20
abc123, m1, Service_B, 20
abc123, m2, Service_C, 10
xyz456, m3, Service A, 5
xyz456, m3, Service A, 10
Desired output JSON:
{"member_id": "abc123",
 "tag": [{"tag_name": "m1",
          "detail": [{"detail_name": "Service_A",
                      "detail_value": "20"},
                     {"detail_name": "Service_B",
                      "detail_value": "20"}]},
         {"tag_name": "m2",
          "detail": [{"detail_name": "Service_C",
                      "detail_value": "10"}]}]},
{"member_id": "xyz456",
 "tag": [{"tag_name": "m3",
          "detail": [{"detail_name": "Service_A",
                      "detail_value": "5"},
                     {"detail_name": "Service_A",
                      "detail_value": "10"}]}]}
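【Question comments】:
Please share your current code.
@balderman Added.
【Answer 1】: I'm not aware of a pandas function that achieves this directly. Moreover, you introduce keys (tag, detail) that are not part of the initial dataframe, so a generic solution seems hard to implement.
However, if you have no more columns than those stated in the question, you can iterate over the dataframe, grouping column by column: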
result = []
for member_id, member_df in df.groupby('member_id'):
    member_dict = {'member_id': member_id}
    member_dict['tag'] = []
    for tag_name, tag_df in member_df.groupby('tag_name'):
        tag_dict = {'tag_name': tag_name}
        tag_dict['detail'] = []
        for detail_name, detail_df in tag_df.groupby('detail_name'):
            detail_dict = {'detail_name': detail_name}
            # should be only one value, taking 'mean' just in case
            # (mean() needs a numeric detail_value column, i.e. not read with dtype=str)
            detail_dict['detail_value'] = detail_df.detail_value.mean()
            tag_dict['detail'].append(detail_dict)
        member_dict['tag'].append(tag_dict)
    result.append(member_dict)
print(json.dumps(result, indent=4))
Output:
[
    {
        "member_id": "abc123",
        "tag": [
            {
                "tag_name": "m1",
                "detail": [
                    {
                        "detail_name": "Service_A",
                        "detail_value": 20.0
                    },
                    {
                        "detail_name": "Service_B",
                        "detail_value": 20.0
                    }
                ]
            },
            {
                "tag_name": "m2",
                "detail": [
                    {
                        "detail_name": "Service_C",
                        "detail_value": 10.0
                    }
                ]
            }
        ]
    },
    {
        "member_id": "xyz456",
        "tag": [
            {
                "tag_name": "m3",
                "detail": [
                    {
                        "detail_name": "Service A",
                        "detail_value": 5.0
                    }
                ]
            }
        ]
    }
]
Edit: if you don't need the detail names within each list to be unique, the code gets even shorter:
result = []
for member_id, member_df in df.groupby('member_id'):
    member_dict = {'member_id': member_id}
    member_dict['tag'] = []
    for tag_name, tag_df in member_df.groupby('tag_name'):
        tag_dict = {'tag_name': tag_name}
        tag_dict['detail'] = tag_df[['detail_name', 'detail_value']].to_dict(orient='records')
        member_dict['tag'].append(tag_dict)
    result.append(member_dict)
print(json.dumps(result, indent=4))
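For anyone who wants to try the shorter version without a detail.csv on disk, here is a minimal self-contained sketch. The inline CSV_TEXT is a cleaned-up copy of the sample rows (padding spaces removed), and the function name nest_members, sort=False and dtype=str are my own assumptions rather than part of the answer above. Reading with dtype=str keeps detail_value as text, which is how the desired output shows it.

```python
import io
import json

import pandas as pd

# Inline, cleaned-up copy of the sample rows (assumption: no padding spaces),
# so the sketch runs without detail.csv.
CSV_TEXT = """member_id,tag_name,detail_name,detail_value
abc123,m1,Service_A,20
abc123,m1,Service_B,20
abc123,m2,Service_C,10
xyz456,m3,Service A,5
xyz456,m3,Service A,10
"""


def nest_members(df):
    """Return one dict per member_id, each holding one dict per tag_name."""
    result = []
    # sort=False keeps groups in order of first appearance in the file.
    for member_id, member_df in df.groupby('member_id', sort=False):
        tags = []
        for tag_name, tag_df in member_df.groupby('tag_name', sort=False):
            # Every row of the tag becomes its own record, duplicates included.
            details = tag_df[['detail_name', 'detail_value']].to_dict(orient='records')
            tags.append({'tag_name': tag_name, 'detail': details})
        result.append({'member_id': member_id, 'tag': tags})
    return result


# dtype=str keeps detail_value as text instead of converting it to a number.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)
nested = nest_members(df)
print(json.dumps(nested, indent=4, ensure_ascii=False))
```

Because no grouping on detail_name happens here, the two 'Service A' rows of xyz456 stay as two separate entries under tag m3.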
【Comments】:
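Tranbi, is there any way to improve the performance? The csv may have ten million rows.
I can't think of a way to improve the performance significantly. But do you really need such a big json? You could save each member_dict to its own json file. That would be easier on memory.
Understood. One more thing: if I have a new row like 'xyz456, m3, Service A, 10' and I want the rows to appear as separate dictionaries, like Service A and Service B under (abc123, m1), how should I modify the code?
What do you mean? xyz456 is already in a different dictionary.
Got it. Check my updated answer!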
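The suggestion in the comments above, saving each member_dict to its own json file, could look roughly like the sketch below. It is only an illustration under my own assumptions: the helper name write_member_files, the <member_id>.json naming scheme and the small inline sample are hypothetical, and member_id is assumed to be safe to use as a file name. It does not make the grouping itself faster; it only avoids building and serializing one huge nested list.

```python
import io
import json
import tempfile
from pathlib import Path

import pandas as pd


def write_member_files(df, out_dir):
    """Write one <member_id>.json file per member and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for member_id, member_df in df.groupby('member_id', sort=False):
        member_dict = {'member_id': member_id, 'tag': []}
        for tag_name, tag_df in member_df.groupby('tag_name', sort=False):
            details = tag_df[['detail_name', 'detail_value']].to_dict(orient='records')
            member_dict['tag'].append({'tag_name': tag_name, 'detail': details})
        # Hypothetical naming scheme; assumes member_id is a valid file name.
        path = out_dir / f'{member_id}.json'
        path.write_text(json.dumps(member_dict, ensure_ascii=False), encoding='utf-8')
        paths.append(path)
    return paths


# Small inline sample (assumption) so the sketch runs on its own.
sample = io.StringIO(
    'member_id,tag_name,detail_name,detail_value\n'
    'abc123,m1,Service_A,20\n'
    'abc123,m2,Service_C,10\n'
    'xyz456,m3,Service A,5\n'
)
df = pd.read_csv(sample, dtype=str)
out_dir = tempfile.mkdtemp()
paths = write_member_files(df, out_dir)
print([p.name for p in paths])
```

Each member_dict is dropped as soon as its file is written, so only the dataframe itself has to fit in memory.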