python pandas json_normalize in 1.0.0 with meta path specified - expects iterable
【Posted】: 2020-05-23 09:53:00
【Question】: I have this data:
[{"state": "Florida",
  "shortname": "FL",
  "info": {"governor": "Rick Scott"},
  "counties": [{"name": "Dade",
                "population": 12345,
                "Attributes": [
                    {"capture_date": "2020-01-29",
                     "Spirit_code": "TRLQR",
                     "value": 1},
                    {"capture_date": "2020-01-29",
                     "Spirit_code": "HA***",
                     "value": 57000}
                ]},
               {"name": "Broward",
                "population": 40000,
                "Attributes": [
                    {"capture_date": "2020-01-29",
                     "Spirit_code": "GMSTP",
                     "value": 14},
                    {"capture_date": "2020-01-29",
                     "Spirit_code": "GWTPN",
                     "value": 11212}
                ]},
               {"name": "Palm Beach",
                "population": 60000,
                "Attributes": [
                    {"capture_date": "2020-01-29",
                     "Spirit_code": "YGHMN",
                     "value": 154.01},
                    {"capture_date": "2020-01-29",
                     "Spirit_code": "CXZASD",
                     "value": 154.01}
                ]}
  ]},
 {"state": "Ohio",
  "shortname": "OH",
  "info": {"governor": "John Kasich"},
  "counties": [{"name": "Summit", "population": 1234,
                "Attributes": [
                    {"capture_date": "2020-01-29",
                     "Spirit_code": "QWERTY",
                     "value": 154.01},
                    {"capture_date": "2020-01-29",
                     "Spirit_code": "JKLGH",
                     "value": 154.01}
                ]},
               {"name": "Cuyahoga", "population": 1337,
                "Attributes": [
                    {"capture_date": "2020-01-29",
                     "Spirit_code": "ASDF",
                     "value": 154.01},
                    {"capture_date": "2020-01-29",
                     "Spirit_code": "POIUY",
                     "value": 154.01}
                ]}
  ]}
]
I am looking for this result:
state, shortname, name, population, attribute.capture_date, attribute.spirit_code, attribute.value
Florida, FL, Dade, 12345, 2020-01-29, TRLQR, 1
Florida, FL, Dade, 12345, 2020-01-29, HA***, 57000
Florida, FL, Broward, 40000, 2020-01-29, GMSTP, 14
Florida, FL, Broward, 40000, 2020-01-29, GWTPN, 11212
Florida, FL, Palm Beach, 60000, 2020-01-29, YGHMN, 154.01
Florida, FL, Palm Beach, 60000, 2020-01-29, CXZASD, 154.01
Basically, I want to normalize the nested JSON on one key: "Attributes". I tried:
json_normalize(data["data"], ["counties", "Attributes"], ["state", "shortname", ["counties", "name"], ["counties", "population"]])
I get the error:
TypeError: {'name': 'Dade', 'population': 12345, 'Attributes': [{'capture_date': '2020-01-29', 'Spirit_code': 'TRLQR', 'value': 1}, {'capture_date': '2020-01-29', 'Spirit_code': 'HA***', 'value': 57000}]} has non iterable value 12345 for path ['population']. Must be iterable or null.
But if I run:
plots_in = json_normalize(data["data"], ["counties", "Attributes"],
["state", "shortname", ["counties", "name"]])
I get the result:
capture_date Spirit_code value state shortname counties.name
0 2020-01-29 TRLQR 1.00 Florida FL Dade
1 2020-01-29 HA*** 57000.00 Florida FL Dade
2 2020-01-29 GMSTP 14.00 Florida FL Broward
3 2020-01-29 GWTPN 11212.00 Florida FL Broward
4 2020-01-29 YGHMN 154.01 Florida FL Palm Beach
5 2020-01-29 CXZASD 154.01 Florida FL Palm Beach
6 2020-01-29 QWERTY 154.01 Ohio OH Summit
7 2020-01-29 JKLGH 154.01 Ohio OH Summit
8 2020-01-29 ASDF 154.01 Ohio OH Cuyahoga
9 2020-01-29 POIUY 154.01 Ohio OH Cuyahoga
Is this related to the integer values in the population key? I ask because I still get the same error if I run the following:
plots_in = json_normalize(data["data"], ["counties", "population"])
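If anyone knows what is happening under the hood, please explain.
【Comments】:
Does anyone know what is actually going on under the hood?
【Answer 1】: Check your pandas version. If it is pandas 1.0.0, the problem is most likely related to this issue: json_normalize in 1.0.0 with meta path specified - expects iterable #31507
I ran into exactly the same problem when I rebuilt my development environment on Linux, which installed the latest version of every package, including pandas 1.0.0. After some searching I found the issue linked above, then removed pandas 1.0.0 and installed pandas 0.25.3 instead. First uninstall: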
pip3 uninstall pandas # or pip uninstall pandas
Then:
pip3 install pandas==0.25.3 # or pip install pandas==0.25.3
After that everything worked again, just as it had before the latest pandas was installed.
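For reference, here is a minimal sketch of the call from the question, run against a trimmed copy of the data (one state, two counties). It assumes a pandas release in which issue #31507 is fixed and which exposes the top-level `pd.json_normalize` (available from pandas 1.0 onward; on 0.25.x the function lives in `pandas.io.json`). The variable names `data` and `df` are only illustrative.

```python
import pandas as pd

# Trimmed copy of the question's data: one state, two counties.
data = [
    {
        "state": "Florida",
        "shortname": "FL",
        "info": {"governor": "Rick Scott"},
        "counties": [
            {
                "name": "Dade",
                "population": 12345,
                "Attributes": [
                    {"capture_date": "2020-01-29", "Spirit_code": "TRLQR", "value": 1},
                    {"capture_date": "2020-01-29", "Spirit_code": "HA***", "value": 57000},
                ],
            },
            {
                "name": "Broward",
                "population": 40000,
                "Attributes": [
                    {"capture_date": "2020-01-29", "Spirit_code": "GMSTP", "value": 14},
                    {"capture_date": "2020-01-29", "Spirit_code": "GWTPN", "value": 11212},
                ],
            },
        ],
    }
]

# Worth checking first: the question reports that this call fails on 1.0.0.
print(pd.__version__)

# One output row per entry in counties[*].Attributes; the meta paths copy
# state-level and county-level fields onto every row.
df = pd.json_normalize(
    data,
    record_path=["counties", "Attributes"],
    meta=["state", "shortname", ["counties", "name"], ["counties", "population"]],
)
print(df)
```

On an affected installation the same call raises the `TypeError` quoted in the question, so printing `pd.__version__` first is a quick way to tell the two situations apart.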
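【Comments】:
Glad it helped.
Is there any way to achieve the same thing with pyspark?
You may want to post that as a separate question; I am new to Spark.
The issue has since been fixed. You can use pandas 1.0.5.
【Answer 2】: Here is one way: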
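import pandas as pd

# s is the given json sample
# (pd.io.json.json_normalize is the pre-1.0 location; from pandas 1.0 on,
# the top-level name is pd.json_normalize)
df = pd.io.json.json_normalize(s)
# take the first county from each state's list (the other counties are dropped)
df['counties'] = df['counties'].str[0]
# convert counties dict into cols
df = pd.concat([df, df.pop('counties').apply(pd.Series)], axis=1)
# take the first entry from each Attributes list (the other entries are dropped)
df['Attributes'] = df['Attributes'].str[0]
# convert Attributes dict into cols
df = pd.concat([df, df.pop('Attributes').apply(pd.Series)], axis=1)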
print(df)
state shortname info.governor name population capture_date \
0 Florida FL Rick Scott Dade 12345 2020-01-29
1 Ohio OH John Kasich Summit 1234 2020-01-29
Spirit_code value
0 TRLQR 1.00
1 QWERTY 154.01
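Note that the output above has only two rows, because `.str[0]` keeps just the first county and the first attribute of each state. If every county and attribute is needed and `json_normalize` cannot be used on the installed pandas, one version-independent option is to flatten the structure with plain Python loops. The following is only a sketch: the helper name `flatten_attributes` and the `attribute.*` column names (taken from the desired output in the question) are illustrative choices, not part of pandas.

```python
def flatten_attributes(states):
    """Return one flat dict per entry in counties[*].Attributes."""
    rows = []
    for state in states:
        for county in state.get("counties", []):
            for attr in county.get("Attributes", []):
                rows.append(
                    {
                        "state": state["state"],
                        "shortname": state["shortname"],
                        "name": county["name"],
                        "population": county["population"],
                        "attribute.capture_date": attr["capture_date"],
                        "attribute.spirit_code": attr["Spirit_code"],
                        "attribute.value": attr["value"],
                    }
                )
    return rows


# Small sample in the same shape as the question's data.
sample = [
    {
        "state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich"},
        "counties": [
            {
                "name": "Summit",
                "population": 1234,
                "Attributes": [
                    {"capture_date": "2020-01-29", "Spirit_code": "QWERTY", "value": 154.01},
                    {"capture_date": "2020-01-29", "Spirit_code": "JKLGH", "value": 154.01},
                ],
            },
            {
                "name": "Cuyahoga",
                "population": 1337,
                "Attributes": [
                    {"capture_date": "2020-01-29", "Spirit_code": "ASDF", "value": 154.01},
                ],
            },
        ],
    }
]

rows = flatten_attributes(sample)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)` to get one row per attribute with the state and county fields repeated.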