python pandas json_normalize in 1.0.0 with meta path specified - expects iterable

Posted: 2020-05-23 09:53:00

Question:

I have this data:

[
    {
        "state": "Florida",
        "shortname": "FL",
        "info": {"governor": "Rick Scott"},
        "counties": [
            {
                "name": "Dade",
                "population": 12345,
                "Attributes": [
                    {"capture_date": "2020-01-29", "Spirit_code": "TRLQR", "value": 1},
                    {"capture_date": "2020-01-29", "Spirit_code": "HA***", "value": 57000}
                ]
            },
            {
                "name": "Broward",
                "population": 40000,
                "Attributes": [
                    {"capture_date": "2020-01-29", "Spirit_code": "GMSTP", "value": 14},
                    {"capture_date": "2020-01-29", "Spirit_code": "GWTPN", "value": 11212}
                ]
            },
            {
                "name": "Palm Beach",
                "population": 60000,
                "Attributes": [
                    {"capture_date": "2020-01-29", "Spirit_code": "YGHMN", "value": 154.01},
                    {"capture_date": "2020-01-29", "Spirit_code": "CXZASD", "value": 154.01}
                ]
            }
        ]
    },
    {
        "state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich"},
        "counties": [
            {
                "name": "Summit",
                "population": 1234,
                "Attributes": [
                    {"capture_date": "2020-01-29", "Spirit_code": "QWERTY", "value": 154.01},
                    {"capture_date": "2020-01-29", "Spirit_code": "JKLGH", "value": 154.01}
                ]
            },
            {
                "name": "Cuyahoga",
                "population": 1337,
                "Attributes": [
                    {"capture_date": "2020-01-29", "Spirit_code": "ASDF", "value": 154.01},
                    {"capture_date": "2020-01-29", "Spirit_code": "POIUY", "value": 154.01}
                ]
            }
        ]
    }
]

I am looking for this result:

state,   shortname, name,       population, attribute.capture_date, attribute.spirit_code, attribute.value
Florida, FL,        Dade,       12345,      2020-01-29,             TRLQR,                 1
Florida, FL,        Dade,       12345,      2020-01-29,             HA***,                 57000
Florida, FL,        Broward,    40000,      2020-01-29,             GMSTP,                 14
Florida, FL,        Broward,    40000,      2020-01-29,             GWTPN,                 11212
Florida, FL,        Palm Beach, 60000,      2020-01-29,             YGHMN,                 154.01
Florida, FL,        Palm Beach, 60000,      2020-01-29,             CXZASD,                154.01

Basically, I want to normalize the nested JSON on one key, "Attributes", producing one row per attribute entry:

json_normalize(data["data"], ["counties", "Attributes"], ["state", "shortname", ["counties", "name"], ["counties", "population"]])

I get the error:

TypeError: {'name': 'Dade', 'population': 12345, 'Attributes': [{'capture_date': '2020-01-29', 'Spirit_code': 'TRLQR', 'value': 1}, {'capture_date': '2020-01-29', 'Spirit_code': 'HA***', 'value': 57000}]} has non iterable value 12345 for path ['population']. Must be iterable or null.

But if I leave ["counties", "population"] out of the meta list and run:

plots_in = json_normalize(data["data"], ["counties", "Attributes"],
                   ["state", "shortname", ["counties", "name"]])

I get the result:

  capture_date Spirit_code     value    state shortname counties.name
0   2020-01-29       TRLQR      1.00  Florida        FL          Dade
1   2020-01-29       HA***  57000.00  Florida        FL          Dade
2   2020-01-29       GMSTP     14.00  Florida        FL       Broward
3   2020-01-29       GWTPN  11212.00  Florida        FL       Broward
4   2020-01-29       YGHMN    154.01  Florida        FL    Palm Beach
5   2020-01-29      CXZASD    154.01  Florida        FL    Palm Beach
6   2020-01-29      QWERTY    154.01     Ohio        OH        Summit
7   2020-01-29       JKLGH    154.01     Ohio        OH        Summit
8   2020-01-29        ASDF    154.01     Ohio        OH      Cuyahoga
9   2020-01-29       POIUY    154.01     Ohio        OH      Cuyahoga

Is this related to the integer value in the "population" key? I ask because I still get the same error if I run the following:

plots_in = json_normalize(data["data"], ["counties", "population"])

Could someone explain what is going on under the hood?

Comments:

Does anyone know what is actually happening under the hood?

Answer 1:

Check your pandas version. If it is pandas 1.0.0, this is most likely related to pandas issue #31507, "json_normalize in 1.0.0 with meta path specified - expects iterable".
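To illustrate why the string field "name" gets through while the integer field "population" does not, here is a minimal sketch of the kind of check the error message describes. The function `pull_field_strict` is a hypothetical stand-in written for this explanation, not pandas' actual source: it walks a path and then insists that the value it finds is iterable.

```python
from collections.abc import Iterable


def pull_field_strict(record, path):
    """Hypothetical stand-in: follow `path` into `record`, then require an iterable value."""
    keys = path if isinstance(path, list) else [path]
    value = record
    for key in keys:
        value = value[key]
    if not isinstance(value, Iterable):
        if value is None:
            # a null value is tolerated and treated as "no records"
            return []
        raise TypeError(
            f"{record} has non iterable value {value} for path {path}. "
            "Must be iterable or null."
        )
    return value


county = {"name": "Dade", "population": 12345}

# A str is iterable, so a string field passes the check.
print(pull_field_strict(county, ["name"]))

# An int is not iterable, so an integer field is rejected.
try:
    pull_field_strict(county, ["population"])
except TypeError as exc:
    print("TypeError:", exc)
```

A check like this makes sense for a record path, which has to lead to a list of records. That is also why using ["counties", "population"] as the record path cannot work regardless of version. Applying the same check to scalar meta fields is what the linked issue is about.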

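I ran into exactly the same problem when I rebuilt my development environment on Linux, which installed the latest version of every package, including pandas 1.0.0. After some searching I found the issue linked above. I then uninstalled pandas 1.0.0 and installed pandas 0.25.3 instead: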

pip3 uninstall pandas # or pip uninstall pandas

Then:

pip3 install pandas==0.25.3 # or pip install pandas==0.25.3

After that everything worked again, just as it had before I installed the latest pandas.
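As a quick way to confirm whether an installed pandas is affected, the following sketch prints the version and runs the call from the question with the full meta list on a trimmed copy of the sample. It assumes the list is wrapped under a "data" key, as the question's `data["data"]` suggests, and it uses the top-level `pd.json_normalize`, which needs pandas 1.0 or later. On an affected release the call should raise the TypeError from the question; on a release where the issue is fixed it should return one row per attribute.

```python
import pandas as pd

# Trimmed copy of the question's sample; the "data" wrapper is an assumption
# based on the data["data"] expression used in the question.
data = {"data": [
    {"state": "Florida", "shortname": "FL", "info": {"governor": "Rick Scott"},
     "counties": [
         {"name": "Dade", "population": 12345,
          "Attributes": [
              {"capture_date": "2020-01-29", "Spirit_code": "TRLQR", "value": 1},
              {"capture_date": "2020-01-29", "Spirit_code": "HA***", "value": 57000}]},
         {"name": "Broward", "population": 40000,
          "Attributes": [
              {"capture_date": "2020-01-29", "Spirit_code": "GMSTP", "value": 14}]}]}
]}

print(pd.__version__)

# One row per entry in counties -> Attributes, with state- and county-level
# fields carried along as meta columns.
plots_in = pd.json_normalize(
    data["data"],
    record_path=["counties", "Attributes"],
    meta=["state", "shortname", ["counties", "name"], ["counties", "population"]],
)
print(plots_in)
```

The nested meta paths show up as dotted column names such as counties.name and counties.population.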

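Comments:

Glad it helped.
Is there any way to achieve the same thing with pyspark?
You may want to post that as a separate question; I am new to Spark.
The issue has been fixed. You can use pandas 1.0.5.

Answer 2: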

Here is one way:

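import pandas as pd

# s is the given JSON sample (a list of state dicts)
# on pandas >= 1.0 the top-level pd.json_normalize(s) is the preferred spelling
df = pd.io.json.json_normalize(s)

# unnest the list: keeps only the first county of each state
df['counties'] = df['counties'].str[0]

# convert the counties dict into columns
df = pd.concat([df, df.pop('counties').apply(pd.Series)], axis=1)

# unnest the list: keeps only the first attribute of that county
df['Attributes'] = df['Attributes'].str[0]

# convert the Attributes dict into columns
df = pd.concat([df, df.pop('Attributes').apply(pd.Series)], axis=1)

print(df)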

     state shortname info.governor    name  population capture_date  \
0  Florida        FL    Rick Scott    Dade       12345   2020-01-29   
1     Ohio        OH   John Kasich  Summit        1234   2020-01-29   

  Spirit_code   value  
0       TRLQR    1.00  
1      QWERTY  154.01  
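Note that `.str[0]` keeps only the first county and the first attribute per state, which is why the output above has two rows. For the one-row-per-attribute result the question asks for, a version-independent alternative is to flatten the nesting by hand. The sketch below uses only plain Python; `flatten_attributes` is a hypothetical helper name, and the sample is trimmed from the question's data.

```python
def flatten_attributes(states):
    """Return one flat dict per attribute entry, carrying state and county fields along."""
    rows = []
    for state in states:
        for county in state.get("counties", []):
            for attr in county.get("Attributes", []):
                rows.append({
                    "state": state["state"],
                    "shortname": state["shortname"],
                    "name": county["name"],
                    "population": county["population"],
                    **attr,  # capture_date, Spirit_code, value
                })
    return rows


sample = [
    {"state": "Ohio", "shortname": "OH", "info": {"governor": "John Kasich"},
     "counties": [
         {"name": "Summit", "population": 1234,
          "Attributes": [
              {"capture_date": "2020-01-29", "Spirit_code": "QWERTY", "value": 154.01},
              {"capture_date": "2020-01-29", "Spirit_code": "JKLGH", "value": 154.01}]},
         {"name": "Cuyahoga", "population": 1337,
          "Attributes": [
              {"capture_date": "2020-01-29", "Spirit_code": "ASDF", "value": 154.01}]}]}
]

rows = flatten_attributes(sample)
for row in rows:
    print(row)
```

Passing the resulting list to `pd.DataFrame(rows)` gives a DataFrame with one column per key, without going through json_normalize at all.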
