How to convert multiple layers of nested JSON to a SQL table

【Title】How to convert multiple layers of nested JSON to a SQL table 【Posted】2017-03-20 06:42:20 【Question】:

With help from ***, I was able to get this far. I need more help converting the JSON to a SQL table. Any help is much appreciated.


    {
        "Volumes": [
            {
                "AvailabilityZone": "us-east-1a",
                "Attachments": [
                    {
                        "AttachTime": "2013-12-18T22:35:00.000Z",
                        "InstanceId": "i-1234567890abcdef0",
                        "VolumeId": "vol-049df61146c4d7901",
                        "State": "attached",
                        "DeleteOnTermination": true,
                        "Device": "/dev/sda1",
                        "Tags": [
                            {
                                "Value": "DBJanitor-Private",
                                "Key": "Name"
                            },
                            {
                                "Value": "DBJanitor",
                                "Key": "Owner"
                            },
                            {
                                "Value": "Database",
                                "Key": "Product"
                            },
                            {
                                "Value": "DB Janitor",
                                "Key": "Portfolio"
                            },
                            {
                                "Value": "DB Service",
                                "Key": "Service"
                            }
                        ]
                    }
                ],
                "Ebs": {
                    "Status": "attached",
                    "DeleteOnTermination": true,
                    "VolumeId": "vol-049df61146c4d7901",
                    "AttachTime": "2016-09-14T19:49:11.000Z"
                },
                "VolumeType": "standard",
                "VolumeId": "vol-049df61146c4d7901"
            }
        ]
    }

With help from ***, I was able to get as far as the Tags. I don't know how to handle Ebs. I'm very new to coding, so any help is much appreciated.

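In [1]: import json
   ...: import pandas as pd
   ...:
   ...: fn = r'D:\temp\.data\40454898.json'

In [2]: with open(fn) as f:
   ...:     data = json.load(f)
   ...:

In [14]: # Record path: Volumes -> Attachments -> Tags; the two meta fields are copied from each attachment.
    ...: # pd.io.json.json_normalize is the pre-1.0 spelling; on pandas >= 1.0 use pd.json_normalize.
    ...: t = pd.io.json.json_normalize(data['Volumes'],
    ...:                               ['Attachments','Tags'],
    ...:                               [['Attachments', 'VolumeId'],
    ...:                                ['Attachments', 'InstanceId']])
    ...: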

In [15]: t
Out[15]:
         Key              Value Attachments.InstanceId   Attachments.VolumeId
0       Name  DBJanitor-Private    i-1234567890abcdef0  vol-049df61146c4d7901
1      Owner          DBJanitor    i-1234567890abcdef0  vol-049df61146c4d7901
2    Product           Database    i-1234567890abcdef0  vol-049df61146c4d7901
3  Portfolio         DB Janitor    i-1234567890abcdef0  vol-049df61146c4d7901
4    Service         DB Service    i-1234567890abcdef0  vol-049df61146c4d7901

Thanks

【Answer 1】:

json_normalize expects the record path to point at a list of dictionaries, but Ebs is just a single dictionary, so we should preprocess the JSON data:

In [88]: with open(fn) as f:
    ...:     data = json.load(f)
    ...:

In [89]: for r in data['Volumes']:
    ...:     if 'Ebs' not in r: # add an empty 'Ebs' list if the key is missing from the record
    ...:         r['Ebs'] = []
    ...:     if not isinstance(r['Ebs'], list): # wrap 'Ebs' in a list if it's not a list 
    ...:         r['Ebs'] = [r['Ebs']]
    ...:

In [90]: data
Out[90]:
{'Volumes': [{'Attachments': [{'AttachTime': '2013-12-18T22:35:00.000Z',
     'DeleteOnTermination': True,
     'Device': '/dev/sda1',
     'InstanceId': 'i-1234567890abcdef0',
     'State': 'attached',
     'Tags': [{'Key': 'Name', 'Value': 'DBJanitor-Private'},
      {'Key': 'Owner', 'Value': 'DBJanitor'},
      {'Key': 'Product', 'Value': 'Database'},
      {'Key': 'Portfolio', 'Value': 'DB Janitor'},
      {'Key': 'Service', 'Value': 'DB Service'}],
     'VolumeId': 'vol-049df61146c4d7901'}],
   'AvailabilityZone': 'us-east-1a',
   'Ebs': [{'AttachTime': '2016-09-14T19:49:11.000Z',
     'DeleteOnTermination': True,
     'Status': 'attached',
     'VolumeId': 'vol-049df61146c4d7901'}],
   'VolumeId': 'vol-049df61146c4d7901',
   'VolumeType': 'standard'}]}

Note: 'Ebs': {..} has been replaced with 'Ebs': [{..}]
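
The wrapping step is not specific to Ebs; it applies to any key that sometimes holds a single dictionary instead of a list. A minimal sketch, assuming a hypothetical helper named `ensure_list` (it is not part of pandas) and made-up volume IDs. Unlike the loop above, it also turns an explicit `None` value into an empty list:

```python
def ensure_list(records, key):
    """Make record[key] a list for every record, modifying the records in place.

    A missing key (or a None value) becomes an empty list, and a single
    dict becomes a one-element list. Existing lists are left untouched.
    """
    for record in records:
        value = record.get(key)
        if value is None:
            record[key] = []
        elif not isinstance(value, list):
            record[key] = [value]
    return records


# Hypothetical records covering the three cases.
volumes = [
    {"VolumeId": "vol-1", "Ebs": {"Status": "attached"}},    # single dict
    {"VolumeId": "vol-2"},                                    # key missing
    {"VolumeId": "vol-3", "Ebs": [{"Status": "attached"}]},   # already a list
]
ensure_list(volumes, "Ebs")
print([v["Ebs"] for v in volumes])
```

After this call every record has a list under `Ebs`, so a record path of `['Ebs']` can be used uniformly.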

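In [91]: # Record path is the wrapped 'Ebs' list; the volume's own VolumeId is kept as parent_VolumeId.
    ...: # On pandas >= 1.0 use pd.json_normalize; pd.io.json.json_normalize is the older, deprecated spelling.
    ...: e = pd.io.json.json_normalize(data['Volumes'],
    ...:                               ['Ebs'],
    ...:                               ['VolumeId'],
    ...:                               meta_prefix='parent_')
    ...: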


In [92]: e
Out[92]:
                 AttachTime DeleteOnTermination    Status               VolumeId        parent_VolumeId
0  2016-09-14T19:49:11.000Z                True  attached  vol-049df61146c4d7901  vol-049df61146c4d7901
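
The flattened frames still have to be written to SQL tables, which is what the question ultimately asks for. One possible route is `DataFrame.to_sql`. The sketch below rests on several assumptions that are not in the original post: pandas 1.0 or later (for `pd.json_normalize`), SQLite through the standard-library `sqlite3` module with an in-memory database, and the illustrative table names `volume_tags` and `volume_ebs`. The data is a trimmed copy of the question's JSON with Ebs already wrapped in a list.

```python
import sqlite3

import pandas as pd

# Trimmed copy of the question's data; 'Ebs' is already wrapped in a list.
data = {
    "Volumes": [
        {
            "VolumeId": "vol-049df61146c4d7901",
            "Attachments": [
                {
                    "InstanceId": "i-1234567890abcdef0",
                    "VolumeId": "vol-049df61146c4d7901",
                    "Tags": [
                        {"Key": "Name", "Value": "DBJanitor-Private"},
                        {"Key": "Owner", "Value": "DBJanitor"},
                    ],
                }
            ],
            "Ebs": [
                {
                    "Status": "attached",
                    "DeleteOnTermination": True,
                    "VolumeId": "vol-049df61146c4d7901",
                    "AttachTime": "2016-09-14T19:49:11.000Z",
                }
            ],
        }
    ]
}

# One row per tag, carrying the attachment's VolumeId and InstanceId.
tags = pd.json_normalize(
    data["Volumes"],
    ["Attachments", "Tags"],
    [["Attachments", "VolumeId"], ["Attachments", "InstanceId"]],
)
# One row per Ebs entry, carrying the parent volume's VolumeId.
ebs = pd.json_normalize(
    data["Volumes"], ["Ebs"], ["VolumeId"], meta_prefix="parent_"
)

# Dotted column names are awkward in SQL, so replace the dots with underscores.
tags.columns = [c.replace(".", "_") for c in tags.columns]

conn = sqlite3.connect(":memory:")  # assumption: throwaway in-memory SQLite database
tags.to_sql("volume_tags", conn, index=False, if_exists="replace")
ebs.to_sql("volume_ebs", conn, index=False, if_exists="replace")

# Read the rows back to confirm that both tables were created.
tag_rows = conn.execute(
    'SELECT "Key", "Value" FROM volume_tags ORDER BY "Key"'
).fetchall()
ebs_rows = conn.execute(
    'SELECT "Status", "parent_VolumeId" FROM volume_ebs'
).fetchall()
conn.close()

print(tag_rows)
print(ebs_rows)
```

Keeping tags and Ebs in separate tables that share the volume ID mirrors the one-to-many shape of the JSON. For a database other than SQLite, `to_sql` would normally be given a SQLAlchemy connection instead of a `sqlite3` one.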
