Converting Json to SQL table

Posted: 2017-03-19 22:27:02

Question: I am trying to learn how to convert JSON in the following format into a SQL table. I used Python pandas, but it converts the JSON nodes into dictionaries.

The JSON:
{
    "Volumes": [
        {
            "AvailabilityZone": "us-east-1a",
            "Attachments": [
                {
                    "AttachTime": "2013-12-18T22:35:00.000Z",
                    "InstanceId": "i-1234567890abcdef0",
                    "VolumeId": "vol-049df61146c4d7901",
                    "State": "attached",
                    "DeleteOnTermination": true,
                    "Device": "/dev/sda1"
                }
            ],
            "Tags": [
                {"Value": "DBJanitor-Private", "Key": "Name"},
                {"Value": "DBJanitor", "Key": "Owner"},
                {"Value": "Database", "Key": "Product"},
                {"Value": "DB Janitor", "Key": "Portfolio"},
                {"Value": "DB Service", "Key": "Service"}
            ],
            "VolumeType": "standard",
            "VolumeId": "vol-049df61146c4d7901",
            "State": "in-use",
            "SnapshotId": "snap-1234567890abcdef0",
            "CreateTime": "2013-12-18T22:35:00.084Z",
            "Size": 8
        },
        {
            "AvailabilityZone": "us-east-1a",
            "Attachments": [],
            "VolumeType": "io1",
            "VolumeId": "vol-1234567890abcdef0",
            "State": "available",
            "Iops": 1000,
            "SnapshotId": null,
            "CreateTime": "2014-02-27T00:02:41.791Z",
            "Size": 100
        }
    ]
}
Here is what I have tried so far, in Python:

import pandas
from sqlalchemy import create_engine

asg_list_json_Tags = asg_list_json["AutoScalingGroups"]
Tags = pandas.DataFrame(asg_list_json_Tags)

n = []
for i in Tags.columns:
    n.append(i)
print(n)

engine = create_engine("mysql+mysqldb://user:" + 'pwd' + "@mysqlserver/dbname")
Tags.to_sql(name='TableName', con=engine, if_exists='append', index=True)
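When `to_sql` hits a column whose cells are dicts or lists, the database driver cannot bind them as scalar values. One workaround, shown here as a hedged sketch on an invented miniature of the data (not the asker's actual DataFrame), is to serialize the nested columns to JSON strings before inserting:

```python
import json

import pandas as pd

# Hypothetical miniature of the question's data: one nested "Tags" column.
df = pd.DataFrame([
    {"VolumeId": "vol-049df61146c4d7901",
     "Tags": [{"Key": "Name", "Value": "DBJanitor-Private"}]},
])

# Serialize the nested column to a JSON string so the driver can insert it.
df["Tags"] = df["Tags"].apply(json.dumps)

print(type(df["Tags"][0]).__name__)  # → str
```

This keeps the nested data in one table at the cost of queryability; the answers below show the normalized alternative of splitting children into their own tables.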
Comments:

What seems to be the problem? Why doesn't that code work?

So I get an error saying a dict cannot be inserted into a string.

@DataJanitor, do you want to store the flattened data?

@MaxU - Yes! That is exactly what I want to do.

@DataJanitor, the question is what you want to do with Attachments. Some records are missing it, so json_normalize can't be used here directly, because records without Attachments won't be parsed...
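The missing-key problem from the last comment can be patched before normalizing by defaulting every record to an empty list. A minimal sketch (the two sample records are trimmed-down stand-ins for the question's data):

```python
# Records lacking "Attachments"/"Tags" would break a record-path lookup,
# so give every record those keys with an empty-list default first.
volumes = [
    {"VolumeId": "vol-049df61146c4d7901",
     "Attachments": [{"Device": "/dev/sda1"}]},
    {"VolumeId": "vol-1234567890abcdef0"},  # no "Attachments" key at all
]

for rec in volumes:
    rec.setdefault("Attachments", [])
    rec.setdefault("Tags", [])

print(volumes[1]["Attachments"])  # → []
```

After this pass, a record-path lookup such as `json_normalize(volumes, "Attachments", ...)` simply produces zero rows for volumes with no attachments instead of raising a KeyError.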
Answer 1:

I would do it this way:
import json

import pandas as pd
from sqlalchemy import create_engine

fn = r'D:\temp\.data\40450591.json'
with open(fn) as f:
    data = json.load(f)

# some of your records seem NOT to have a `Tags` key, hence `KeyError: 'Tags'`
# let's fix it
for r in data['Volumes']:
    if 'Tags' not in r:
        r['Tags'] = []

v = pd.DataFrame(data['Volumes']).set_index('VolumeId').drop(['Attachments', 'Tags'], axis=1)
a = pd.io.json.json_normalize(data['Volumes'], 'Attachments', ['VolumeId'], meta_prefix='parent_')
t = pd.io.json.json_normalize(data['Volumes'], 'Tags', ['VolumeId'], meta_prefix='parent_')

engine = create_engine("mysql+mysqldb://user:pwd@mysqlserver/dbname")  # as in the question
v.to_sql('volume', engine)
a.to_sql('attachment', engine)
t.to_sql('tag', engine)
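To try this flattening without a MySQL server, here is a self-contained sketch against an in-memory SQLite engine, using a trimmed-down copy of the question's data and the modern `pd.json_normalize` spelling in place of the older `pd.io.json.json_normalize` (both the sample data and the SQLite URL are illustrative substitutions, not the answer's originals):

```python
import pandas as pd
from sqlalchemy import create_engine

data = {"Volumes": [
    {"VolumeId": "vol-049df61146c4d7901", "Size": 8,
     "Attachments": [{"InstanceId": "i-1234567890abcdef0", "State": "attached"}],
     "Tags": [{"Key": "Name", "Value": "DBJanitor-Private"}]},
    {"VolumeId": "vol-1234567890abcdef0", "Size": 100,
     "Attachments": [], "Tags": []},
]}

# Parent table: drop the nested columns. Child tables: one row per nested
# item, carrying the parent's VolumeId along as a foreign-key column.
v = pd.DataFrame(data["Volumes"]).drop(columns=["Attachments", "Tags"])
a = pd.json_normalize(data["Volumes"], "Attachments", ["VolumeId"], meta_prefix="parent_")
t = pd.json_normalize(data["Volumes"], "Tags", ["VolumeId"], meta_prefix="parent_")

engine = create_engine("sqlite://")  # in-memory; swap in the MySQL URL for real use
v.to_sql("volume", engine, index=False)
a.to_sql("attachment", engine, index=False)
t.to_sql("tag", engine, index=False)

n_tags = pd.read_sql("SELECT COUNT(*) AS n FROM tag", engine)["n"][0]
print(n_tags)  # → 1
```

The volume with empty `Attachments`/`Tags` simply contributes no child rows, which is exactly the one-to-many shape the tables need.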
Output:
In [179]: v
Out[179]:
AvailabilityZone CreateTime Iops Size SnapshotId State VolumeType
VolumeId
vol-049df61146c4d7901 us-east-1a 2013-12-18T22:35:00.084Z NaN 8 snap-1234567890abcdef0 in-use standard
vol-1234567890abcdef0 us-east-1a 2014-02-27T00:02:41.791Z 1000.0 100 None available io1
In [180]: a
Out[180]:
AttachTime DeleteOnTermination Device InstanceId State VolumeId parent_VolumeId
0 2013-12-18T22:35:00.000Z True /dev/sda1 i-1234567890abcdef0 attached vol-049df61146c4d7901 vol-049df61146c4d7901
1 2013-12-18T22:35:11.000Z True /dev/sda1 i-1234567890abcdef1 attached vol-049df61146c4d7111 vol-049df61146c4d7901
In [217]: t
Out[217]:
Key Value parent_VolumeId
0 Name DBJanitor-Private vol-049df61146c4d7901
1 Owner DBJanitor vol-049df61146c4d7901
2 Product Database vol-049df61146c4d7901
3 Portfolio DB Janitor vol-049df61146c4d7901
4 Service DB Service vol-049df61146c4d7901
Test JSON file:
{
    "Volumes": [
        {
            "AvailabilityZone": "us-east-1a",
            "Attachments": [
                {
                    "AttachTime": "2013-12-18T22:35:00.000Z",
                    "InstanceId": "i-1234567890abcdef0",
                    "VolumeId": "vol-049df61146c4d7901",
                    "State": "attached",
                    "DeleteOnTermination": true,
                    "Device": "/dev/sda1"
                },
                {
                    "AttachTime": "2013-12-18T22:35:11.000Z",
                    "InstanceId": "i-1234567890abcdef1",
                    "VolumeId": "vol-049df61146c4d7111",
                    "State": "attached",
                    "DeleteOnTermination": true,
                    "Device": "/dev/sda1"
                }
            ],
            "Tags": [
                {"Value": "DBJanitor-Private", "Key": "Name"},
                {"Value": "DBJanitor", "Key": "Owner"},
                {"Value": "Database", "Key": "Product"},
                {"Value": "DB Janitor", "Key": "Portfolio"},
                {"Value": "DB Service", "Key": "Service"}
            ],
            "VolumeType": "standard",
            "VolumeId": "vol-049df61146c4d7901",
            "State": "in-use",
            "SnapshotId": "snap-1234567890abcdef0",
            "CreateTime": "2013-12-18T22:35:00.084Z",
            "Size": 8
        },
        {
            "AvailabilityZone": "us-east-1a",
            "Attachments": [],
            "VolumeType": "io1",
            "VolumeId": "vol-1234567890abcdef0",
            "State": "available",
            "Iops": 1000,
            "SnapshotId": null,
            "CreateTime": "2014-02-27T00:02:41.791Z",
            "Size": 100
        }
    ]
}
Comments:

Thanks @MaxU, really appreciate it! I'll plug it in and send you any errors I hit.

Beat me to it! Nice way to capture the one-to-many relationships with matching IDs for database normalization.

@DataJanitor, sure, glad I could help.

@MaxU - I was able to implement the code successfully! If I have one more layer of complexity, how would I solve that? I made changes to the json in the question. Please see the question.

@DataJanitor, you can parse it the same way as Attachments. NOTE: the key Tags must be present in every record, otherwise it won't work. Is Tags present in every volume record?

Answer 2:
Similar to this example: https://github.com/zolekode/json-to-tables/blob/master/example.py

The script below exports the data as HTML, but you can also export it as SQL:
table_maker.save_tables(YOUR_PATH, export_as="sql", sql_connection=YOUR_CONNECTION)
# See the code below
import json

from extent_table import ExtentTable
from table_maker import TableMaker

Volumes = [
    {
        "AvailabilityZone": "us-east-1a",
        "Attachments": [
            {
                "AttachTime": "2013-12-18T22:35:00.000Z",
                "InstanceId": "i-1234567890abcdef0",
                "VolumeId": "vol-049df61146c4d7901",
                "State": "attached",
                "DeleteOnTermination": "true",
                "Device": "/dev/sda1"
            }
        ],
        "Tags": [
            {"Value": "DBJanitor-Private", "Key": "Name"},
            {"Value": "DBJanitor", "Key": "Owner"},
            {"Value": "Database", "Key": "Product"},
            {"Value": "DB Janitor", "Key": "Portfolio"},
            {"Value": "DB Service", "Key": "Service"}
        ],
        "VolumeType": "standard",
        "VolumeId": "vol-049df61146c4d7901",
        "State": "in-use",
        "SnapshotId": "snap-1234567890abcdef0",
        "CreateTime": "2013-12-18T22:35:00.084Z",
        "Size": 8
    },
    {
        "AvailabilityZone": "us-east-1a",
        "Attachments": [],
        "VolumeType": "io1",
        "VolumeId": "vol-1234567890abcdef0",
        "State": "available",
        "Iops": 1000,
        "SnapshotId": "null",
        "CreateTime": "2014-02-27T00:02:41.791Z",
        "Size": 100
    }
]

volumes = json.dumps(Volumes)
volumes = json.loads(volumes)

extent_table = ExtentTable()
table_maker = TableMaker(extent_table)
table_maker.convert_json_objects_to_tables(volumes, "volumes")
table_maker.show_tables(8)
table_maker.save_tables("./", export_as="html")  # you can also pass export_as="sql" or "csv"; in the case of sql, there is a parameter to pass the engine.
HTML output:
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>ID</th>
<th>AvailabilityZone</th>
<th>VolumeType</th>
<th>VolumeId</th>
<th>State</th>
<th>SnapshotId</th>
<th>CreateTime</th>
<th>Size</th>
<th>Iops</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>us-east-1a</td>
<td>standard</td>
<td>vol-049df61146c4d7901</td>
<td>in-use</td>
<td>snap-1234567890abcdef0</td>
<td>2013-12-18T22:35:00.084Z</td>
<td>8</td>
<td>None</td>
</tr>
<tr>
<td>1</td>
<td>us-east-1a</td>
<td>io1</td>
<td>vol-1234567890abcdef0</td>
<td>available</td>
<td>null</td>
<td>2014-02-27T00:02:41.791Z</td>
<td>100</td>
<td>1000</td>
</tr>
<tr>
<td>2</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
</tbody>
</table>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>ID</th>
<th>PARENT_ID</th>
<th>is_scalar</th>
<th>scalar</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>False</td>
<td>None</td>
</tr>
</tbody>
</table>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>ID</th>
<th>AttachTime</th>
<th>InstanceId</th>
<th>VolumeId</th>
<th>State</th>
<th>DeleteOnTermination</th>
<th>Device</th>
<th>PARENT_ID</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>2013-12-18T22:35:00.000Z</td>
<td>i-1234567890abcdef0</td>
<td>vol-049df61146c4d7901</td>
<td>attached</td>
<td>true</td>
<td>/dev/sda1</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
</tbody>
</table>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>ID</th>
<th>PARENT_ID</th>
<th>is_scalar</th>
<th>scalar</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>False</td>
<td>None</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>False</td>
<td>None</td>
</tr>
<tr>
<td>2</td>
<td>0</td>
<td>False</td>
<td>None</td>
</tr>
<tr>
<td>3</td>
<td>0</td>
<td>False</td>
<td>None</td>
</tr>
<tr>
<td>4</td>
<td>0</td>
<td>False</td>
<td>None</td>
</tr>
</tbody>
</table>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>ID</th>
<th>Value</th>
<th>Key</th>
<th>PARENT_ID</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>DBJanitor-Private</td>
<td>Name</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>DBJanitor</td>
<td>Owner</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>Database</td>
<td>Product</td>
<td>2</td>
</tr>
<tr>
<td>3</td>
<td>DB Janitor</td>
<td>Portfolio</td>
<td>3</td>
</tr>
<tr>
<td>4</td>
<td>DB Service</td>
<td>Service</td>
<td>4</td>
</tr>
<tr>
<td>5</td>
<td>None</td>
<td>None</td>
<td>None</td>
</tr>
</tbody>
</table>
Comments:

The question was how to convert to SQL, not to an HTML table.

@szeta that's a bit unfair, don't you think? Did you see this line of code above? table_maker.save_tables("./", export_as="html") # you can also pass in export_as="sql" or "csv". In the case of sql, there is a parameter to pass the engine. I said the export can be anything from html to sql. The HTML table output is just there as a visual. I hope you reconsider your vote.

I didn't see that comment. To avoid this in the future, I'd suggest highlighting such points more clearly. The most relevant point for answering this question was (yes!) buried deep in a comment, far from obvious. If you edit the answer, I'll revoke the downvote. (It is currently locked in until an edit is made.)