Extract key and value from JSON to a new dataframe
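【Title】: Extract key and value from json to new dataframe 【Posted】: 2019-10-16 12:05:39 【Question】: I have a dataframe with JSON values in one column. The JSON is nested several levels deep. I want to extract the end (leaf) keys and values into a new dataframe. A sample column value is given below.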
{'shipping_assignments': [{'shipping': {'address': {'address_type': 'shipping', 'city': 'Calder', 'country_id': 'US', 'customer_address_id': 1, 'email': 'roni_cost@example.com', 'entity_id': 1, 'firstname': 'Veronica', 'lastname': 'Costello', 'parent_id': 1, 'postcode': '49628-7978', 'region': 'Michigan', 'region_code': 'MI', 'region_id': 33, 'street': ['6146 Honey Bluff Parkway'], 'telephone': '(555) 229-3326'}, 'method': 'flatrate_flatrate', 'total': {'base_shipping_amount': 5, 'base_shipping_discount_amount': 0, 'base_shipping_discount_tax_compensation_amnt': 0, 'base_shipping_incl_tax': 5, 'base_shipping_invoiced': 5, 'base_shipping_tax_amount': 0, 'shipping_amount': 5, 'shipping_discount_amount': 0, 'shipping_discount_tax_compensation_amount': 0, 'shipping_incl_tax': 5, 'shipping_invoiced': 5, 'shipping_tax_amount': 0}}, 'items': [{'amount_refunded': 0, 'applied_rule_ids': '1', 'base_amount_refunded': 0, 'base_discount_amount': 0, 'base_discount_invoiced': 0, 'base_discount_tax_compensation_amount': 0, 'base_discount_tax_compensation_invoiced': 0, 'base_original_price': 29, 'base_price': 29, 'base_price_incl_tax': 31.39, 'base_row_invoiced': 29, 'base_row_total': 29, 'base_row_total_incl_tax': 31.39, 'base_tax_amount': 2.39, 'base_tax_invoiced': 2.39, 'created_at': '2019-09-27 10:03:45', 'discount_amount': 0, 'discount_invoiced': 0, 'discount_percent': 0, 'free_shipping': 0, 'discount_tax_compensation_amount': 0, 'discount_tax_compensation_invoiced': 0, 'is_qty_decimal': 0, 'item_id': 1, 'name': 'Iris Workout Top', 'no_discount': 0, 'order_id': 1, 'original_price': 29, 'price': 29, 'price_incl_tax': 31.39, 'product_id': 1434, 'product_type': 'configurable', 'qty_canceled': 0, 'qty_invoiced': 1, 'qty_ordered': 1, 'qty_refunded': 0, 'qty_shipped': 1, 'row_invoiced': 29, 'row_total': 29, 'row_total_incl_tax': 31.39, 'row_weight': 1, 'sku': 'WS03-XS-Red', 'store_id': 1, 'tax_amount': 2.39, 'tax_invoiced': 2.39, 'tax_percent': 8.25, 'updated_at': '2019-09-27 10:03:46', 'weight': 1, 'product_option': {'extension_attributes': {'configurable_item_options': [{'option_id': '141', 'option_value': 167}, {'option_id': '93', 'option_value': 58}]}}}]}],
'payment_additional_info': [{'key': 'method_title', 'value': 'Check / Money order'}], 'applied_taxes': [{'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-Rate 1', 'percent': 8.25, 'amount': 2.39, 'base_amount': 2.39}], 'item_applied_taxes': [{'type': 'product', 'applied_taxes': [{'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-Rate 1', 'percent': 8.25, 'amount': 2.39, 'base_amount': 2.39}]}], 'converting_from_quote': True}
The above is the value of a single row of the dataframe column df['x'].
My conversion code is as follows:
sample = data['x'].tolist()
data = json.dumps(sample)
df = pd.read_json(data)
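This gives a new dataframe with the columns
Index(['applied_taxes', 'converting_from_quote', 'item_applied_taxes', 'payment_additional_info', 'shipping_assignments'], dtype='object')
When I try the same approach to convert one of those columns, whose rows still hold nested values,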
m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
m_sample = m_df.tolist()
m_data = json.dumps(m_sample)
c_df = pd.read_json(m_data)
it does not work.
See this link for the beautified_json
【Comments】:
【Answer 1】: I found a nifty ETL package for Python called petl. Its fromdicts function takes the JSON records as a list of dicts and turns them into a table:
order_table = fromdicts(data_list)
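If any column contains a nested dict, use unpackdict(order_table, 'nested_col') to unpack it. In my case I needed to unpack the applied_taxes column. The code below unpacks it and adds its keys and values as columns and rows in the same table.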
order_table = unpackdict(order_table, 'applied_taxes')
To learn more, see the petl documentation.
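For readers without petl installed, the effect described in this answer can be approximated in plain Python. This is only a sketch of the idea, not petl's implementation: `unpack_dict_column` is a hypothetical helper, and it assumes each cell holds either a dict or a one-element list wrapping a dict, as `applied_taxes` does in the sample row. The shortened `data_list` is likewise made up for illustration.

```python
# Plain-Python approximation of unpacking a nested-dict column into new columns.
# `unpack_dict_column` is a hypothetical helper; it is not part of petl.

def unpack_dict_column(rows, field):
    """Return new row dicts with the dict stored under `field` spread into top-level keys."""
    unpacked = []
    for row in rows:
        nested = row.get(field)
        # The sample data wraps the dict in a one-element list, so unwrap it first.
        if isinstance(nested, list) and len(nested) == 1:
            nested = nested[0]
        if isinstance(nested, dict):
            # Keep every other column and add the nested keys next to them.
            flat = {k: v for k, v in row.items() if k != field}
            flat.update(nested)
        else:
            # Nothing to unpack: leave the row unchanged.
            flat = dict(row)
        unpacked.append(flat)
    return unpacked


# Shortened stand-in for the list of order records (an assumption).
data_list = [
    {
        "converting_from_quote": True,
        "applied_taxes": [
            {"code": "US-MI-*-Rate 1", "title": "US-MI-*-Rate 1",
             "percent": 8.25, "amount": 2.39, "base_amount": 2.39}
        ],
    }
]

order_rows = unpack_dict_column(data_list, "applied_taxes")
print(order_rows[0])
```

Each resulting row keeps its other columns and gains `code`, `title`, `percent`, `amount` and `base_amount` as top-level keys.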
【Comments】:
【Answer 2】: It looks like your mistake is in tolist(). Try the following:
import pandas as pd
import json
import re
data = {"shipping_assignments":[{"shipping":{"address":{"address_type":"shipping","city":"Calder","country_id":"US","customer_address_id":1,"email":"roni_cost@example.com","entity_id":1,"firstname":"Veronica","lastname":"Costello","parent_id":1,"postcode":"49628-7978","region":"Michigan","region_code":"MI","region_id":33,"street":["6146 Honey Bluff Parkway"],"telephone":"(555) 229-3326"},"method":"flatrate_flatrate","total":{"base_shipping_amount":5,"base_shipping_discount_amount":0,"base_shipping_discount_tax_compensation_amnt":0,"base_shipping_incl_tax":5,"base_shipping_invoiced":5,"base_shipping_tax_amount":0,"shipping_amount":5,"shipping_discount_amount":0,"shipping_discount_tax_compensation_amount":0,"shipping_incl_tax":5,"shipping_invoiced":5,"shipping_tax_amount":0}},"items":[{"amount_refunded":0,"applied_rule_ids":"1","base_amount_refunded":0,"base_discount_amount":0,"base_discount_invoiced":0,"base_discount_tax_compensation_amount":0,"base_discount_tax_compensation_invoiced":0,"base_original_price":29,"base_price":29,"base_price_incl_tax":31.39,"base_row_invoiced":29,"base_row_total":29,"base_row_total_incl_tax":31.39,"base_tax_amount":2.39,"base_tax_invoiced":2.39,"created_at":"2019-09-27 10:03:45","discount_amount":0,"discount_invoiced":0,"discount_percent":0,"free_shipping":0,"discount_tax_compensation_amount":0,"discount_tax_compensation_invoiced":0,"is_qty_decimal":0,"item_id":1,"name":"Iris Workout Top","no_discount":0,"order_id":1,"original_price":29,"price":29,"price_incl_tax":31.39,"product_id":1434,"product_type":"configurable","qty_canceled":0,"qty_invoiced":1,"qty_ordered":1,"qty_refunded":0,"qty_shipped":1,"row_invoiced":29,"row_total":29,"row_total_incl_tax":31.39,"row_weight":1,"sku":"WS03-XS-Red","store_id":1,"tax_amount":2.39,"tax_invoiced":2.39,"tax_percent":8.25,"updated_at":"2019-09-27 10:03:46","weight":1,"product_option":{"extension_attributes":{"configurable_item_options":[{"option_id":"141","option_value":167},{"option_id":"93","option_value":58}]}}}]}],
        "payment_additional_info":[{"key":"method_title","value":"Check / Money order"}],"applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}],"item_applied_taxes":[{"type":"product","applied_taxes":[{"code":"US-MI-*-Rate 1","title":"US-MI-*-Rate 1","percent":8.25,"amount":2.39,"base_amount":2.39}]}],"converting_from_quote":"True"}
df = pd.read_json(json.dumps(data))
m_df = df['applied_taxes'].apply(lambda x : re.sub('.?\[|$.|]',"", str(x)))
c_df = pd.read_json(json.dumps(list(m_df)))
print(c_df)
This prints the following:
                                                   0
0  {'code': 'US-MI-*-Rate 1', 'title': 'US-MI-*-R...
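Note that column 0 in this output still holds one string per row rather than separate code, title and percent columns. The regex only strips the square brackets from str(x), and what remains is the Python repr of a dict with single-quoted keys, which is not valid JSON. A minimal sketch of that behavior follows, using `ast.literal_eval` from the standard library to turn the string back into a dict. The shortened `cell` value is an assumption standing in for one `applied_taxes` cell.

```python
import ast
import json
import re

# Shortened stand-in for one cell of df['applied_taxes'] (an assumption).
cell = [{"code": "US-MI-*-Rate 1", "percent": 8.25, "amount": 2.39}]

# Same substitution as in the answer, written as a raw string.
stripped = re.sub(r'.?\[|$.|]', "", str(cell))

# The single-quoted keys make this a Python literal, not JSON.
try:
    json.loads(stripped)
    is_valid_json = True
except json.JSONDecodeError:
    is_valid_json = False

# literal_eval safely parses the Python literal back into a dict.
parsed = ast.literal_eval(stripped)
print(is_valid_json, parsed["code"], parsed["percent"])
```

Since the cells are already Python objects before str() is applied, skipping the string round trip altogether is usually simpler than parsing the repr back.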
【Comments】:
We need the end keys and values of the rows as a new dataframe. For example, code as a column with its values as the rows, and so on.
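The shape this comment asks for, with each end key as a column and its value in the row, can be sketched with `pandas.json_normalize` (available as `pd.json_normalize` from pandas 1.0). This is one possible approach rather than the method of either answer, and `record` is a shortened, made-up stand-in for a single parsed cell of df['x'].

```python
import pandas as pd

# Shortened stand-in for one order record from the question (an assumption:
# only a few of the original keys are kept so the sketch stays readable).
record = {
    "shipping_assignments": [
        {"shipping": {"address": {"city": "Calder", "region_code": "MI"},
                      "method": "flatrate_flatrate",
                      "total": {"shipping_amount": 5}}}
    ],
    "applied_taxes": [
        {"code": "US-MI-*-Rate 1", "title": "US-MI-*-Rate 1",
         "percent": 8.25, "amount": 2.39, "base_amount": 2.39}
    ],
    "converting_from_quote": True,
}

# One row per entry in the applied_taxes list; its keys become the columns.
taxes_df = pd.json_normalize(record, record_path="applied_taxes")

# Nested dicts are flattened into dotted column names such as "address.city".
shipping_df = pd.json_normalize(
    [assignment["shipping"] for assignment in record["shipping_assignments"]]
)

print(taxes_df)
print(shipping_df.columns.tolist())
```

For a whole column, passing a list of records such as data['x'].tolist() in place of `record` should work the same way, assuming every cell is already a parsed dict and not a JSON string.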