Convert Nested Dictionary into Table/Parent Child Structure, Python 3.6
Posted: 2020-04-03 19:13:24
Question:
I want to convert the nested dictionary produced by the code below.
import requests
from bs4 import BeautifulSoup

url = 'https://www.bundesbank.de/en/statistics/time-series-databases/time-series-databases/743796/openAll?treeAnchor=BANKEN&statisticType=BBK_ITS'
result = requests.get(url)
soup = BeautifulSoup(result.text, 'html.parser')


def get_child_nodes(parent_node):
    # Recursively build {"name": ..., "children": [...]} from the nested <ul>/<li> tree.
    node_name = parent_node.a.get_text(strip=True)
    result = {"name": node_name, "children": []}
    children_list = parent_node.find('ul', recursive=False)
    if not children_list:
        return result
    for child_node in children_list('li', recursive=False):
        result["children"].append(get_child_nodes(child_node))
    return result


Data_Dict = get_child_nodes(soup.find("div", class_="statisticTree"))
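Is it possible to export this as parent-child pairs, as shown in the picture?
The code above is from @alecxe's answer: Fetch complete List of Items using BeautifulSoup, Python 3.6
I tried, but it is too complex for me to understand. Please help.
Dictionary: http://s000.tinyupload.com/index.php?file_id=97731876598977568058
Sample dictionary data (excerpt):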
{"name": "Banks", "children": [{"name": "Banks", "children": [{"name": "Balance sheet items", "children":
[{"name": "Minimum reserves", "children": [{"name": "Reserve maintenance in the euro area", "children": []}, {"name": "Reserve maintenance in Germany", "children": []}]},
{"name": "Bank Lending Survey (BLS) - Results for Germany", "children": [{"name": "Lending", "children": [{"name": "Enterprises", "children": [{"name": "Changes over the past three months", "children": [{"name": "Credit standards and explanatory factors", "children": [{"name": "Overall", "children": []}, {"name": "Loans to small and medium-sized enterprises", "children": []}, {"name": "Loans to large enterprises", "children": []}, {"name": "Short-term loans", "children": []}, {"name": "Long-term loans", "children": []}]}, {"name": "Terms and conditions and explanatory factors", "children": [{"name": "Overall", "children": [{"name": "Overall terms and conditions and explanatory factors", "children": []}, {"name": "Margins on average loans and explanatory factors", "children": []}, {"name": "Margins on riskier loans and explanatory factors", "children": []}, {"name": "Non-interest rate charges", "children": []}, {"name": "Size of the loan or credit line", "children": []}, {"name": "Collateral requirements", "children": []}, {"name": "Loan covenants", "children": []}, {"name": "Maturity", "children": []}]}, {"name": "Loans to small and medium-sized enterprises", "children": []}, {"name": "Loans to large enterprises", "children": []}]}, {"name": "Share of enterprise rejected loan applications", "children": []}]}, {"name": "Expected changes over the next three months", "children": [{"name": "Credit standards", "children": []}]}]}, {"name": "Households", "children": [{"name": "Changes over the past three months", "children": [{"name": "Credit standards and explanatory factors", "children": [{"name": "Loans for house purchase", "children": []}, {"name": "Consumer credit and other lending", "children": []}]},
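Comments:
Since the conversion only concerns the dictionary and has nothing to do with BeautifulSoup, please provide some sample data that can be copied and pasted into Python, and remove the parts of the code that are not needed; make it a minimal reproducible example.
@kaya3, I have updated the sample data, please check.
Answer 1:
You can handle this with a recursive function.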
def get_pairs(data, parent=''):
    # Emit (name, parent) for this node, then recurse into each child.
    rv = [(data['name'], parent)]
    for d in data['children']:
        rv.extend(get_pairs(d, parent=data['name']))
    return rv

Data_Dict = get_child_nodes(soup.find("div", class_="statisticTree"))
pairs = get_pairs(Data_Dict)
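To see what get_pairs returns without scraping the site, here is a minimal self-contained sketch that runs it on a small hand-written dictionary. The `sample` tree is an assumed, shortened stand-in for the data in the question, not the full scraped result.

```python
def get_pairs(data, parent=''):
    """Return a flat list of (name, parent_name) tuples in depth-first order."""
    rv = [(data['name'], parent)]
    for d in data['children']:
        rv.extend(get_pairs(d, parent=data['name']))
    return rv


# Hand-written excerpt used only for illustration (an assumption, not scraped data).
sample = {
    "name": "Banks",
    "children": [
        {
            "name": "Minimum reserves",
            "children": [
                {"name": "Reserve maintenance in the euro area", "children": []},
                {"name": "Reserve maintenance in Germany", "children": []},
            ],
        },
    ],
}

pairs = get_pairs(sample)
for name, parent in pairs:
    print(f"{name!r} <- {parent!r}")
```

The root node gets an empty string as its parent, and every other node is paired with the name of the node directly above it.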
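You can then either create a DataFrame or export straight to CSV, as in your example output. To create a DataFrame with pandas, we can do:
import pandas as pd

df = pd.DataFrame(get_pairs(Data_Dict), columns=['Name', 'Parent'])
This gives: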
      Name                                         Parent
0     Banks
1     Banks                                        Banks
2     Balance sheet items                          Banks
3     Minimum reserves                             Balance sheet items
4     Reserve maintenance in the euro area         Minimum reserves
...   ...                                          ...
3890  Number of transactions per type of terminal  Payments statistics
3891  Value of transactions per type of terminal   Payments statistics
3892  Number of OTC transactions                   Payments statistics
3893  Value of OTC transactions                    Payments statistics
3894  Issuance of banknotes                        Payments statistics

[3895 rows x 2 columns]
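Note that names are not unique in this tree (rows 0 and 1 above are both "Banks", and the sample data contains several nodes called "Overall"), so a name-based Parent column can be ambiguous. One possible variation, sketched below, gives every node a numeric id and records the parent's id instead. The function name `get_rows`, the id scheme, and the `sample` data are assumptions made for illustration and are not part of the original answer.

```python
from itertools import count


def get_rows(data, parent_id=None, ids=None):
    """Flatten the tree into (node_id, parent_id, name) rows, depth-first."""
    if ids is None:
        ids = count(1)  # fresh counter for each top-level call
    node_id = next(ids)
    rows = [(node_id, parent_id, data['name'])]
    for child in data['children']:
        # The shared counter keeps ids unique across the whole tree.
        rows.extend(get_rows(child, parent_id=node_id, ids=ids))
    return rows


# Hypothetical tree with repeated names, for illustration only.
sample = {
    "name": "Banks",
    "children": [
        {"name": "Banks", "children": [
            {"name": "Overall", "children": []},
        ]},
        {"name": "Lending", "children": [
            {"name": "Overall", "children": []},
        ]},
    ],
}

rows = get_rows(sample)
for row in rows:
    print(row)
```

With ids in place, the two "Overall" nodes stay distinguishable because each one points at a different parent id.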
Or, to write to CSV, we can use the built-in csv library:
import csv

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow(('Name', 'Parent'))  # header row
    for pair in pairs:
        writer.writerow(pair)
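To preview the CSV text without touching the file system, the same writer can target an in-memory buffer. This is only a sketch: the `pairs` list below is hypothetical data shaped like the output of get_pairs, and `io.StringIO` stands in for the real output file.

```python
import csv
import io

# Hypothetical pairs, shaped like the output of get_pairs.
pairs = [
    ('Banks', ''),
    ('Minimum reserves', 'Banks'),
    ('Reserve maintenance in Germany', 'Minimum reserves'),
]

buffer = io.StringIO(newline='')  # in-memory stand-in for open('out.csv', 'w', newline='')
writer = csv.writer(buffer, delimiter=',')
writer.writerow(('Name', 'Parent'))  # header row
writer.writerows(pairs)              # one CSV line per (name, parent) tuple

csv_text = buffer.getvalue()
print(csv_text)
```

The root row ends with a trailing comma because its parent is the empty string.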