Looping through a dynamic table - python

【Title】: Looping through a dynamic table - python 【Posted】: 2021-12-24 15:37:44 【Question】:

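Please help. I have been working on this for days and cannot figure out where I went wrong. I am trying to loop through a table, but I only get the first row and nothing else. What am I doing wrong? I suspect my loop is the culprit, but I am still new to Python and cannot work it out. In the end I want everything in an Excel document.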

from numpy import fabs
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
driver = webdriver.Chrome(r"C:\Users\noree\OneDrive\Documents\chromedriver.exe")

driver.get('https://www.depositaccounts.com/banks/assets.aspx?instType=&stateType=hq&state=')
driver.maximize_window()

#get url largest banks and credit unions by assets

#Show all entries - xpath for show all button
show_all_button = driver.find_element(By.XPATH,'//*[@id="results"]/div/a')

# Click 'Show all' Button
show_all_button.click()   

#scrape the tables
rank = driver.find_elements(By.XPATH, '//*[@id="assetsTable"]/tbody/tr[2]/td[1]')
financial_institution = driver.find_elements(By.XPATH,'//table[@id="assetsTable"]/tbody/tr[2]/td[2]/a')
headquarters = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[3]')
assets = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[4]')
asset_growth = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[5]')
branches = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[6]')
states_with_branches = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[7]')
employees = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[7]')
customer_accounts = driver.find_elements(By.XPATH, '//tbody/tr[2]/td[8]')

#create empty list
bank_results = []
for i in range(len(rank)):
    temporary_data = {
        'Rank': rank[i].text,
        'Financial Institution': financial_institution[i].text,
        'Headquarters': headquarters[i].text,
        'Assets': assets[i].text,
        'Asset Growth': asset_growth[i].text,
        'Branches': branches[i].text,
        'States with Branches': states_with_branches[i].text,
        'Employees': employees[i].text,
        'Customer Accounts': customer_accounts[i].text
    }
    bank_results.append(temporary_data)

df_data = pd.DataFrame(bank_results)
df_data

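【Comments】:

【Answer 1】:

Your error is that each of your XPaths selects only one record: the tr[2] index matches a single row, so every list has one element and the loop runs once.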
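To see the effect of the row index in isolation, here is a minimal sketch that uses the standard library's ElementTree on a made-up three-row table instead of Selenium on the real page. The HTML and bank names are hypothetical; only the positional predicate is the point.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-row table standing in for the real page.
HTML = """
<table id="assetsTable">
  <tbody>
    <tr><td>1</td><td>Bank A</td></tr>
    <tr><td>2</td><td>Bank B</td></tr>
    <tr><td>3</td><td>Bank C</td></tr>
  </tbody>
</table>
""".strip()

table = ET.fromstring(HTML)

# tr[2] is a positional predicate: only the second row matches.
one_row = [td.text for td in table.findall("./tbody/tr[2]/td[1]")]

# Without the index, the first cell of every row matches.
all_rows = [td.text for td in table.findall("./tbody/tr/td[1]")]

print(one_row)   # ['2']
print(all_rows)  # ['1', '2', '3']
```

The same rule applies to the Selenium calls in the question: with tr[2] in the path, find_elements can never return more than the cells of that one row.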

If you want the fix that stays closest to your own code:

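import time
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
# ... (other imports and driver setup as in the question) ...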
url='https://www.depositaccounts.com/banks/assets.aspx?instType=&stateType=hq&state='
driver.get(url)
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="results"]/div/a'))).click()
time.sleep(3)

records = driver.find_elements(By.XPATH, "//table[@id='assetsTable']//tr[not(./th)]")
nbr_records = len(records)

bank_results = []
for i in range(nbr_records):
    temporary_data = {
        'Rank': records[i].find_element(By.XPATH, "./td[1]").text,
        'Financial Institution': records[i].find_element(By.XPATH, "./td[2]").text,
        'Headquarters': records[i].find_element(By.XPATH, "./td[3]").text,
        'Assets': records[i].find_element(By.XPATH, "./td[4]").text,
        'Asset Growth': records[i].find_element(By.XPATH, "./td[5]").text,
        'Branches': records[i].find_element(By.XPATH, "./td[6]").text,
        'States with Branches': records[i].find_element(By.XPATH, "./td[7]").text,
        'Employees': records[i].find_element(By.XPATH, "./td[8]").text,
        'Customer Accounts': records[i].find_element(By.XPATH, "./td[9]").text
    }
    bank_results.append(temporary_data)
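As a possible simplification of the loop above, the nine per-column lookups can be replaced by pairing a list of column names with the cell texts of each row. This is only a sketch: the helper name row_to_record and the sample values are made up, and it assumes every data row has exactly nine td cells in the order used above.

```python
# Column names in the same order as td[1] .. td[9] in the loop above.
COLUMNS = [
    "Rank", "Financial Institution", "Headquarters", "Assets",
    "Asset Growth", "Branches", "States with Branches",
    "Employees", "Customer Accounts",
]


def row_to_record(cell_texts, columns=COLUMNS):
    """Pair each column name with the cell text at the same position."""
    if len(cell_texts) != len(columns):
        raise ValueError(f"expected {len(columns)} cells, got {len(cell_texts)}")
    return dict(zip(columns, cell_texts))


# Made-up cell texts; with Selenium they would come from something like
#   [td.text for td in row.find_elements(By.XPATH, "./td")]
sample = ["1", "Example Bank", "Springfield, XX", "$1", "0%",
          "10", "2", "100", "1,000"]
record = row_to_record(sample)
print(record["Financial Institution"])  # Example Bank
```

Once bank_results holds one such dict per row, pd.DataFrame(bank_results).to_excel("banks.xlsx", index=False) should produce the Excel file the question asks for; the file name is a placeholder, and to_excel needs an Excel writer such as openpyxl installed.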

xpath = "//table[@id='assetsTable']//tr[not(./th)]"

means:

select every tr that has no th child (so header rows are skipped) and that sits anywhere inside the table whose id is assetsTable
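The header-skipping rule can be tried without a browser. The sketch below applies the same filter to a made-up table using the standard library's ElementTree; since ElementTree's XPath subset has no not() function, the "no th child" test is written in plain Python instead. The HTML and bank names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: one header row (th cells) and two data rows (td cells).
HTML = """<table id="assetsTable">
  <tr><th>Rank</th><th>Financial Institution</th></tr>
  <tr><td>1</td><td>Bank A</td></tr>
  <tr><td>2</td><td>Bank B</td></tr>
</table>"""

table = ET.fromstring(HTML)

# Plain-Python equivalent of tr[not(./th)]: keep rows with no th child.
data_rows = [tr for tr in table.iter("tr") if tr.find("th") is None]

# Second cell of each remaining row, like ./td[2] in the loop above.
names = [tr.findall("td")[1].text for tr in data_rows]
print(names)  # ['Bank A', 'Bank B']
```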

【Comments】:

Thank you so much. I can't thank you enough. Now I understand what I was doing wrong.
