Why is my Python code extracting the same data for all the elements in my list?

Posted: 2021-10-19 09:07:13

【Question】:

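My project consists of building a competitive hotel price table for an agency. I want to automate this painful task. The code correctly extracts the hotel name and the prices I want, but it only works properly for the first hotel, and I don't know where the problem is. I am providing the code and the output below; thanks in advance to anyone who can help.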

Note: Code 2 works fine, but the problem appeared when I added more actions.

Code 1

#!/usr/bin/env python
# coding: utf-8
import time
from time import sleep
import ast
import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait, Select
from selenium.common.exceptions import StaleElementReferenceException, NoSuchElementException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome("C:\\Users\\marketing2\\Documents\\chromedriver.exe")
driver.get('https://tn.tunisiebooking.com/')

# params to select
params = {
    'destination': 'Tozeur',
    'date_from': '11/09/2021',
    'date_to': '12/09/2021',
    'bedroom': '1'
}


# select destination
destination_select = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, 'ville_des'))))
destination_select.select_by_value(params['destination'])

# select bedroom
bedroom_select = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'select_ch'))))
bedroom_select.select_by_value(params['bedroom'])

# select dates
script = f"document.getElementById('checkin').value ='{params['date_from']}';"
script += f"document.getElementById('checkout').value ='{params['date_to']}';"
script += f"document.getElementById('depart').value ='{params['date_from']}';"
script += f"document.getElementById('arrivee').value ='{params['date_to']}';"
driver.execute_script(script)

# submit form
btn_rechercher = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="boutonr"]')))
btn_rechercher.click()

urls = []
hotels = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[starts-with(@id,'produit_affair')]")))

for hotel in hotels:
    link = hotel.find_element_by_xpath(".//span[@class='tittre_hotel']/a").get_attribute("href")
    urls.append(link)

for url in urls:
    driver.get(url)
       
    def existsElement(xpath):
        try:
            driver.find_element_by_id(xpath);
        except NoSuchElementException:
            return "false"
        else:
            return "true"
   
    if (existsElement('result_par_arrangement')=="false"):
   
        btn_t = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="moteur_rech"]/form/div/div[3]/div')))

        btn_t.click()
        sleep(10)
    else :
        pass
               
    
    try:
        name = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@class='bloc_titre_hotels']/h2"))).text
        arropt = driver.find_element_by_xpath("//div[contains(@class,'line_result')][1]")
        opt = arropt.find_element_by_tag_name("b").text
        num = len(arropt.find_elements_by_tag_name("option"))
        optiondata = {}
        achats = {}
        marges = {}
        selection = Select(driver.find_element_by_id("arrangement"))

        for i in range(num):
            try:
                selection = Select(driver.find_element_by_id("arrangement"))
                selection.select_by_index(i)
                time.sleep(2)

                arr = driver.find_element_by_xpath("//select[@id='arrangement']/option[@selected='selected']").text
                prize = driver.find_element_by_id("prix_total").text

                optiondata[arr] = (int(prize))

                btn_passe = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="resultat"]/div/form/div/div[2]/div[1]/div[2]/div[2]/div')))
                btn_passe.click()



                # params to select
                params = {
                    'civilite_acheteur': 'Mlle',
                    'prenom_acheteur': 'test',
                    'nom_acheteur': 'test',
                    'e_mail_acheteur': 'test@gmail.com',
                    'portable_acheteur': '22222222',
                    'ville_acheteur': 'Test',
                }

                # select civilite
                civilite_acheteur = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, 'civilite_acheteur'))))
                civilite_acheteur.select_by_value(params['civilite_acheteur'])

                # fill in the buyer's details
                script  = f"document.getElementsByName('prenom_acheteur')[0].value ='{params['prenom_acheteur']}';"
                script += f"document.getElementsByName('nom_acheteur')[0].value ='{params['nom_acheteur']}';"
                script += f"document.getElementsByName('e_mail_acheteur')[0].value ='{params['e_mail_acheteur']}';"
                script += f"document.getElementsByName('portable_acheteur')[0].value ='{params['portable_acheteur']}';"
                script += f"document.getElementsByName('ville_acheteur')[0].value ='{params['ville_acheteur']}';"
                driver.execute_script(script)

                # submit form
                btn_agence = driver.find_element_by_id('titre_Nabeul')
                btn_agence.click()

                btn_continuez = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'boutonr')))
                btn_continuez.click()

                achat = int(driver.find_element_by_xpath('/html/body/header/div[2]/div[1]/div[1]/div[4]/div[2]/div[2]').text.replace(' TND', ''))

                achats[arr]=achat

                marge =int(((float(prize) - float(achat)) / float(achat)) * 100);
                marges[arr]=marge
                optiondata[arr]=prize,achat,marge
                
                
                driver.get(url)
                btn_display = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="moteur_rech"]/form/div/div[3]/div')))

                btn_display.click()
                sleep(10)
               

            except StaleElementReferenceException:
                pass

            

    except NoSuchElementException:
        pass
    
    s = "- {} | {} : {}".format(name, opt, optiondata)
    print(s)
   

    ds = []

    for l in s.splitlines():
        d = l.split('-')
        if len(d) > 1:
            df = pd.DataFrame(ast.literal_eval(d[1].strip()))
            ds.append(df)

    for df in ds:
        df.reset_index(drop=True, inplace=True)

    df = pd.concat(ds, axis= 1)

    cols = df.columns

    cols = [((col.split('.')[0], col)) for col in df.columns]

    df.columns=pd.MultiIndex.from_tuples(cols)

    print(df.T)    

#print("{} : {} - {}".format(name, opt, optiondata))

Code 2

from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import StaleElementReferenceException,NoSuchElementException
urls = []
hotels = driver.find_elements_by_xpath("//div[starts-with(@id,'produit_affair')]")
for hotel in hotels:
    link = hotel.find_element_by_xpath(".//span[@class='tittre_hotel']/a").get_attribute("href")
    urls.append(link)
for url in urls:
    driver.get(url)
    try:
        name = driver.find_element_by_xpath("//div[@class='bloc_titre_hotels']/h2").text
        arropt = driver.find_element_by_xpath("//div[contains(@class,'line_result')][1]")
        opt = arropt.find_element_by_tag_name("b").text
        num = len(arropt.find_elements_by_tag_name("option"))
        optiondata = {}
        selection = Select(driver.find_element_by_id("arrangement"))
        for i in range(num):
            try:
                selection = Select(driver.find_element_by_id("arrangement"))
                selection.select_by_index(i)
                time.sleep(2)
                arr = driver.find_element_by_xpath("//select[@id='arrangement']/option[@selected='selected']").text
                prize = driver.find_element_by_id("prix_total").text
                optiondata[arr]=prize
            except StaleElementReferenceException:
                pass
    except NoSuchElementException:
        pass
    print("{} : {} - {} - {}".format(name, opt, num, optiondata))

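【Comments on the question】:

【Answer 1】:

1. Your code is outdated. The HTML has been changed/updated, and the page no longer contains an element with the id boutonr.
2. Your loops and order of execution are wrong, which is why the fields your code evaluates stay the same.
3. You should avoid time.sleep(), or at least keep its use to a minimum, because it wastes execution time. Use WebDriverWait(...) instead.

I don't speak French, so I can't fully follow what you are after in your code, but the minimized example below should help you understand the principle.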

#!/usr/bin/env python
# coding: utf-8
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait, Select
from selenium.common.exceptions import StaleElementReferenceException, NoSuchElementException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(r"C:\chromedriver.exe")
driver.get('https://tn.tunisiebooking.com/')

# params to select
params = {
    'destination': 'Nabeul',
    'date_from': '25/08/2021',
    'date_to': '26/08/2021',
    'bedroom': '1'
}

# select destination
destination_select = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, 'ville_des'))))
destination_select.select_by_value(params['destination'])

# select bedroom
bedroom_select = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'select_ch'))))
bedroom_select.select_by_value(params['bedroom'])

# select dates
script = f"document.getElementById('checkin').value ='{params['date_from']}';"
script += f"document.getElementById('checkout').value ='{params['date_to']}';"
script += f"document.getElementById('depart').value ='{params['date_from']}';"
script += f"document.getElementById('arrivee').value ='{params['date_to']}';"
driver.execute_script(script)

# submit form
btn_rechercher = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//div[@onclick="return submit_hotel_recherche()"]')))
btn_rechercher.click()

urls = []
hotels = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[starts-with(@id,'produit_affair')]")))

for hotel in hotels:
    link = hotel.find_element_by_xpath(".//span[@class='tittre_hotel']/a").get_attribute("href")
    urls.append(link)

for url in urls:
    driver.get(url)
    try:
        name = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@class='bloc_titre_hotels']/h2"))).text
        arropt = driver.find_element_by_xpath("//div[contains(@class,'line_result')][1]")
        opt = arropt.find_element_by_tag_name("b").text
        num = len(arropt.find_elements_by_tag_name("option"))
        optiondata = {}
        achats = {}
        marges = {}

        for i in range(num):
            try:
                selection = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'arrangement')))).select_by_index(i)
                time.sleep(0.5)

                arr = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//select[@id='arrangement']/option[@selected='selected']"))).text
                prize = driver.find_element_by_id("prix_total").text

                optiondata[arr] = int(prize)

            except StaleElementReferenceException:
                pass

        print("{} : {} - {}".format(name, opt, optiondata))

    except NoSuchElementException:
        pass

driver.quit()

Result:

Byzance Nabeul : Chambre Double - {'All Inclusive soft': 93, 'Demi Pension': 38, 'Petit Dejeuner': 28, 'Pension Complete': 78}
Palmyra Club Nabeul Nabeul : Double Standard - {'All Inclusive soft': 92}

The following code goes on to the payment page and extracts all the information there:

#!/usr/bin/env python
# coding: utf-8
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait, Select
from selenium.common.exceptions import StaleElementReferenceException, NoSuchElementException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome("/usr/local/bin/chromedriver")
driver.get('https://tn.tunisiebooking.com/')

# params to select
params = {
    'destination': 'Nabeul',
    'date_from': '29/08/2021',
    'date_to': '30/08/2021',
    'bedroom': '1'
}


# select destination
destination_select = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, 'ville_des'))))
destination_select.select_by_value(params['destination'])

# select bedroom
bedroom_select = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'select_ch'))))
bedroom_select.select_by_value(params['bedroom'])

# select dates
script = f"document.getElementById('checkin').value ='{params['date_from']}';"
script += f"document.getElementById('checkout').value ='{params['date_to']}';"
script += f"document.getElementById('depart').value ='{params['date_from']}';"
script += f"document.getElementById('arrivee').value ='{params['date_to']}';"
driver.execute_script(script)

# submit form
btn_rechercher = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//div[@onclick="return submit_hotel_recherche()"]')))
btn_rechercher.click()

urls = []
hotels = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, "//div[starts-with(@id,'produit_affair')]")))

for hotel in hotels:
    link = hotel.find_element_by_xpath(".//span[@class='tittre_hotel']/a").get_attribute("href")
    urls.append(link)

for url in urls:
    driver.get(url)
    try:
        name = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@class='bloc_titre_hotels']/h2"))).text
        arropt = driver.find_element_by_xpath("//div[contains(@class,'line_result')][1]")
        opt = arropt.find_element_by_tag_name("b").text
        num = len(arropt.find_elements_by_tag_name("option"))
        optiondata = {}
        achats = {}
        marges = {}
        try:
            selection = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'arrangement'))))
            time.sleep(0.5)

            arr = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//select[@id='arrangement']/option[@selected='selected']"))).text
            prize = driver.find_element_by_id("prix_total").text

            optiondata[arr] = (int(prize))

            btn_passe = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'resa')))
            btn_passe.click()

            tot = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'montant_total_apres_code')))
            total = int(tot.text.replace(' €', ''))

            # params to select
            params = {
                'civilite_acheteur': 'Mlle',
                'prenom_acheteur': 'test',
                'nom_acheteur': 'test',
                'e_mail_acheteur': 'test@gmail.com',
                'portable_acheteur': '22222222',
                'ville_acheteur': 'Test',
            }

            # select civilite
            civilite_acheteur = Select(WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, 'civilite_acheteur'))))
            civilite_acheteur.select_by_value(params['civilite_acheteur'])

            # fill in the buyer's details
            script  = f"document.getElementsByName('prenom_acheteur')[0].value ='{params['prenom_acheteur']}';"
            script += f"document.getElementsByName('nom_acheteur')[0].value ='{params['nom_acheteur']}';"
            script += f"document.getElementsByName('e_mail_acheteur')[0].value ='{params['e_mail_acheteur']}';"
            script += f"document.getElementsByName('portable_acheteur')[0].value ='{params['portable_acheteur']}';"
            script += f"document.getElementsByName('ville_acheteur')[0].value ='{params['ville_acheteur']}';"
            driver.execute_script(script)

            # submit form
            btn_agence = driver.find_element_by_class_name('continuez_resa')
            btn_agence.click()
            
            achat1 = int(driver.find_element_by_id('montant_a_payer').text.replace(' €', ''))
            achat = int(driver.find_element_by_id('montant_restant').text.replace(' €', ''))
            achat3 = float(driver.find_element_by_xpath('//div[@class="ligne_interne_total"]/div[3]/div[@class="prix_total1 text_shadow"]').text.replace(' TND', ''))
            achats[arr]=achat

            marge =int(((float(prize) - float(achat)) / float(achat)) * 100);
            marges[arr]=marge
            optiondata[arr]=prize,total,achat1,achat,achat3,marge

        except StaleElementReferenceException:
            pass

        print("{} : {} - {}".format(name, opt, optiondata))

    except NoSuchElementException:
        pass
    
driver.quit()

Output:

Byzance Nabeul : Chambre Double - {'Petit Dejeuner': (36, 41, 12, 29, 4.0, 24)}

Where:

36 = Prix Total
41 = Montant Total
12 = Montant de l'acompte
29 = Vous payerez le reste à votre arrivée à l'hôtel
4.0 = Total taxe de séjour à payer sur place à l'hôtel est
24 = Marges
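
The margin figure in that output can be checked by hand. The sketch below is only an illustration: `parse_amount` and `margin_percent` are hypothetical helper names, not part of the answer's code. It assumes the amounts arrive as text such as '29 TND' or '29 €', and it uses the same truncating formula as the `marge` line above.

```python
def parse_amount(text):
    """Turn a price string such as '29 TND' or '4.0 €' into a float."""
    # Assumption: the number is the first whitespace-separated token.
    return float(text.split()[0])


def margin_percent(selling_price, purchase_price):
    """Margin relative to the purchase price, truncated to an int as in the code above."""
    return int((selling_price - purchase_price) / purchase_price * 100)


prize = parse_amount("36")      # Prix Total
achat = parse_amount("29 TND")  # amount paid on arrival at the hotel
print(margin_percent(prize, achat))  # 24
```

With a price of 36 and a purchase amount of 29, (36 - 29) / 29 is about 0.2414, which truncates to the 24 shown in the output.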


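【Discussion】:

Thank you for your reply. Here is the question with more details; I hope you can understand what I need to do with the second code: link. The most important data I need to extract are "achat" and "marge".

What is "achat"? On the final payment page there are 3 prices: 1) Montant de l'acompte, 2) Montant à payer à votre arrivée à l'hôtel, 3) Total taxe de séjour à payer sur place à l'hôtel est. Which one is "achat"?

It is "Montant à payer à votre arrivée à l'hôtel".

There is still a problem :( It doesn't work, but thank you anyway.

The code above produces the output you asked for, so what is not working?

【Answer 2】: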

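You use sleeps in the first example to let pages load, but not in the second one (the one you say works fine).

That is usually not how Selenium is meant to be used, which leads me to believe your timing is off.

This SO answer shows how to use "explicit waits" on "expected_conditions" to avoid fixed delays that may (or will) fail.

You even create a wait object but never use it.

Use it together with expected_conditions and remove the fixed-length sleeps, and things should work better.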

expected_conditions docs are here
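
What an explicit wait does can be sketched without Selenium. The helper below is a hypothetical stand-in, not Selenium's implementation: `wait_until` and `page_ready` are names invented for this illustration. It polls a condition until it returns something truthy or a timeout expires, which is roughly the behaviour that `WebDriverWait(driver, 10).until(...)` provides in the code above.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for a page that only becomes ready on the third check.
attempts = {"count": 0}


def page_ready():
    attempts["count"] += 1
    return "ready" if attempts["count"] >= 3 else None


print(wait_until(page_ready, timeout=2.0, poll=0.01))  # ready
```

Unlike a fixed `sleep(10)`, the loop returns as soon as the condition holds, and it fails loudly when the condition never holds instead of continuing with a page that has not loaded.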

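【Discussion】:

【Answer 3】:

The problem was that the code could not access the arrangement list element for the remaining hotels in the list. I added a function that tests whether that element exists, and now it works fine.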

for url in urls:
    driver.get(url)
       
    def existsElement(xpath):
        try:
            driver.find_element_by_id(xpath);
        except NoSuchElementException:
            return "false"
        else:
            return "true"
   
    if (existsElement('result_par_arrangement')=="false"):
   
        btn_t = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="moteur_rech"]/form/div/div[3]/div')))

        btn_t.click()
    else :
        pass
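
An existence check like this one can return a real boolean instead of the strings "true" and "false". The sketch below is a generic variant under stated assumptions: `exists` is a hypothetical helper, and a plain dict stands in for the page so the example runs without a browser. With Selenium one would presumably pass `driver.find_element_by_id` as `find` and `NoSuchElementException` as `missing`.

```python
def exists(find, locator, missing=LookupError):
    """Return True if find(locator) succeeds, False if it raises `missing`."""
    try:
        find(locator)
    except missing:
        return False
    return True


# Stand-in for a page: a dict lookup raises KeyError (a LookupError) when the id is absent.
page = {"arrangement": "<select>"}

print(exists(page.__getitem__, "arrangement"))             # True
print(exists(page.__getitem__, "result_par_arrangement"))  # False
```

The caller can then write `if not exists(...):` directly, and defining the helper once outside the `for url in urls:` loop avoids redefining it on every iteration.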

【Discussion】:
