How to create a while loop that continuously detects if scraped data in a list changes
【Posted】: 2021-11-24 03:35:16
【Question】:
import time
from bs4 import BeautifulSoup
import requests
from urllib.request import Request, urlopen

pages = ["movies", "series"]
printed = []

for page in pages:
    req = Request("https://www.thenetnaija.com/videos/" + page, headers={'User-Agent': 'XYZ/3.0'})
    webpage = urlopen(req, timeout=10)
    b4 = BeautifulSoup(webpage, "html.parser")
    movie_list = b4.find_all("div", {"class": "video-files"})
    for allContainers in movie_list:
        filmName = allContainers.find('img').get('alt')
        printed.append(filmName)

print(printed)

for get in printed:
    # The bot id and chat_id are left blank in the original post.
    requests.get("https://api.telegram.org/bot:AAEapVykIXdphGYaH5ZjXuhpFaFw7wpi5Bs/sendMessage?chat_id=&text={}".format(get))
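I want to use a while loop so that the program runs indefinitely and sends the request to my Telegram chat only when the data in the list has changed.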
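【Answer 1】: You can use this example as a basis for periodically checking for new movies/series (it uses set.difference to determine whether anything has changed):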
import time
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen


def get_movies(url):
    # Fetch one listing page and return the titles found in "h2 a" as a set.
    headers = {"User-Agent": "XYZ/3.0"}
    req = Request(url, headers=headers)
    b4 = BeautifulSoup(urlopen(req, timeout=10), "html.parser")
    return set(a.get_text(strip=True) for a in b4.select("h2 a"))


url = "https://www.thenetnaija.com/videos/{}"

# Initial snapshot of each page; later results are compared against it.
pages = {
    "movies": get_movies(url.format("movies")),
    "series": get_movies(url.format("series")),
}

while True:
    time.sleep(10)  # <-- sleep 10sec before checking again

    for k, v in pages.items():
        new_movies = get_movies(url.format(k))
        # Titles present now that were not in the previous snapshot.
        difference = new_movies.difference(v)

        if difference:
            print("New {}:".format(k))
            print(difference)
            pages[k] = new_movies  # remember the latest snapshot
            # do stuff here (post to telegram etc.)
            # ...
        else:
            print("No new {}".format(k))
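The "do stuff here" step could be filled in roughly as follows. This is only a sketch under assumptions, not part of the original answer: the helper names send_telegram and notify_new are made up for illustration, and the token and chat id have to be supplied by you. It builds the request with the Telegram Bot API sendMessage method and the standard library, and it takes the sending function as a parameter so that the change detection can be tried without any network access.

```python
import urllib.parse
import urllib.request


def send_telegram(text, token, chat_id):
    """Send one message via the Telegram Bot API sendMessage method.

    token and chat_id are placeholders that the caller must provide.
    """
    query = urllib.parse.urlencode({"chat_id": chat_id, "text": text})
    url = "https://api.telegram.org/bot{}/sendMessage?{}".format(token, query)
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.status


def notify_new(previous, current, send):
    """Call send() once for every title in current that is not in previous.

    Returns the set to remember for the next round.
    """
    added = current.difference(previous)
    for title in sorted(added):  # sorted only to make the order predictable
        send("New: {}".format(title))
    return current


if __name__ == "__main__":
    # Offline demonstration: collect messages in a list instead of sending them.
    outbox = []
    seen = {"Film A", "Film B"}
    seen = notify_new(seen, {"Film A", "Film B", "Film C"}, outbox.append)
    seen = notify_new(seen, {"Film A", "Film B", "Film C"}, outbox.append)
    print(outbox)  # ['New: Film C']
```

Inside the loop you would then pass something like lambda text: send_telegram(text, TOKEN, CHAT_ID) as the send argument, where TOKEN and CHAT_ID stand for your own values. Note that set.difference only reports titles that were added; if removed titles should also count as a change, comparing the two sets with != or using symmetric_difference would be one option.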