How to remove URLs with a 404 status code from a file using Python's remove()?
I need to use Python's remove() to delete the URLs that return a 404 status from a file, but I'm not sure why it doesn't work.
Code:
#!/usr/bin/python
import requests
url_lines = open('url.txt').read().splitlines()
for url in url_lines:
    remove_url = requests.get(url)
    if remove_url.status_code == 404:
        print(remove_url.status_code)
        url_lines.remove(url)
The url.txt file contains the following lines:
https://www.amazon.co.uk/jksdkkhsdhk
http://www.google.com
The line https://www.amazon.co.uk/jksdkkhsdhk should be removed from url.txt.
Thanks a lot for your help.
Answer
I wrote this with self-explanatory comments. I hope it helps:
import requests

with open('url.txt') as file:  # better file handling: the context manager closes the file
    good_guys = []
    for line in file:  # simpler than iterating over file.read().splitlines()
        url = line.strip()  # get rid of line endings and surrounding spaces
        # the Pythonic way is to use a try ... except block
        try:
            if not url:
                continue  # skip empty lines
            resp = requests.get(url, timeout=10)  # timeout so a dead host cannot hang the script
            if resp.status_code != 404:
                good_guys.append(url + '\n')  # keep the URL, newline added back for writelines
        except requests.RequestException as e:
            # URLs that fail with a network error are reported and left out as well
            print('error while handling url,', e)

# Save the new URLs to a new file named url2.txt. You could overwrite the input
# file instead, but that is not good practice: if you make a mistake, you still
# have your original file.
with open('url2.txt', 'w') as file:
    file.writelines(good_guys)
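If you do want to overwrite url.txt rather than keep a separate url2.txt, one common way to limit the risk the comment above mentions is to write the filtered lines to a temporary file in the same directory and then move it over the original with os.replace, so an interruption while writing leaves url.txt untouched. A minimal sketch; `rewrite_lines_atomically` is a hypothetical helper name, and it expects URLs that are already checked and stripped of newlines:

```python
import os
import tempfile
from typing import Iterable


def rewrite_lines_atomically(path: str, lines: Iterable[str]) -> None:
    """Replace the contents of `path` with `lines`, one per line.

    Hypothetical helper: the data goes to a temporary file in the same
    directory first, so if writing fails the original file is untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file and returns an open OS-level descriptor plus its path
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
        # Overwrites `path`; on POSIX the rename is atomic
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the leftover temp file if anything failed before the rename
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Called as `rewrite_lines_atomically('url.txt', urls)`, where `urls` is the list of stripped URLs that passed the status check.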
Another answer
You can skip it:
if remove_url.status_code == 404:
    continue
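Skipping means the 404 URLs never reach the list you write back out, so nothing has to be removed at all. A sketch of that idea as a function, with the status lookup passed in so the filtering can be checked without network access. `keep_non_404` and `fetch_status` are hypothetical names; `fetch_status` assumes a HEAD request (`requests.head` with `allow_redirects=True`) reports the same status as a GET, which not every server guarantees, and unlike the try/except in the first answer it lets request errors propagate:

```python
from typing import Callable, Iterable, List


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of `url` (assumption: HEAD is good enough here)."""
    import requests  # imported lazily so keep_non_404 works without requests installed

    # HEAD skips the response body; following redirects makes a 301 -> 404 count as 404
    return requests.head(url, allow_redirects=True, timeout=timeout).status_code


def keep_non_404(lines: Iterable[str],
                 status_of: Callable[[str], int] = fetch_status) -> List[str]:
    """Return the stripped, non-empty URLs whose status is not 404."""
    kept = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        if status_of(url) == 404:
            continue  # skip dead links instead of removing them mid-loop
        kept.append(url)
    return kept
```

Called as `keep_non_404(f)` on an open url.txt, it returns the surviving URLs as stripped strings, ready to be written back one per line.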
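You shouldn't try to remove it inside the for loop. Instead, add it to another list, remove_from_urls, and after the for loop, drop every URL collected in that list. This can be done as follows: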
import requests

with open('url.txt') as file:
    url_lines = file.read().splitlines()

remove_from_urls = []
for url in url_lines:
    remove_url = requests.get(url)
    if remove_url.status_code == 404:
        remove_from_urls.append(url)  # store the URL string, not the Response object
        continue
    # Code for handling non-404 requests
url_lines = [url for url in url_lines if url not in remove_from_urls]
# Save urls example
with open('urls.txt', 'w') as file:
    for item in url_lines:
        file.write(item + '\n')
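Why removing inside the loop goes wrong can be reproduced without any HTTP requests. A for loop walks a list by index, so when list.remove() deletes the current element, the next one shifts into its slot and is never visited. (With the two-line url.txt from the question the list happens to come out right, because the 404 URL is first and removing it just ends the loop early; the more visible problem there is that url_lines is never written back to url.txt.) A minimal sketch, where the string "dead" stands in for a URL that returned 404:

```python
def remove_while_iterating(items):
    """Buggy pattern from the question: mutate the list being iterated."""
    items = list(items)  # copy so the caller's list is not changed
    for item in items:
        if item == "dead":
            items.remove(item)  # later elements shift left; the next one is skipped
    return items


def filter_into_new_list(items):
    """Safe pattern: build a new list and leave the original untouched."""
    return [item for item in items if item != "dead"]


urls = ["dead", "dead", "ok"]
print(remove_while_iterating(urls))  # ['dead', 'ok']
print(filter_into_new_list(urls))    # ['ok']
```

The second "dead" survives because it moved into index 0 after the first removal, while the loop had already advanced to index 1.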