urlopen and BeautifulSoup
Posted by petitherisson
from urllib.request import urlopen

# Fetch the page and print its raw body.
html = urlopen("http://pythonscraping.com/pages/page1.html")
print(html.read())
Output:
b'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n<body>\n<h1>An Interesting Title</h1>\n<div>\nLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n</div>\n</body>\n</html>\n'
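The b prefix in the output shows that read() returns raw bytes, not text. Below is a minimal sketch of decoding them into a str. The name fetch_text and the utf-8 fallback are assumptions made for illustration; neither comes from the original post or from urllib.

```python
from urllib.request import urlopen


def fetch_text(url, default_charset="utf-8"):
    """Fetch a URL and return its body decoded to str (hypothetical helper)."""
    # The with-block closes the connection once the body has been read.
    with urlopen(url) as response:
        raw = response.read()  # bytes, as in the output above
        # Use the charset declared in the Content-Type header, if any.
        # Falling back to utf-8 is an assumption, not a guarantee.
        charset = response.headers.get_content_charset() or default_charset
    # errors="replace" avoids a crash on bytes that do not fit the charset.
    return raw.decode(charset, errors="replace")


# Example call (needs network access):
# print(fetch_text("http://pythonscraping.com/pages/page1.html"))
```

Printed this way, the \n sequences appear as real line breaks and the b'...' wrapper is gone.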
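from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://pythonscraping.com/pages/page1.html")
# Name the parser explicitly; without it bs4 picks one itself and emits a warning.
bsObj = BeautifulSoup(html.read(), "html.parser")
# bsObj.h1 is the first <h1> tag in the document.
print(bsObj.h1)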
Output:
<h1>An Interesting Title</h1>
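Both snippets assume the request succeeds. If the page is missing or the server cannot be reached, urlopen raises an exception. Here is a rough sketch of catching the two usual ones. The helper read_page and its return-None convention are hypothetical, chosen only for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def read_page(url):
    """Return the page body as bytes, or None on failure (hypothetical helper)."""
    try:
        with urlopen(url) as response:
            return response.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print("HTTP error:", err.code)
    except URLError as err:
        # The resource could not be opened at all (bad host, no network, ...).
        print("URL error:", err.reason)
    return None
```

A caller can then check for None before handing the bytes to BeautifulSoup.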
2019-10-08 18:01:59