Scraping News from the Campus News Home Page
Posted by 099吴海经
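import re
from datetime import datetime

import requests
from bs4 import BeautifulSoup

url = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
res = requests.get(url)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

# Walk the <li> elements and stop at the first one that is a news entry.
for news in soup.select('li'):
    if len(news.select('.news-list-title')) > 0:
        title = news.select('.news-list-title')[0].text
        pub_time = news.select('.news-list-info')[0].contents[0].text
        a = news.select('a')[0].attrs['href']
        print(a, title, pub_time)
        break

# Fetch the detail page of that first news entry.
res1 = requests.get(a)
res1.encoding = 'utf-8'
soup1 = BeautifulSoup(res1.text, 'html.parser')
sp1 = soup1.select('#content')[0].text      # article body
info = soup1.select('.show-info')[0].text   # line with time, source, photographer, ...
print(info)

# Publication time. str.lstrip() removes a set of characters, not a prefix,
# so the timestamp is matched directly instead of being sliced out by position.
dt = re.search(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}', info).group()
print(dt)

# Source and photographer: take the first whitespace-delimited token after each label.
# str.find() returns -1 when the label is missing, so test against -1 rather than > 0.
for label in ('来源:', '摄影:'):
    pos = info.find(label)
    if pos != -1:
        s = info[pos + len(label):].split()[0]
        print(s)

# Convert the timestamp string to a datetime, and format the current time the same way.
da = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
now = datetime.now()   # a datetime.datetime instance
print(now.strftime('%Y-%m-%d %H:%M:%S'))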
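The field extraction can also be tried offline, without requesting the page. The sketch below is only an illustration: the sample string, the names in it, and the `parse_info` helper are made up for this example, and the real `.show-info` text may be laid out differently. It uses regular expressions instead of `lstrip`, because `str.lstrip` removes any leading characters from the given set rather than a literal prefix, and it accepts either an ASCII or a full-width colon after each label.

```python
import re
from datetime import datetime

# Made-up sample of what the `.show-info` text might look like (assumption).
SAMPLE_INFO = "发布时间:2018-04-01 11:57:00 作者:张三 来源:示例部门 摄影:李四 点击:次"


def parse_info(info):
    """Return the publish time, source and photographer found in `info`.

    Fields that are not present are returned as None.
    """
    result = {}

    # Match the timestamp itself instead of slicing by position.
    m = re.search(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", info)
    result["published"] = (
        datetime.strptime(m.group(), "%Y-%m-%d %H:%M:%S") if m else None
    )

    # Each label may be followed by an ASCII or a full-width colon.
    for key, label in (("source", "来源"), ("photographer", "摄影")):
        m = re.search(label + r"[::]\s*(\S+)", info)
        result[key] = m.group(1) if m else None

    return result


parsed = parse_info(SAMPLE_INFO)
print(parsed["published"], parsed["source"], parsed["photographer"])
```

In the scraper itself, `parse_info(info)` could be called on the text taken from `soup1.select('.show-info')[0]`, provided the page follows the same label pattern.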