Scraping a News List with the requests and BeautifulSoup4 Libraries

Posted 2020-10-08 ELsky



1. Use the requests and BeautifulSoup4 libraries to scrape the time, title, link and source of each entry in the campus news list

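import requests
from bs4 import BeautifulSoup

# Campus news list page (as of 2020; the site may have changed since)
list_url = 'http://news.gzcc.cn/html/xiaoyuanxinwen/'
res = requests.get(list_url)
res.encoding = 'utf-8'  # decode the page as UTF-8 so the Chinese text is readable
soup = BeautifulSoup(res.text, 'html.parser')

for news in soup.select('li'):
    # Only <li> elements that contain a news summary are news entries
    if len(news.select('.news-list-description')) > 0:
        # Assumed layout: .news-list-info holds two <span>s, [publish date, source]
        info = news.select('.news-list-info span')
        pub_time = info[0].text.strip()
        title = news.select('.news-list-title')[0].text.strip()
        source = info[1].text.strip()  # publishing department, not the summary text
        url = news.select('a')[0]['href']
        print(pub_time, title, source, url)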
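The loop above only works while the live page is reachable. As a minimal offline sketch, the same selector logic can be wrapped in a function and run against an inline HTML fragment. The markup, the `example.com` link and the helper name `parse_news_list` are hypothetical; the fragment merely imitates the structure the code assumes (`.news-list-title`, `.news-list-description`, and a `.news-list-info` block whose two `<span>` elements hold the date and the source), and is not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one navigation <li> plus one news entry
SAMPLE_HTML = """
<ul>
  <li><a href="/about">About</a></li>
  <li>
    <a href="https://example.com/news/1.html">
      <div class="news-list-title">Sample headline</div>
      <div class="news-list-description">Sample summary</div>
      <div class="news-list-info"><span>2020-10-08</span><span>School Office</span></div>
    </a>
  </li>
</ul>
"""

def parse_news_list(html):
    """Return (time, title, source, url) tuples for every news entry in html."""
    soup = BeautifulSoup(html, 'html.parser')
    items = []
    for news in soup.select('li'):
        # Skip <li> elements that are not news entries (e.g. navigation links)
        if not news.select('.news-list-description'):
            continue
        # Assumed: first span is the publish date, second span is the source
        info = news.select('.news-list-info span')
        items.append((
            info[0].text.strip(),
            news.select('.news-list-title')[0].text.strip(),
            info[1].text.strip(),
            news.select('a')[0]['href'],
        ))
    return items

for row in parse_news_list(SAMPLE_HTML):
    print(*row)
```

Separating parsing from downloading this way makes it possible to check the selectors against a saved copy of the page before sending any requests.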

2. Pick a topic that interests you and do something similar, as preparation for "scraping web data and performing text analysis"

import requests
from bs4 import BeautifulSoup
from datetime import datetime
from urllib.parse import urljoin

# "Teaching and research" (jxky) column of the same news site
list_url = 'http://news.gzcc.cn/html/jxky/'
res = requests.get(list_url)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')

def getdetail(url):
    """Download a news detail page and return the text of its body (.show-content)."""
    resd = requests.get(url)
    resd.encoding = 'utf-8'
    soupd = BeautifulSoup(resd.text, 'html.parser')
    return soupd.select('.show-content')[0].text.strip()

for news in soup.select('li'):
    if len(news.select('.news-list-description')) > 0:
        # Assumed layout: .news-list-info holds two <span>s, [publish date, source]
        info = news.select('.news-list-info span')
        pub_time = info[0].text.strip()
        dt = datetime.strptime(pub_time, '%Y-%m-%d')  # date string -> datetime
        title = news.select('.news-list-title')[0].text.strip()
        source = info[1].text.strip()
        url = urljoin(list_url, news.select('a')[0]['href'])  # also handles relative links
        detail = getdetail(url)
        print(dt, title, source, url, detail)
        break  # stop after the first article while testing
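Before crawling many detail pages for text analysis, the date conversion and the body extraction can be checked offline. The sketch below assumes, as `getdetail` does, that the article body sits in an element with class `show-content`, and that list dates use the `YYYY-MM-DD` form. The HTML string and the helper names `parse_detail` and `parse_date` are hypothetical; collapsing whitespace is one simple cleanup choice, not something the original code does.

```python
from datetime import datetime
from bs4 import BeautifulSoup

# Hypothetical detail-page markup; only the .show-content element is used
DETAIL_HTML = """
<html><body>
  <div class="show-title">Sample headline</div>
  <div class="show-content">
    <p>First paragraph of the article.</p>
    <p>Second paragraph.</p>
  </div>
</body></html>
"""

def parse_detail(html):
    """Return the article body text with runs of whitespace collapsed to single spaces."""
    soup = BeautifulSoup(html, 'html.parser')
    text = soup.select('.show-content')[0].get_text()
    return ' '.join(text.split())

def parse_date(text):
    """Turn a 'YYYY-MM-DD' string from the list page into a datetime."""
    return datetime.strptime(text.strip(), '%Y-%m-%d')

print(parse_date(' 2020-10-08 ').date())
print(parse_detail(DETAIL_HTML))
```

Normalizing the body text at this stage gives later word-frequency or segmentation steps clean, single-line input.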
