Scraping the Maoyan top_100: only the first page is done
Posted by skyda
```python
# python 3.7
from urllib.request import Request, urlopen
import time, re, csv


class Maoyan(object):
    def __init__(self):
        # Headers copied from a browser session; the Cookie value is session-specific
        self.header = {
            'Connection': 'keep-alive',
            'Cookie': 'uuid_n_v=v1; uuid=16B52300EED311E8A50EC9D5D894D382A1072CB6CA3D4BAA95D7EA39B1BB3637; _lxsdk_cuid=1673eb37e1fc8-011175d5446e19-424f0928-13c680-1673eb37e20c8; _lxsdk=16B52300EED311E8A50EC9D5D894D382A1072CB6CA3D4BAA95D7EA39B1BB3637; _csrf=6597fe121a59ff12f8bf1b793cb7d29274a118e066c86f8bf88b8e765b7d4dad; _lx_utm=utm_source%3DBaidu%26utm_medium%3Dorganic; __mta=145127947.1542945209936.1542945209936.1542954826219.2; _lxsdk_s=1673f4639ac-357-82a-15d%7C%7C4',
            'Host': 'maoyan.com',
            'Referer': 'http://maoyan.com/board',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36',
        }

    def get_page(self, url):
        # Download the page and pass the decoded HTML to the parser
        res = urlopen(Request(url=url, headers=self.header)).read()
        self.parsePage(res.decode())

    def parsePage(self, res):
        # Capture (title, stars, release time) for every film on the page;
        # raw string so that \s+ (whitespace between tags) survives
        patten = (r'data-val="\{.*?\}">(.*?)</a></p>\s+<p class="star">\s+(.*?)\s+</p>'
                  r'\s+<p class="releasetime">(.*?)</p>')
        a = re.findall(patten, res)
        self.write(a)

    def write(self, a):
        # Append the rows to 11.csv, opening the file once rather than once per row
        with open('11.csv', 'a+', newline='', encoding='gbk') as f:
            writer = csv.writer(f)
            for i in a:
                writer.writerow(list(i))

    def wordon(self):
        pass  # not implemented yet: crawling the remaining pages


if __name__ == '__main__':
    a = Maoyan()
    a.get_page('http://maoyan.com/board/4?offset=0')
```
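The title says only the first page is scraped, and `wordon` is still an empty stub. Below is a minimal sketch of the missing pagination step, assuming the board serves 10 films per page through the `offset` query parameter (0, 10, up to 90). The helper names `page_urls`, `parse_page` and `crawl_all` are hypothetical, and `fetch` (URL to decoded HTML) is passed in so the logic can be checked without touching the network.

```python
import re
import time
from typing import Callable, List, Tuple

# Assumption: 10 films per page, paged via ?offset=0,10,...,90
BOARD_URL = 'http://maoyan.com/board/4?offset={}'

# Same pattern as parsePage, compiled once
PATTERN = re.compile(
    r'data-val="\{.*?\}">(.*?)</a></p>\s+<p class="star">\s+(.*?)\s+</p>'
    r'\s+<p class="releasetime">(.*?)</p>'
)


def page_urls(pages: int = 10, step: int = 10) -> List[str]:
    """Build one board URL per page by stepping the offset parameter."""
    return [BOARD_URL.format(i * step) for i in range(pages)]


def parse_page(html: str) -> List[Tuple[str, str, str]]:
    """Extract (title, stars, release time) tuples from one page of HTML."""
    return PATTERN.findall(html)


def crawl_all(fetch: Callable[[str], str], delay: float = 1.0) -> List[Tuple[str, str, str]]:
    """Fetch and parse every page; `fetch` maps a URL to its decoded HTML."""
    rows: List[Tuple[str, str, str]] = []
    for url in page_urls():
        rows.extend(parse_page(fetch(url)))
        time.sleep(delay)  # pause between requests so the site is not hammered
    return rows
```

With the original class, the same loop could simply be `for url in page_urls(): a.get_page(url)`, since `write` opens `11.csv` in append mode and each page's rows accumulate in the same file.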