CS109 Lecture 7
Posted ZJun310
Data Scraping
Sources
- From a website
- With an API
Copyright and permissions
- Be careful and polite
- Give credit
- Care about media law
- Don’t be evil
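In practice, being polite usually means following a site's robots.txt rules and spacing out requests. The sketch below is one way to do both with Python's standard library. The robots.txt text and the user-agent string `cs109-scraper` are made up for illustration. The rules are parsed from a string, so the example needs no network access.

```python
import time
import urllib.robotparser

# Hypothetical robots.txt content; for a real site you would call
# rp.set_url("https://<site>/robots.txt") and rp.read() instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "cs109-scraper"  # hypothetical user-agent name


def make_robot_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Build a robots.txt parser from text without touching the network."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def polite_urls(urls, rp, user_agent=USER_AGENT):
    """Yield only the URLs robots.txt allows, pausing between them."""
    delay = rp.crawl_delay(user_agent) or 1  # fall back to 1 second
    first = True
    for url in urls:
        if not rp.can_fetch(user_agent, url):
            continue  # skip disallowed paths entirely
        if not first:
            time.sleep(delay)  # wait before handing out the next URL
        first = False
        yield url
```

Each URL that `polite_urls` yields is safe to download under these rules. The pause happens lazily, just before the next allowed URL is handed out.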
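Useful HTML tags
<h1></h1>  top-level heading
<p></p>  paragraph
<br>  line break (no closing tag)
<a href="url">Link</a>  hyperlink; the href attribute holds the target URL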
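This minimal sketch shows how a parser treats these tags. It uses the standard library's `html.parser` rather than BeautifulSoup only so that it runs without extra installs. The HTML snippet and the `TagCollector` class are hypothetical. The class gathers `<h1>` and `<p>` text and every `<a>` tag's `href`.

```python
from html.parser import HTMLParser

# Hypothetical page that uses the four tags listed above.
PAGE = """
<h1>Upcoming events</h1>
<p>Talks this week:<br>see the links below.</p>
<a href="https://example.com/talk1">Talk 1</a>
<a href="/talk2">Talk 2</a>
"""


class TagCollector(HTMLParser):
    """Record text inside <h1> and <p>, and the href of every <a>."""

    def __init__(self):
        super().__init__()
        self.current = None  # the h1/p tag we are currently inside, if any
        self.headings = []
        self.paragraphs = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs
            self.links.append(dict(attrs).get("href"))
        elif tag == "h1":
            self.current = tag
            self.headings.append("")
        elif tag == "p":
            self.current = tag
            self.paragraphs.append("")
        # <br> is a void tag: no closing tag and no text of its own

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current == "h1":
            self.headings[-1] += data
        elif self.current == "p":
            self.paragraphs[-1] += data


parser = TagCollector()
parser.feed(PAGE)
print(parser.headings)  # ['Upcoming events']
print(parser.links)     # ['https://example.com/talk1', '/talk2']
```

Note that the second link is relative (`/talk2`). A scraper would normally resolve it against the page's own URL before following it.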
Useful Libraries for Scraping
- urllib
- BeautifulSoup (bs4)
- Pattern
- lxml
Get Data From a Website
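import urllib.request  # Python 3 home of the old urllib2.urlopen
import bs4  # BeautifulSoup 4
url = 'url'  # placeholder: address of the page to scrape
source = urllib.request.urlopen(url).read()  # raw HTML bytes
soup = bs4.BeautifulSoup(source, 'html.parser')
soup.find_all('a')  # every <a></a> tag on the page
tag = soup.find('a')  # only the first <a> tag
tag.get('href')  # its link target
C = soup.find_all('p', {'class': 'Event'})  # <p> tags with class "Event"
t = C[0]
t.find_next_siblings()  # tags after t that share its parent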
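`find_all('p', {'class': 'Event'})` keeps a `<p>` tag when `Event` is one of the space-separated names in its `class` attribute, so `class="Event highlight"` also matches. The sketch below imitates that filter with the standard library's `html.parser`. It is a rough, dependency-free approximation, not how BeautifulSoup works internally. The page and the `ClassFilter` class are hypothetical, and the sketch returns plain strings rather than `Tag` objects.

```python
from html.parser import HTMLParser

# Hypothetical page: two event paragraphs and one ordinary paragraph.
PAGE = """
<p class="Event">Lecture 7: Data Scraping</p>
<p>Not an event</p>
<p class="Event highlight">Homework 2 due</p>
"""


class ClassFilter(HTMLParser):
    """Collect the text of every <tag> whose class list contains cls.

    Assumes matched tags are not nested inside one another.
    """

    def __init__(self, tag, cls):
        super().__init__()
        self.tag, self.cls = tag, cls
        self.inside = False
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # class is multi-valued: split "Event highlight" into two names
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.cls in classes:
            self.inside = True
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.matches[-1] += data


events = ClassFilter("p", "Event")
events.feed(PAGE)
print(events.matches)  # ['Lecture 7: Data Scraping', 'Homework 2 due']
```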
Get Data With an API
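import json  # JavaScript Object Notation
import requests
api_key = 'mykey'  # placeholder for your own key
url = 'url' + api_key  # placeholder endpoint with the key appended
data = requests.get(url).text  # response body as a JSON string
#---simple example--------
a = {'a': 1, 'b': 2}
s = json.dumps(a)  # dict -> JSON string '{"a": 1, "b": 2}'
a2 = json.loads(s)  # JSON string -> dict equal to a
#-------------------------
dataDict = json.loads(data)
dataDict.keys()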
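Appending the key to the URL by concatenation only works when nothing in it needs escaping. The following is a sketch of a slightly safer pattern. It assumes a hypothetical endpoint `https://api.example.com/events` and a query parameter named `api_key`, and it uses a made-up JSON response in place of a real download. The query string is built with `urllib.parse.urlencode`, and the response text then goes through `json.loads` as usual.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/events"  # hypothetical endpoint


def build_url(base_url: str, api_key: str, **params) -> str:
    """Attach the key and any extra parameters as an escaped query string."""
    # The parameter name 'api_key' is an assumption; real APIs vary.
    query = urlencode({"api_key": api_key, **params})
    return f"{base_url}?{query}"


# Made-up response body standing in for requests.get(url).text
SAMPLE_RESPONSE = (
    '{"count": 2, "events": ['
    '{"name": "Lecture 7", "room": "Hall B"}, '
    '{"name": "Lab", "room": null}]}'
)

url = build_url(BASE_URL, "mykey", q="data science")
dataDict = json.loads(SAMPLE_RESPONSE)  # JSON object -> Python dict
names = [event["name"] for event in dataDict["events"]]  # JSON array -> list

print(url)  # https://api.example.com/events?api_key=mykey&q=data+science
print(sorted(dataDict.keys()))  # ['count', 'events']
print(names)  # ['Lecture 7', 'Lab']
```

JSON `null` becomes Python `None`, so `dataDict["events"][1]["room"]` is `None`. With `requests`, you can pass the same dictionary as `requests.get(BASE_URL, params={...})` to get this encoding done for you, and `response.json()` parses the body much like `json.loads(response.text)`.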