CS109 Lecture 7

Posted 2022-11-25 ZJun310

Data Scraping

Sources

From a website
With an API

Copyrights and permission

Be careful and polite
Give credit
Care about media law
Don’t be evil
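
One common way to be "careful and polite" is to check a site's robots.txt before fetching pages. The sketch below uses the standard library's urllib.robotparser; the robots.txt content, the bot name 'cs109-notes-bot', and the example.com URLs are made up for illustration and do not come from the lecture.

```python
import urllib.robotparser

# Hypothetical robots.txt content. A real scraper would download this
# file from the target site instead of hard-coding it.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def build_rules(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

rules = build_rules(ROBOTS_TXT)

# 'cs109-notes-bot' is an assumed user-agent name.
allowed = rules.can_fetch('cs109-notes-bot', 'http://example.com/events')
blocked = rules.can_fetch('cs109-notes-bot', 'http://example.com/private/data')
delay = rules.crawl_delay('cs109-notes-bot')  # seconds to wait between requests
```

A scraper could skip any URL for which can_fetch returns False and sleep for the crawl delay between requests.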

Useful tags

<h1></h1>
<p></p>
<br>
<a href = 'url'>Link</a>
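
To see how these tags look to a program, here is a minimal sketch that feeds a small page through the standard library's html.parser and records each opening tag with its attributes. The sample page and the names TagLister and list_tags are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page built only from the tags listed above.
SAMPLE_HTML = """
<h1>Events</h1>
<p>First line<br>second line</p>
<a href='http://example.com'>Link</a>
"""

class TagLister(HTMLParser):
    """Record every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.opened = []  # list of (tag_name, attribute_dict)

    def handle_starttag(self, tag, attrs):
        self.opened.append((tag, dict(attrs)))

def list_tags(html):
    """Return the opening tags found in html, in document order."""
    parser = TagLister()
    parser.feed(html)
    parser.close()
    return parser.opened

tags = list_tags(SAMPLE_HTML)
for name, attributes in tags:
    print(name, attributes)
```

Note that <br> appears as an opening tag with no matching close, and that the link target lives in the href attribute of <a>.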

Useful Libraries for Scraping

urllib
BeautifulSoup
pattern
lxml

Get Data From Website

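import urllib.request  # Python 3; the original notes used urllib2 (Python 2)
import bs4

url = 'url'
source = urllib.request.urlopen(url).read()

soup = bs4.BeautifulSoup(source, 'html.parser')
soup.find_all('a')  # find all <a></a> tags

tag = soup.find('a')  # first <a> tag
tag.get('href')  # value of its href attribute

C = soup.find_all('p', {'class': 'Event'})  # <p> tags with class "Event"
t = C[0]
t.find_next_siblings()  # siblings that follow t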
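
The same two lookups (all link targets, and paragraphs with class "Event") can be tried offline on an inline HTML string. This is a rough sketch that swaps BeautifulSoup for the standard library's html.parser so it runs without extra installs; the sample HTML and the EventScraper class are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page content used instead of a live download.
SAMPLE_HTML = """
<p class="Event">Lecture 7</p>
<p>Not an event</p>
<a href="/slides.pdf">Slides</a>
<a href="/video">Video</a>
"""

class EventScraper(HTMLParser):
    """Collect href values of <a> tags and text of <p class="Event"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.events = []
        self._in_event = False  # True while inside <p class="Event">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and 'href' in attrs:
            self.links.append(attrs['href'])
        elif tag == 'p' and attrs.get('class') == 'Event':
            self._in_event = True

    def handle_endtag(self, tag):
        if tag == 'p':
            self._in_event = False

    def handle_data(self, data):
        if self._in_event and data.strip():
            self.events.append(data.strip())

scraper = EventScraper()
scraper.feed(SAMPLE_HTML)
scraper.close()

links = scraper.links
events = scraper.events
```

With BeautifulSoup the same results would come from soup.find_all('a') and soup.find_all('p', {'class': 'Event'}); the hand-written parser only shows what those calls do underneath.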

Get Data With An API

import json  # JavaScript Object Notation
import requests

api_key = 'mykey'
url = 'url' + api_key
data = requests.get(url).text  # raw JSON response as a string

#---simple example--------
a = {'a': 1, 'b': 2}
s = json.dumps(a)   # dict -> JSON string
a2 = json.loads(s)  # JSON string -> dict
#-------------------------
dataDict = json.loads(data)
dataDict.keys()
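
API responses are usually nested, so after json.loads the next step is to walk the resulting dicts and lists. The sketch below works on a hard-coded string rather than a live request; the response layout ("status", "results", "title", "id") is invented for illustration and does not describe any real API.

```python
import json

# Hypothetical response body, standing in for the text returned by an API.
RESPONSE_TEXT = (
    '{"status": "ok", "results": ['
    '{"id": 1, "title": "Lecture 7"}, '
    '{"id": 2, "title": "Lecture 8"}]}'
)

# JSON objects become dicts and JSON arrays become lists.
data_dict = json.loads(RESPONSE_TEXT)

top_level_keys = sorted(data_dict.keys())
titles = [item['title'] for item in data_dict['results']]

# Serializing and parsing again gives back an equal structure.
round_trip = json.loads(json.dumps(data_dict))
```

Inspecting keys() at each level, as in the notes above, is a quick way to discover the shape of an unfamiliar response.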
