CS109 Lecture 7

Posted ZJun310

Data Scraping

Sources

  • From a website
  • With an API

Copyright and permissions

  • Be careful and polite
  • Give credit
  • Care about media law
  • Don’t be evil
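One common way to act on "be careful and polite" is to check a site's robots.txt before fetching pages. The sketch below is only an illustration using the standard library's urllib.robotparser. The robots.txt text, the example.com URLs, and the user-agent name are all made up, and a real scraper would download the file from the site rather than hard-code it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption for illustration only).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
agent = "cs109-notes-bot"  # hypothetical user-agent name

# Ask whether this agent may fetch each URL under the rules above.
allowed_public = parser.can_fetch(agent, "https://example.com/events/")
allowed_private = parser.can_fetch(agent, "https://example.com/private/data")

# Seconds the site asks crawlers to wait between requests (None if unset).
delay = parser.crawl_delay(agent)

print(allowed_public, allowed_private, delay)
```

A scraper could skip any URL for which can_fetch returns False and sleep for the crawl delay between requests.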

Useful tags

<h1></h1>
<p></p>
<br>
<a href = 'url'>Link</a>
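To see how these tags appear to a parser, here is a small sketch that uses Python's built-in html.parser rather than one of the scraping libraries listed in these notes. The sample page and the TagCollector class are made up for illustration.

```python
from html.parser import HTMLParser

# Made-up page that uses only the four tags listed above.
SAMPLE_HTML = """
<h1>Events</h1>
<p>First line<br>second line</p>
<a href='https://example.com/a'>Link A</a>
<a href='https://example.com/b'>Link B</a>
"""


class TagCollector(HTMLParser):
    """Record every start tag seen and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs.
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


collector = TagCollector()
collector.feed(SAMPLE_HTML)

print(collector.tags)
print(collector.links)
```

Note that <br> has no closing tag, so it shows up only as a start tag.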

Useful Libraries for Scraping

  • urllib
  • BeautifulSoup
  • pattern
  • lxml

Get Data From a Website

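import urllib2  # Python 2 only; Python 3 uses urllib.request instead
import bs4

url = 'url'  # placeholder: the address of the page to scrape
source = urllib2.urlopen(url).read()  # download the raw HTML
soup = bs4.BeautifulSoup(source)
soup.findAll('a')  # find every <a></a> tag
tag = soup.find('a')  # first <a> tag only
tag.get('href')  # value of its href attribute
C = soup.findAll('p', {'class': 'Event'})  # <p> tags with class "Event"
t = C[0]
t.findNextSiblings()  # tags that follow t at the same level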
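The same steps can be tried without a network connection by parsing an inline HTML string. This is a minimal Python 3 sketch, assuming the bs4 package is installed. The sample page is made up, and find_all and find_next_siblings are the current bs4 spellings of findAll and findNextSiblings.

```python
from bs4 import BeautifulSoup

# Made-up page standing in for the downloaded source.
SAMPLE_HTML = """
<html><body>
<p class="Event">Talk on scraping</p>
<p class="Detail">Room 101</p>
<p class="Detail">3 pm</p>
<a href="https://example.com/slides">Slides</a>
</body></html>
"""

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# href of every <a> tag on the page.
links = [a.get("href") for a in soup.find_all("a")]

# <p> tags whose class attribute is "Event".
events = soup.find_all("p", {"class": "Event"})
first_event = events[0]

# Text of the <p> tags that follow the first event at the same level.
following = [sib.get_text() for sib in first_event.find_next_siblings("p")]

print(links)
print(first_event.get_text())
print(following)
```

On a real page, SAMPLE_HTML would be replaced by the bytes returned from urlopen(url).read().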

Get Data With An API

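import json  # JavaScript Object Notation
import requests  # alternative to urllib2: requests.get(url).text
import urllib2  # Python 2 only; Python 3 uses urllib.request instead
api_key = 'mykey'
url = 'url' + api_key  # placeholder: API endpoint plus the key
source = urllib2.urlopen(url).read()  # raw JSON text returned by the API
#---simple example--------
a = {'a': 1, 'b': 2}
s = json.dumps(a)  # serialize the dict to a JSON string
a2 = json.loads(s)  # parse the JSON string back into a dict
#-------------------------
dataDict = json.loads(source)  # parse the API response
dataDict.keys()  # top-level fields of the response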
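The API workflow can also be sketched offline in Python 3 with only the standard library. Everything specific here is an assumption for illustration: the api.example.com endpoint, the api_key and city query parameters, the build_url helper, and the sample response body. A real API defines its own URL scheme and fields.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and key (assumptions, not a real service).
BASE_URL = "https://api.example.com/v1/events"
API_KEY = "mykey"


def build_url(base_url, api_key, **params):
    """Append the key and any extra parameters as a URL-encoded query string."""
    query = urlencode({"api_key": api_key, **params})
    return f"{base_url}?{query}"


url = build_url(BASE_URL, API_KEY, city="Boston")

# Stand-in for the text a real request to `url` would return.
SAMPLE_RESPONSE = '{"count": 2, "events": [{"name": "Talk"}, {"name": "Lab"}]}'

data_dict = json.loads(SAMPLE_RESPONSE)  # JSON text -> Python dict
keys = sorted(data_dict.keys())  # top-level fields of the response
names = [event["name"] for event in data_dict["events"]]

# dumps followed by loads gives back an equal dict.
round_trip = json.loads(json.dumps({"a": 1, "b": 2}))

print(url)
print(keys)
print(names)
```

Using urlencode instead of string concatenation keeps special characters in parameter values from breaking the URL.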
