Python Twitter Download

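Introduction: This article was compiled by the editors of 小常识网 (cha138.com). It covers material related to downloading from Twitter with Python, in the hope that it serves as a useful reference.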

#!/usr/bin/env python3
# Read tab-separated lines (tweet id, user id, ...) from the input file,
# fetch each tweet's page, and print the line with the tweet text appended.
import json
import re
import socket
import sys
import urllib.request

from bs4 import BeautifulSoup

socket.setdefaulttimeout(20)  # abandon any request after 20 seconds
item_dict = {}  # cache: tweet id -> text already retrieved

try:
    for line in open(sys.argv[1], encoding='utf-8'):
        fields = line.rstrip().split('\t')
        tweetid = fields[0]
        userid = fields[1]
        tweet = None
        text = "Not Available"
        if tweetid in item_dict:
            text = item_dict[tweetid]
        else:
            try:
                url = 'http://twitter.com/' + str(userid) + '/status/' + str(tweetid)
                f = urllib.request.urlopen(url)
                # Ensure the page ends with exactly one closing </html> tag.
                html = f.read().decode('utf-8', 'replace').replace("</html>", "") + "</html>"
                soup = BeautifulSoup(html, "html.parser")
                jstt = soup.find_all("p", "js-tweet-text")
                tweets = list(set(x.get_text() for x in jstt))
                if len(tweets) > 1:
                    # More than one tweet text on the page: discard the
                    # texts that appear inside div.content blocks.
                    other_tweets = []
                    cont = soup.find_all("div", "content")
                    for i in cont:
                        o_t = i.find_all("p", "js-tweet-text")
                        other_text = list(set(x.get_text() for x in o_t))
                        other_tweets += other_text
                    tweets = list(set(tweets) - set(other_tweets))
                text = tweets[0]
                item_dict[tweetid] = tweets[0]
                # Prefer the text from the embedded init-data JSON when present.
                for j in soup.find_all("input", "json-data", id="init-data"):
                    js = json.loads(j['value'])
                    if "embedData" in js:
                        tweet = js["embedData"]["status"]
                        text = js["embedData"]["status"]["text"]
                        item_dict[tweetid] = text
                        break
            except Exception:
                # Skip this line if the page cannot be fetched or parsed.
                continue

        # The page returned a different tweet than the one requested.
        if tweet is not None and tweet["id_str"] != tweetid:
            text = "This tweet has been removed or is not available"
            item_dict[tweetid] = "This tweet has been removed or is not available"
        # Collapse newlines and runs of whitespace into single spaces.
        text = text.replace('\n', ' ')
        text = re.sub(r'\s+', ' ', text)
        print("\t".join(fields + [text]))
except IndexError:
    print("Incorrect arguments specified (maybe you didn't specify any arguments).")
    print('Format: python [scriptname] [inputfilename] > [outputfilename]')
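
The per-line handling in the script above (split each input line on tabs, pick the tweet text, collapse its whitespace, append it as an extra column) can be tried without any network access. The following is a minimal sketch rather than part of the original script: the function names normalize_text, text_from_init_data, and format_output_line are hypothetical, and the sample input line and payload are made up. Only the JSON keys and the two messages are taken from the script.

```python
import json
import re

# Message the script uses when the page returns a different tweet.
REMOVED_MESSAGE = "This tweet has been removed or is not available"


def normalize_text(text):
    """Collapse newlines and runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.replace("\n", " "))


def text_from_init_data(raw_json, tweetid, fallback):
    """Pick the tweet text out of an init-data style JSON string.

    The keys ("embedData", "status", "text", "id_str") are the ones the
    script reads; everything else about the payload is an assumption.
    """
    data = json.loads(raw_json)
    if "embedData" not in data:
        return fallback
    status = data["embedData"]["status"]
    if status["id_str"] != tweetid:
        return REMOVED_MESSAGE
    return status["text"]


def format_output_line(line, text):
    """Append the normalized tweet text to a tab-separated input line."""
    fields = line.rstrip().split("\t")
    return "\t".join(fields + [normalize_text(text)])


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample_line = "123\tsomeuser\tpositive\n"
    sample_payload = json.dumps(
        {"embedData": {"status": {"id_str": "123", "text": "good\n  morning"}}}
    )
    text = text_from_init_data(sample_payload, "123", "Not Available")
    print(format_output_line(sample_line, text))
```

Keeping these steps in small functions separates the output format from the scraping, so the format can be checked even when a tweet page cannot be fetched.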
