Tweepy: get old tweets now possible with Twitter search api?

Posted 2023-02-23


Question (posted 2015-03-31 19:16:12):

According to http://www.theverge.com/2014/11/18/7242477/twitter-search-now-lets-you-find-any-tweet-ever-sent, Twitter search now lets you find any tweet ever sent.

But when I try to use tweepy to get tweets from 2014 to 2015, it only returns the recent ones:

    query = 'Nivea'
    max_tweets = 1000
    searched_tweets = [json.loads(status.json) for status in tweepy.Cursor(api.search,
                                                                           q=query,
                                                                           count=100,
                                                                           #since_id="24012619984051000",
                                                                           since="2014-02-01",
                                                                           until="2015-02-01",
                                                                           result_type="mixed",
                                                                           lang="en"
                                                                           ).items(max_tweets)]

I tried since="2014-02-01" and since_id, but it makes no difference.

Comments:

The Twitter API has some limitations; this library works around the problem, take a look: github.com/Jefferson-Henrique/GetOldTweets-python

Answer 1:

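Unfortunately, you cannot access past data from Twitter. It is not a question of which library you use (Tweepy, Twitter4J, whatever); Twitter simply will not provide any data older than roughly 2 weeks.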

To get historical data, you need access to the firehose, either directly through Twitter or through a third-party reseller such as GNIP.
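To make the limit concrete, here is a small date-arithmetic sketch. It assumes a fixed 14-day window, taken from the answer's "roughly 2 weeks"; the real cutoff is decided by Twitter and is not given here. It checks whether the question's `since` date (2014-02-01) was reachable on the day the question was posted (2015-03-31). The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class SearchWindow {
    // ASSUMPTION: 14 days, from "roughly 2 weeks" in the answer; not an official figure.
    static final int WINDOW_DAYS = 14;

    /** Earliest date a window-limited search can still reach, counted back from today. */
    static LocalDate earliestReachable(LocalDate today, int windowDays) {
        return today.minusDays(windowDays);
    }

    /** True if a search starting at sinceDate lies entirely inside the window. */
    static boolean isReachable(LocalDate sinceDate, LocalDate today, int windowDays) {
        return !sinceDate.isBefore(earliestReachable(today, windowDays));
    }

    public static void main(String[] args) {
        LocalDate asked = LocalDate.parse("2015-03-31"); // date the question was posted
        LocalDate since = LocalDate.parse("2014-02-01"); // "since" value from the question
        LocalDate earliest = earliestReachable(asked, WINDOW_DAYS);
        System.out.println("Earliest reachable date: " + earliest);
        System.out.println("2014-02-01 reachable? " + isReachable(since, asked, WINDOW_DAYS));
        System.out.println("Days outside the window: " + ChronoUnit.DAYS.between(since, earliest));
    }
}
```

Under that assumption the requested start date lies more than a year outside the window, which is consistent with the asker only getting recent tweets back.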


Answer 2:

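I use a piece of my own code that uses HttpURLConnection and the Twitter search URL. I then use a regular expression to extract the last 20 matching tweets... Fortunately, once I delete those tweets, I can simply search again until no more tweets are found. I've included the code; although it's written in Java, the same approach works in any language. First, I use a class that actually searches for the tweets and records their details: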

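    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ReadSearch {
        // Pieces of the search URL; the query decodes to "from:<user> @<troll>"
        private String startURL = "https://twitter.com/search?f=realtime&q=from%3A";
        private String middleURL = "%20%40";
        private String endURL = "&src=typd";

        // Scrapes the search results page and returns the tweets it lists
        public ArrayList<Tweet> getTweets(String user, String troll) {
            ArrayList<Tweet> tweets = new ArrayList<Tweet>();
            // group 1 = screen name, group 2 = tweet id, group 3 = title text (used as the date)
            String expr = "small.class=\"time\".*?href=\"/"
                    + "([^/]+)"
                    + ".*?status/"
                    + "([^\"]+)"
                    + ".*?title=\""
                    + "([^\"]+)";
            Pattern patt = Pattern.compile(expr, Pattern.DOTALL | Pattern.UNIX_LINES);
            try {
                Matcher m = patt.matcher(getData(startURL + user + middleURL + troll + endURL));
                while (m.find()) {
                    // keep only tweets actually posted by 'user'
                    if (user.equals(m.group(1).trim())) {
                        Tweet tw = new Tweet();
                        tw.setUser(m.group(1).trim());
                        tw.setTweetid(Long.parseLong(m.group(2).trim()));
                        tw.setDate(m.group(3).trim());
                        tweets.add(tw);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
                System.out.println("Exception " + e);
            }
            return tweets;
        }

        // Downloads the page at dataurl into a StringBuilder, one line per '\n'
        private StringBuilder getData(String dataurl) throws MalformedURLException, IOException {
            URL url = new URL(dataurl);
            HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
            httpcon.addRequestProperty("User-Agent", "Mozilla/4.76");
            StringBuilder sb = new StringBuilder(16384);
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(httpcon.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = br.readLine()) != null) {
                    sb.append(line);
                    sb.append('\n');
                }
            } finally {
                httpcon.disconnect();
            }
            return sb;
        }

        public static void main(String[] args) {
            // testing
            ReadSearch rs = new ReadSearch();
            ArrayList<Tweet> tweets = rs.getTweets("Tony_Kennah", "PickLuckier");
            for (Tweet t : tweets) {
                System.out.println("TWEET: " + t.toString());
            }
        }
    }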
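The regular expression is the fragile part of this approach, so here is a sketch that runs the same expression against a small hand-written HTML fragment to show what each group captures. The markup is hypothetical, shaped to fit the regex, not a copy of Twitter's real page. For reference, the URL pieces in `ReadSearch` decode to the search query `from:<user> @<troll>` (`%3A` is `:`, `%20` a space, `%40` is `@`). `TweetRegexDemo` and `extract` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TweetRegexDemo {
    // Same expression and flags as ReadSearch.getTweets
    static final Pattern TWEET = Pattern.compile(
            "small.class=\"time\".*?href=\"/"
            + "([^/]+)"          // group 1: screen name
            + ".*?status/"
            + "([^\"]+)"         // group 2: tweet id
            + ".*?title=\""
            + "([^\"]+)",        // group 3: title text, used as the date
            Pattern.DOTALL | Pattern.UNIX_LINES);

    /** Returns {user, id, date} for every match in the page text. */
    static List<String[]> extract(String html) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = TWEET.matcher(html);
        while (m.find()) {
            rows.add(new String[] { m.group(1).trim(), m.group(2).trim(), m.group(3).trim() });
        }
        return rows;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL markup; DOTALL lets ".*?" skip over the line breaks
        String html = "<small class=\"time\">\n"
                + "  <a href=\"/Tony_Kennah/status/123456789\" title=\"10:15 AM - 31 Mar 2015\">\n"
                + "</small>";
        for (String[] row : extract(html)) {
            System.out.println("user=" + row[0] + " id=" + row[1] + " date=" + row[2]);
        }
    }
}
```

Because the whole scraper hinges on this pattern, any change to the page's markup silently yields zero matches rather than an error.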

Then we need the Tweet class itself, so that we can group the tweets and do things with them. It's just a bean like this:

    public class Tweet {
        private String user;
        private long tweetid;
        private String date;

        public String getUser() {
            return user;
        }
        public void setUser(String user) {
            this.user = user;
        }
        public long getTweetid() {
            return tweetid;
        }
        public void setTweetid(long tweetid) {
            this.tweetid = tweetid;
        }
        public String getDate() {
            return date;
        }
        public void setDate(String date) {
            this.date = date;
        }
        public String toString() {
            return this.tweetid + " " + this.user + " " + this.date;
        }
    }
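As an aside on the `tweetid` field: for tweets created after Twitter moved to Snowflake ids in late 2010, the creation time is encoded in the id itself (the bits above the low 22 count milliseconds from the epoch `1288834974657`). The sketch below decodes it, which could serve as a cross-check on the scraped `date` string; it does not apply to older, pre-Snowflake ids. `SnowflakeTime` and `creationTime` are hypothetical names.

```java
import java.time.Instant;

class SnowflakeTime {
    // Snowflake epoch used by Twitter ids, in Unix milliseconds (early November 2010)
    static final long TWITTER_EPOCH_MS = 1288834974657L;

    /** Creation time encoded in a Snowflake tweet id; the low 22 bits (worker/sequence) are dropped. */
    static Instant creationTime(long tweetId) {
        return Instant.ofEpochMilli((tweetId >> 22) + TWITTER_EPOCH_MS);
    }

    public static void main(String[] args) {
        // Build an id for a known instant (low 22 bits left at 0), then decode it again
        long id = (Instant.parse("2015-03-31T19:16:12Z").toEpochMilli() - TWITTER_EPOCH_MS) << 22;
        System.out.println(id + " -> " + creationTime(id));
    }
}
```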

...so it's just standard Java. To use the code above, I use the Twitter4J API and do this:

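    import java.util.ArrayList;

    import twitter4j.Twitter;
    import twitter4j.TwitterFactory;

    public class DeleteTweets {

        public static void main(String[] args) throws Exception {
            // reads OAuth settings from Twitter4J's configuration (e.g. twitter4j.properties)
            Twitter twitter = TwitterFactory.getSingleton();
            ArrayList<Tweet> tweets = new ArrayList<Tweet>();
            String[] people = { "PickLuckier" };
            for (String s : people) {
                // search and delete repeatedly until the search comes back empty
                do {
                    ReadSearch rs = new ReadSearch();
                    tweets = rs.getTweets(twitter.getScreenName(), s);
                    for (Tweet tw : tweets) {
                        twitter.destroyStatus(tw.getTweetid());
                    }
                } while (tweets.size() != 0);
            }
        }
    }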
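The `do`/`while` loop works because each search only shows the newest matches (about 20, per the answer), and deleting them exposes the next batch. The sketch below isolates that pattern with plain `java.util.function` interfaces and an in-memory fake standing in for `ReadSearch` and Twitter4J. `drain` and `maxRounds` are hypothetical; `maxRounds` is an added safety cap, not part of the original, so that a delete that silently fails cannot loop forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

class DrainLoop {
    /** Repeats search-then-delete until a search returns nothing or maxRounds is reached. */
    static int drain(Supplier<List<Long>> search, LongConsumer delete, int maxRounds) {
        int deleted = 0;
        for (int round = 0; round < maxRounds; round++) {
            List<Long> batch = search.get();
            if (batch.isEmpty()) {
                break; // nothing left to find
            }
            for (long id : batch) {
                delete.accept(id);
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Fake "search page": 45 tweets, at most 20 visible per search
        List<Long> remaining = new ArrayList<>();
        for (long id = 1; id <= 45; id++) {
            remaining.add(id);
        }
        Supplier<List<Long>> search =
                () -> new ArrayList<>(remaining.subList(0, Math.min(20, remaining.size())));
        LongConsumer delete = id -> remaining.remove(Long.valueOf(id));

        int n = drain(search, delete, 100);
        System.out.println("deleted " + n + ", left " + remaining.size());
    }
}
```

With the fake, the loop takes three rounds (20, 20, 5) and stops on the fourth, empty search.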

That's it. I don't use comments, but I hope it's easy to see what's going on and that this helps you.

