Twitter Data: Is there a way to split based on a condition?

Posted: 2020-06-10 17:11:10

Question:

Code snippet

# Date check: returns True if the line appears to start with a date
def isDate(string):
    elem = []
    splits = string.split()
    for element in splits:
        elem.append(element)
    if len(elem) > 5:
        return True if elem[2].isdigit() else False
    else:
        return False

# Load handler: reads the file into a list of raw lines
def loader(file):
    lines = []
    with open(file,encoding='utf8') as f:
        for line in f:
            lines.append(line)
    return lines

class define:
    def __init__(self, date, token, tweet):
        self.date = date
        self.token = token
        self.tweet = tweet

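Data sample

Disclaimer: These tweets are public information. This is purely educational research and does not reflect on the institution or anyone within it.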

Tue Feb 04 12:36:05 EST 2020|@WishYouWereMe__|RT @coriyonmarie: I’ll never forget how somebody did me.
Tue Feb 04 12:36:05 EST 2020|@c1Leonn|RT @nxlimaa: WHY am i incapable of doing natural makeup?????? why does everything always escalate ?????????
Tue Feb 04 12:36:05 EST 2020|@Oootentog|@staydilated13 Thank youuuu! ♥️
Tue Feb 04 12:36:05 EST 2020|@SushreeRonali|@GautamGambhir Jai Hind ????????
Tue Feb 04 12:36:05 EST 2020|@Tank9trACE|4 months old at that
Tue Feb 04 12:36:05 EST 2020|@mathewpoptartm|RT @Flashyasf: Aye be careful who you catch feelings for, Shit don't be real onna other side ????
Tue Feb 04 12:36:05 EST 2020|@wakemeup0320|RT @NookNickn_r: Good night na~ ????????❤️???????? [LINK]
Tue Feb 04 12:36:05 EST 2020|@AkanniTheKing|@KiKardashiann We Got You ????????????????
Tue Feb 04 12:36:05 EST 2020|@nuggythebear|@MarcusRashford Sheryar is a strong Mancunian name. Heralds back to the Sheryars of the 1700's.
Tue Feb 04 12:36:05 EST 2020|@Iam_Adrii|RT @iRealPedro: PUBLIC @TANNEDja ANNOUNCEMENT 

The Road Marshall speaks ‼️⚠️‼️ [LINK]
Tue Feb 04 12:36:05 EST 2020|@blushkths|how much do i need to pay for jungkook to step on my neck

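Theory

My idea is to split based on whether the first element of a line is a date, which the function isDate() checks. What I'm not sure about is how to append a line to the previous element so that the pieces of one tweet are joined. I don't know how understandable this is, so here is an illustration: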

Tue Feb 04 12:36:05 EST 2020|@Iam_Adrii|RT @iRealPedro: PUBLIC @TANNEDja ANNOUNCEMENT 

The Road Marshall speaks ‼️⚠️‼️ [LINK]

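In this snippet the tweet spans multiple lines, and I need a way to join those lines together so that I can work with the tweet as a single item. Joined, it would look something like this: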

['Tue Feb 04 12:36:05 EST 2020|@Iam_Adrii|RT @iRealPedro: PUBLIC @TANNEDja ANNOUNCEMENT The Road Marshall speaks ‼️⚠️‼️ [LINK]']

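There is no \n or similar, so I'm not sure how to proceed. Eventually I will put this into a dictionary, but I need to figure out the fundamentals first.

Comments:

Isn't "the tweet spans multiple lines" contradictory with "there is no \n or similar"? Can you clarify?

Answer 1: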

I suggest first rewriting your function like this:

def isDate(string):
    splits = string.split(maxsplit=3)
    return len(splits) > 3 and splits[2].isdigit()
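Note that this check accepts any line whose third whitespace-separated token is numeric, so a continuation line such as "I am 25 years old today" would be mistaken for the start of a new tweet. If that matters for your data, a stricter test can match the whole timestamp layout instead. The following is only a sketch: it assumes every record starts with a timestamp shaped exactly like the ones in the sample (weekday, month, day, time, time zone abbreviation, year) followed by `|`, and the names `RECORD_START` and `is_record_start` are made up for illustration.

```python
import re

# Assumed layout, taken from the sample data:
#   "Tue Feb 04 12:36:05 EST 2020|@handle|tweet text"
RECORD_START = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} [A-Z]{2,5} \d{4}\|"
)


def is_record_start(line):
    """Return True if the line begins with a full timestamp followed by '|'."""
    return RECORD_START.match(line) is not None
```

It takes a line and returns a bool, just as isDate does, so it could be swapped in wherever isDate is called.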

Then use isDate like this:

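def loader(file):
    lines = []
    with open(file, encoding='utf8') as f:
        for line_with_newline in f:
            line = line_with_newline.strip()
            if not line:
                continue  # skip blank lines inside a multi-line tweet
            if isDate(line) or not lines:
                lines.append(line)  # start of a new tweet
            else:
                lines[-1] += ' ' + line  # continuation of the previous tweet
    return lines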
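
Once each tweet is on a single line, it can be split into its fields for the dictionary mentioned in the question. This is a minimal sketch, assuming the first two `|` characters on a line always separate the date, the handle, and the tweet text. The keys reuse the attribute names from the question's class, and `parse_record` is a hypothetical helper name.

```python
def parse_record(line):
    """Split one merged line into a dict with date, token, and tweet."""
    # maxsplit=2 keeps any '|' inside the tweet text itself intact.
    date, token, tweet = line.split("|", 2)
    return {"date": date, "token": token, "tweet": tweet}


# Example input: one already-merged line from the sample data.
merged = ["Tue Feb 04 12:36:05 EST 2020|@Tank9trACE|4 months old at that"]
records = [parse_record(line) for line in merged]
```

A line with fewer than two `|` characters raises ValueError here, which makes malformed records easy to spot.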
