How do I find start and end indices in a Python list for all the rows

Posted 2023-03-12

Question (asked 2021-11-02 18:26:34):

My code:

import re
import pandas as pd

df=pd.read_csv("file")
l1=[]
l2=[]
for i in range(0,len(df['unions']),len(df['district'])):
    l1.append(' '.join((df['unions'][i], df['district'][i])))
    l2.append({"entities": [[(ele.start(), ele.end() - 1) for ele in re.finditer(r'\S+', df['unions'][i])], df['subdistrict'][i]]})

TRAIN_DATA=list(zip(l1,l2))
print(TRAIN_DATA)

Result: [('Dhansagar Bagerhat', {'entities': [[(0, 8)], 'Sarankhola']})]

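My expected output: [('Dhansagar Bagerhat', 'entities': [[(0, 8)], 'Sarankhola'],[[(10, 17)], 'AnyLabel'])]

How do I get this output for all the rows? I only get the result for one row, so it seems my loop isn't working. Can anyone point out my mistake?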

My CSV file looks like this. "AnyLabel" is another column. I have about 500 rows:

unions        subdistrict   district 
Dhansagar     Sarankhola    Bagerhat 
Daibagnyahati Morrelganj    Bagerhat 
Ramchandrapur Morrelganj    Bagerhat 
Kodalia       Mollahat      Bagerhat
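Given the sample rows above, here is a minimal sketch of one way to get a span for both words in every row. The expected output as written is not valid Python, so this assumes one possible nested structure, and it uses a hypothetical `any_label` column to stand in for the "AnyLabel" column, whose real name isn't shown. Instead of running a regex over the joined text, it records each field's offset while the text is built, and the spans use inclusive end indices to match the `(0, 8)` style above.

```python
import pandas as pd

# Sample rows from the question; "any_label" is a hypothetical stand-in
# for the "AnyLabel" column, whose real name is not shown.
df = pd.DataFrame({
    "unions": ["Dhansagar", "Daibagnyahati", "Ramchandrapur", "Kodalia"],
    "subdistrict": ["Sarankhola", "Morrelganj", "Morrelganj", "Mollahat"],
    "district": ["Bagerhat"] * 4,
    "any_label": ["AnyLabel"] * 4,
})


def join_with_spans(parts, sep=" "):
    """Join parts with sep; return (text, [(start, end_inclusive), ...])."""
    spans = []
    offset = 0
    for part in parts:
        spans.append((offset, offset + len(part) - 1))
        offset += len(part) + len(sep)  # skip over the part and the separator
    return sep.join(parts), spans


train_data = []
for row in df.itertuples(index=False):  # one namedtuple per row
    text, (union_span, district_span) = join_with_spans([row.unions, row.district])
    entities = [[[union_span], row.subdistrict], [[district_span], row.any_label]]
    train_data.append((text, {"entities": entities}))

print(train_data[0])
# → ('Dhansagar Bagerhat', {'entities': [[[(0, 8)], 'Sarankhola'], [[(10, 17)], 'AnyLabel']]})
```

Tracking offsets this way also keeps a multi-word union name as a single span, which splitting on whitespace would not.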

Comments on the question:

Show us the original DataFrame. Can you copy and paste it as text?

Yes, I've added it as text.

Answer 1:

Try using str.join:

import re
import pandas as pd

df=pd.read_csv("file")
l1=[]
l2=[]

for idx, row in df.iterrows():
    l1.append(' '.join((row['unions'], row['district'])))
    l2.append({"entities": [[[ele.start(), ele.end() - 1], ele.group(0)] for ele in re.finditer(r'\S+', ' '.join([row['unions'], row['subdistrict']]))]})

TRAIN_DATA=list(zip(l1,l2))
print(TRAIN_DATA)

Output:

[('Dhansagar Bagerhat', {'entities': [[[0, 8], 'Dhansagar'], [[10, 19], 'Sarankhola']]}), ('Daibagnyahati Bagerhat', {'entities': [[[0, 12], 'Daibagnyahati'], [[14, 23], 'Morrelganj']]}), ('Ramchandrapur Bagerhat', {'entities': [[[0, 12], 'Ramchandrapur'], [[14, 23], 'Morrelganj']]}), ('Kodalia Bagerhat', {'entities': [[[0, 6], 'Kodalia'], [[8, 15], 'Mollahat']]})]
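The numbers in this output come from `re.finditer(r'\S+', text)`, which yields one match per run of non-whitespace characters. `start()` is the index of the first character and `end()` is one past the last, so `end() - 1` gives an inclusive end. A small check on the first row:

```python
import re

text = " ".join(["Dhansagar", "Sarankhola"])  # "Dhansagar Sarankhola"

# One match per run of non-whitespace characters.
matches = list(re.finditer(r"\S+", text))

# end() is exclusive, so subtract 1 for an inclusive end index.
spans = [[[m.start(), m.end() - 1], m.group(0)] for m in matches]
print(spans)  # [[[0, 8], 'Dhansagar'], [[10, 19], 'Sarankhola']]
```

Because `\S+` splits on whitespace, a union or subdistrict name containing a space would produce two separate entries rather than one.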

Comments:

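Getting the same error here: l1.append(' '.join((row['unions'], row['district']))) TypeError: tuple indices must be integers or slices, not str

Actually, I'm still getting an error: l1.append(' '.join((row['unions'], row['district']))) TypeError: sequence item 0: expected str instance, float found

Also, could you look at my expected output? I want indices for both words.

It works with another CSV file. The problem is with the unions column.

Thanks, but my expected output is different.

Answer 2: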

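You're using range incorrectly. You're telling it to iterate over the numbers from 0 to len(df['unions']), but in steps of len(df['district']), which is the same length. So the loop only ever visits the first row. You can see this by printing the row numbers: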

for i in range(0,len(df['unions']),len(df['district'])):
    print(i)
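Since both columns come from the same DataFrame, `len(df['unions'])` and `len(df['district'])` are equal, so the step is as large as the whole range and only index 0 is produced. A small sketch using two of the sample rows, showing the original range next to one that visits every row:

```python
import pandas as pd

# Two of the sample rows are enough to show the effect.
df = pd.DataFrame({
    "unions": ["Dhansagar", "Kodalia"],
    "district": ["Bagerhat", "Bagerhat"],
})

# Step equal to the number of rows: only the start value 0 is produced.
print(list(range(0, len(df["unions"]), len(df["district"]))))  # [0]

# Default step of 1: one index per row.
print(list(range(len(df))))  # [0, 1]

l1 = [" ".join((df["unions"][i], df["district"][i])) for i in range(len(df))]
print(l1)  # ['Dhansagar Bagerhat', 'Kodalia Bagerhat']
```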

Also, you shouldn't iterate over the rows like that; use df.iterrows() instead:

import re
import pandas as pd

df=pd.read_csv("file")
l1=[]
l2=[]

for i, row in df.iterrows():
    l1.append(' '.join((row['unions'], row['district'])))
    l2.append({"entities": [[(ele.start(), ele.end() - 1) for ele in re.finditer(r'\S+', ' '.join([row['unions'], row['subdistrict']]))]]})
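The `tuple indices must be integers or slices, not str` error reported in the comments is what you get if the loop is written as `for row in df.iterrows():` without the index variable. `iterrows()` yields `(index, Series)` pairs, so `row` is then a tuple and `row['unions']` fails. A small sketch with two of the sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    "unions": ["Dhansagar", "Kodalia"],
    "district": ["Bagerhat", "Bagerhat"],
})

# iterrows() yields (index, Series) pairs.
first = next(df.iterrows())
print(type(first).__name__)  # tuple

try:
    first["unions"]  # indexes the pair itself, not the row
except TypeError as exc:
    print(exc)  # tuple indices must be integers or slices, not str

# Unpacking the pair gives the row as a Series, which accepts column labels.
l1 = [" ".join((row["unions"], row["district"])) for _, row in df.iterrows()]
print(l1)  # ['Dhansagar Bagerhat', 'Kodalia Bagerhat']
```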

Comments:

@U12-Forward Can you point out where? I'll fix it.

The curly braces.

But I'm getting this error: l1.append(' '.join((row['unions'], row['district']))) TypeError: tuple indices must be integers or slices, not str

@bellatrix Sorry, I forgot the i. Try again with the edited version.

But it's a float: l1.append(' '.join((row['unions'], row['district']))) TypeError: sequence item 0: expected str instance, float found
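The `expected str instance, float found` error usually means the `unions` column has empty cells. `pd.read_csv` reads a missing value as `NaN`, which is a float, and `str.join` rejects it. That would also fit the comment that the same code works with another CSV file. Here is a minimal sketch, assuming blanks are the cause, of two ways to handle them before joining: drop those rows, or fill them with an empty string. Which one is right depends on the data. The CSV text below is hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV with an empty "unions" cell in the second data row.
csv_text = """unions,subdistrict,district
Dhansagar,Sarankhola,Bagerhat
,Morrelganj,Bagerhat
Kodalia,Mollahat,Bagerhat
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df["unions"].isna().sum())  # 1

# Option 1: drop rows whose "unions" value is missing.
dropped = df.dropna(subset=["unions"])

# Option 2: replace missing values with "" and force every value to str.
filled = df.assign(unions=df["unions"].fillna("").astype(str))
print(filled["unions"].tolist())  # ['Dhansagar', '', 'Kodalia']

l1 = [" ".join((row["unions"], row["district"])) for _, row in dropped.iterrows()]
print(l1)  # ['Dhansagar Bagerhat', 'Kodalia Bagerhat']
```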
