How to slice numbered lists into sublists

Asked: 2014-01-16 13:33:51

Question:

I opened a file and used readlines() and split() with the regular expression '\t' to remove the TABs, which gave the following:

["1", "cats", "--,"]
["2", "chase", "--,"]
["3", "dogs", "--,"]
["1", "the", "--,"]
["2", "car", "--,"]
["3", "is", "--,"]
["4", "gray", "--,"]

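Now I want to extract the words and slice them into sublists such as "cats chase dogs" and "the car is gray", using the integers at index [0] as sentence boundaries. For example, 1-3 gives the sublist "cats chase dogs", then the count restarts and 1-4 gives the sublist "the car is gray", and so on for the rest of the list, so that I end up with sublists like ["the", "car", "is", "gray"]. How can I do this?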

I have tried this, but I get an error:

Cannot concatenate int + str

The for loop treats "i" as a string element rather than an integer:

with open(buffer, 'r') as f:
    words = []
    for line in f:
        items = line.split('\t')[:1]
        for i in items:
            while i>1:
                i = i+1
                print i

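Comments:

What did you try? You say you want to extract and slice "by looping over the integers at index [0]" and so on, but you haven't tried anything yet?

I tried using a while loop to get all the numbers at index position 0, so that it loops 1-3 and then continues counting 1-4 and so on, but I did not get the slices. It should first take 1-3 and make the sublist "cats chase dogs", then continue counting 1-4 and make the sublist "the car is gray", and so on.

Then put that in your question! You are more likely to get an answer if you do.

When "1" is read from a file, it is a string. You need to convert it to an int with int(i).

I tried that, with for i in items: l = int(i) print l, but it returns ValueError: invalid literal for int() with base 10: '' when it is counted as a list.

Answer 1: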

Choosing an appropriate data structure makes the job easier:

container = [["1", "cats", "--,"],
             ["2", "chase", "--,"],
             ["3", "dogs", "--,"],
             ["1", "the", "--,"],
             ["2", "car", "--,"],
             ["3", "is", "--,"],
             ["4", "gray", "--,"]]

Nest the lists inside a container list, then use a dictionary to store the output lists:

from collections import defaultdict

out = defaultdict(list)              # Initialize dictionary for output
key = 0                              # Initialize key  

for idx, word, _ in container:       # Unpack sublists
    if int(idx) == 1:                # Check if we are at start of new sentence
        key += 1                     # Increment key for new sentence
    out[key].append(word)            # Add word to list

This gives:


    {1: ['cats', 'chase', 'dogs'],
     2: ['the', 'car', 'is', 'gray']}
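
To connect this back to reading from a file, here is a minimal sketch of the same "start a new group whenever the number is 1" idea applied directly to tab-separated lines. It is written for Python 3. The function name split_sentences and the in-memory sample are hypothetical, and the file is assumed to have the number in the first column and the word in the second. Skipping blank or incomplete lines is also an assumption: an empty first field is one plausible source of the ValueError mentioned in the comments, but the original file is not shown.

```python
import io


def split_sentences(lines):
    """Group 'number<TAB>word<TAB>tag' lines into lists of words.

    A new sentence starts whenever the number in the first column is 1.
    """
    sentences = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or incomplete lines instead of passing '' to int().
        if len(fields) < 2 or not fields[0].strip():
            continue
        number, word = int(fields[0]), fields[1]
        if number == 1 or not sentences:
            sentences.append([])  # numbering restarted: open a new sentence
        sentences[-1].append(word)
    return sentences


# Hypothetical stand-in for the real file, with a blank line between sentences.
sample = io.StringIO(
    "1\tcats\t--\n2\tchase\t--\n3\tdogs\t--\n"
    "\n"
    "1\tthe\t--\n2\tcar\t--\n3\tis\t--\n4\tgray\t--\n"
)
print(split_sentences(sample))
# [['cats', 'chase', 'dogs'], ['the', 'car', 'is', 'gray']]
```

A list of lists keeps the sentences in order without needing a running key; the defaultdict above is equally valid if numbered keys are useful.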

Answer 2:

Something like:

from itertools import groupby

with open('yourfile') as fin:
    # split lines
    lines = (line.split() for line in fin)
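    # group by consecutive ints: (position - line number) is constant within a run
    # pair is (idx, el); indexing it avoids the Python 2-only tuple-parameter lambda
    grouped = groupby(enumerate(lines), lambda pair: pair[0] - int(pair[1][0]))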
    # build sentences from words in groups
    sentences = [' '.join(el[1][1] for el in g) for k, g in grouped]
    # ['cats chase dogs', 'the car is gray']

Note: this is based on your example data:

example = [
    ["1", "cats", "--,"],
    ["2", "chase", "--,"],
    ["3", "dogs", "--,"],
    ["1", "the", "--,"],
    ["2", "car", "--,"],
    ["3", "is", "--,"],
    ["4", "gray", "--,"]
]
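
To see why the grouping key works, the following sketch runs the same idea on the in-memory example rather than a file. Within one sentence the position and the line number both increase by one per row, so their difference stays constant; when the numbering restarts at 1, the difference jumps. The helper name run_key is hypothetical and only used for illustration.

```python
from itertools import groupby

example = [
    ["1", "cats", "--,"],
    ["2", "chase", "--,"],
    ["3", "dogs", "--,"],
    ["1", "the", "--,"],
    ["2", "car", "--,"],
    ["3", "is", "--,"],
    ["4", "gray", "--,"],
]


def run_key(pair):
    """Return position minus line number; constant within one numbered run."""
    idx, el = pair
    return idx - int(el[0])


# One key per row: rows of the same sentence share a key.
keys = [run_key(pair) for pair in enumerate(example)]
print(keys)  # [-1, -1, -1, 2, 2, 2, 2]

# Join the word (second field) of every row in each group.
sentences = [
    " ".join(el[1] for _, el in group)
    for _, group in groupby(enumerate(example), key=run_key)
]
print(sentences)  # ['cats chase dogs', 'the car is gray']
```

This relies on the numbers being consecutive inside each sentence: a gap such as 1, 2, 4 changes the difference and would split that sentence into two groups.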
