Traversing a list of lists by index within a loop, to reformat strings
【Posted】: 2015-05-04 22:27:06
【Question】: I have a list of lists that looks like this, pulled from a poorly formatted csv file:
DF = [['Customer Number: 001 '],
['Notes: Bought a ton of stuff and was easy to deal with'],
['Customer Number: 666 '],
['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
['Customer Number: 103 '],
['Notes: bought a ton of stuff got a free keychain'],
['Notes: gave us a referral to his uncles cousins hairdresser'],
['Notes: name address birthday social security number on file'],
['Customer Number: 007 '],
['Notes: looked a lot like James Bond'],
['Notes: came in with a martini']]
I would like to end up with a new structure like this:
['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser',
'Customer Number: 103 Notes: name address birthday social security number on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
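After that I can split, strip, and so on.
So, I used the facts that:
- the customer number always begins with Customer Number
- the Notes are always longer
- the number of Notes never exceeds 5
to write what is clearly a ridiculous solution, even if it works.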
DF = [item for sublist in DF for item in sublist]
DF = DF + ['stophere']
DF2 = []
for record in DF:
    if (record[0:17]=="Customer Number: ") & (record !="stophere"):
        DF2.append(record + DF[DF.index(record)+1])
        if len(DF[DF.index(record)+2]) >21:
            DF2.append(record + DF[DF.index(record)+2])
            if len(DF[DF.index(record)+3]) >21:
                DF2.append(record + DF[DF.index(record)+3])
                if len(DF[DF.index(record)+4]) >21:
                    DF2.append(record + DF[DF.index(record)+4])
                    if len(DF[DF.index(record)+5]) >21:
                        DF2.append(record + DF[DF.index(record)+5])
Would anyone mind recommending a more stable and intelligent solution for this kind of problem?
【Answer 1】: Just keep track of when we find a new customer:
from pprint import pprint as pp
out = []
for sub in DF:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        out.append(cust + sub[0])
pp(out)
Output:
['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
'hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
'hairdresser',
'Customer Number: 103 Notes: name address birthday social security number '
'on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
If a customer can show up again later and you want their notes grouped together, use a dict:
from collections import defaultdict
d = defaultdict(list)
for sub in DF:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        d[cust].append(cust + sub[0])
print(d)
Output:
pp(d)
{'Customer Number: 001 ': ['Customer Number: 001 Notes: Bought a ton of '
'stuff and was easy to deal with'],
'Customer Number: 007 ': ['Customer Number: 007 Notes: looked a lot like '
'James Bond',
'Customer Number: 007 Notes: came in with a '
'martini'],
'Customer Number: 103 ': ['Customer Number: 103 Notes: bought a ton of '
'stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral '
'to his uncles cousins hairdresser',
'Customer Number: 103 Notes: name address '
'birthday social security number on file'],
'Customer Number: 666 ': ['Customer Number: 666 Notes: acted and looked '
'like Chris Farley on that hidden decaf skit '
'from SNL']}
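Based on your comment and the error, it seems you have lines that appear before the first actual customer, so we can attach those lines to the first customer in the list: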
# added ["foo"] before we see any customer
DF = [["foo"],['Customer Number: 001 '],
['Notes: Bought a ton of stuff and was easy to deal with'],
['Customer Number: 666 '],
['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
['Customer Number: 103 '],
['Notes: bought a ton of stuff got a free keychain'],
['Notes: gave us a referral to his uncles cousins hairdresser'],
['Notes: name address birthday social security number on file'],
['Customer Number: 007 '],
['Notes: looked a lot like James Bond'],
['Notes: came in with a martini']]
from pprint import pprint as pp
from itertools import takewhile, islice
# find lines up to first customer
start = list(takewhile(lambda x: "Customer Number:" not in x[0], DF))
out = []
ln = len(start)
# if we had data before we actually found a customer this will be True
if start:
    # so set cust to first customer in list and start adding to out
    cust = DF[ln][0]
    for sub in start:
        out.append(cust + sub[0])
# ln will either be 0 if start is empty else we start at first customer
for sub in islice(DF, ln, None):
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        out.append(cust + sub[0])
Which outputs:
['Customer Number: 001 foo',
'Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
'hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
'hairdresser',
'Customer Number: 103 Notes: name address birthday social security number '
'on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
I am assuming you would consider lines that come before any customer to belong to the first customer.
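If that assumption does not hold, one alternative is to keep the stray lines separate instead of attaching them to a customer. The following is only a minimal sketch of that idea: the function name `combine_notes`, the `orphans` list, and the shortened sample rows are hypothetical and introduced here for illustration.

```python
def combine_notes(rows, prefix="Customer Number"):
    """Pair each note with the most recent customer line.

    Lines seen before any customer are returned separately instead of
    being attached to a customer (an assumption; adjust as needed).
    """
    out, orphans = [], []
    cust = None  # no customer seen yet
    for row in rows:
        text = row[0]
        if text.startswith(prefix):
            cust = text
        elif cust is None:
            orphans.append(text)
        else:
            out.append(cust + text)
    return out, orphans


# shortened, made-up sample in the same shape as DF
rows = [["foo"],
        ["Customer Number: 001 "],
        ["Notes: first note"],
        ["Customer Number: 007 "],
        ["Notes: second note"],
        ["Notes: third note"]]
combined, orphans = combine_notes(rows)
print(combined)
print(orphans)
```

Returning the orphans rather than silently dropping them leaves the decision about what to do with them to the caller.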
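【Comments】:
@MattO'Brien, then your list does not actually have Customer Number:... as its first element, which means a different approach is needed. What should happen if text appears before we have seen any customer? i.e. [["foo"],["Customer Number: 100"]] is the start of your list
Ah... sorry. I made a small mistake. I am going to delete my comment. Thanks!
@MattO'Brien, no worries, I will leave the alternative code in as it may be useful to someone.
【Answer 2】: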
Your basic goal is to group the notes and associate them with the customer. And since the list is already sorted, you can simply use itertools.groupby, like this
from itertools import groupby, chain
def build_notes(it):
    customer, func = "", lambda x: x.startswith('Customer')
    # flatten the argument itself rather than the global DF
    for item, grp in groupby(chain.from_iterable(it), key=func):
        if item:
            customer = next(grp)
        else:
            for note in grp:
                yield customer + note
            # In Python 3.x, you can simply do
            # yield from (customer + note for note in grp)
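Here, we use chain.from_iterable to flatten the list of lists into a sequence of strings. We then group the lines that start with Customer and the lines that do not. If a line starts with Customer, item will be True, otherwise False. When item is True we pick up the customer information; when item is False we iterate over the grouped notes and yield one string at a time by concatenating the customer information with each note.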
So, when you run the code,
print(list(build_notes(DF)))
you get
['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser',
'Customer Number: 103 Notes: name address birthday social security number on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
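To see what groupby hands back at each step, here is a small sketch on a shortened sample (the `sample` data and the names `flat` and `runs` are made up for illustration). It collects the key and the members of each consecutive run:

```python
from itertools import chain, groupby

# shortened, made-up sample in the same shape as DF
sample = [['Customer Number: 103 '],
          ['Notes: first'],
          ['Notes: second'],
          ['Customer Number: 007 '],
          ['Notes: third']]

# flatten the list of lists into a stream of strings
flat = chain.from_iterable(sample)

# each run is (key, members): key is True for customer lines, False for notes
runs = [(is_customer, list(group))
        for is_customer, group in groupby(flat, key=lambda s: s.startswith('Customer'))]

for is_customer, members in runs:
    print(is_customer, members)
```

One caveat worth checking against your data: groupby merges consecutive lines with the same key into one run, so if two customer lines appear back to back (a customer with no notes), next(grp) picks up only the first of them.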
【Answer 3】:
DF = [['Customer Number: 001 '],
['Notes: Bought a ton of stuff and was easy to deal with'],
['Customer Number: 666 '],
['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
['Customer Number: 103 '],
['Notes: bought a ton of stuff got a free keychain'],
['Notes: gave us a referral to his uncles cousins hairdresser'],
['Notes: name address birthday social security number on file'],
['Customer Number: 007 '],
['Notes: looked a lot like James Bond'],
['Notes: came in with a martini']]
custnumstr = None
out = []
for df in DF:
    if df[0].startswith('Customer Number'):
        custnumstr = df[0]
    else:
        out.append(custnumstr + df[0])
for e in out:
    print(e)
【Answer 4】: You can also use an OrderedDict, where the keys are the customers and the values are lists of notes:
from collections import OrderedDict
DF_dict = OrderedDict()
for subl in DF:
    if 'Customer Number' in subl[0]:
        DF_dict[subl[0]] = []
        continue
    last_key = list(DF_dict.keys())[-1]
    DF_dict[last_key].append(subl[0])
for customer, notes in DF_dict.items():
    for a_note in notes:
        print(customer,a_note)
Result:
Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with
Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL
Customer Number: 103 Notes: bought a ton of stuff got a free keychain
Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser
Customer Number: 103 Notes: name address birthday social security number on file
Customer Number: 007 Notes: looked a lot like James Bond
Customer Number: 007 Notes: came in with a martini
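Putting the values into a dictionary like this can be useful if you want to count the notes for a given customer, or select only the notes for a given customer.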
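For instance, counting notes per customer or pulling out one customer's notes might look like the sketch below. It rebuilds a small dictionary from invented sample rows so that it runs on its own; the names `rows`, `notes_by_customer`, and `counts` are illustrative only.

```python
from collections import OrderedDict

# shortened, made-up sample in the same shape as DF
rows = [['Customer Number: 103 '],
        ['Notes: bought a ton of stuff got a free keychain'],
        ['Notes: name address birthday social security number on file'],
        ['Customer Number: 007 '],
        ['Notes: looked a lot like James Bond']]

notes_by_customer = OrderedDict()
current = None
for row in rows:
    text = row[0]
    if 'Customer Number' in text:
        current = text
        notes_by_customer[current] = []
    else:
        notes_by_customer[current].append(text)

# number of notes per customer
counts = {customer: len(notes) for customer, notes in notes_by_customer.items()}
print(counts)

# only the notes for one customer (the key keeps its trailing space)
print(notes_by_customer['Customer Number: 007 '])
```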
An alternative that avoids calling list(DF_dict.keys())[-1] on every iteration:
last_key = ''
for subl in DF:
    if 'Customer Number' in subl[0]:
        DF_dict[subl[0]] = []
        last_key = subl[0]
        continue
    DF_dict[last_key].append(subl[0])
And a newer, shorter version using defaultdict:
from collections import defaultdict
DF_dict = defaultdict(list)
for subl in DF:
    if 'Customer Number' in subl[0]:
        customer = subl[0]
        continue
    DF_dict[customer].append(subl[0])
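Another variant, sketched here with a plain dict and dict.setdefault, drops the continue altogether. This is just one way to write it, not necessarily better; the sample rows and the names `rows` and `notes` are invented for illustration.

```python
# shortened, made-up sample in the same shape as DF
rows = [['Customer Number: 001 '],
        ['Notes: easy to deal with'],
        ['Customer Number: 007 '],
        ['Notes: looked a lot like James Bond'],
        ['Notes: came in with a martini']]

notes = {}
customer = None
for row in rows:
    text = row[0]
    if 'Customer Number' in text:
        customer = text
        # register the customer even if no notes follow
        notes.setdefault(customer, [])
    else:
        notes.setdefault(customer, []).append(text)

print(notes)
```

Unlike the defaultdict version, a customer with no notes still gets a key here, with an empty list as its value.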
【Comments】:
You could also do k = subl[0]; DF_dict.setdefault(k, []) and forget the continue
【Answer 5】:
As long as the format is the same as in your example, this should work.
final_list = []
for outer_list in DF:
    for s in outer_list:
        if s.startswith("Customer"):
            cust = s
        elif s.startswith("Notes"):
            final_list.append(cust + s)
for f in final_list:
    print(f)
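【Answer 6】: As long as you can count on the first element being a customer, you can do this.
Just loop over each item. If the item is a customer, set the current customer to that string. Otherwise it is a note, so you append the customer plus the note to the results list.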
customer = ""
results = []
for record in DF:
    data = record[0]
    if "Customer" in data:
        customer = data
    elif "Notes" in data:
        result = customer + data
        results.append(result)
print(results)