Python - Check if letter appears in consecutive words


【Title】: Python - Check if letter appears in consecutive words 【Posted】: 2015-02-04 09:37:09 【Question】:

I have a task where I need to read input and check whether that input appears in certain words. For example:

Who are your friends? Fred Bill Sue Simone
What is the message? Should you make tea?
Sue could have written this.

It prints "Sue could have written this." because the letters "S", "U" and "E" appear in consecutive words of the message, one letter per word. Another example:

Who are your friends? James Nicky Jake
What is the message? join and make enough cash today!
James could have written this.
Jake could have written this.

Both names are printed because the letters of each name appear in consecutive words of the message. I have the following code:

friends = input("Who are your friends? ").split()
message = input("What is the message? ").split()

name = []
other = []

for friend in friends:
  for f in friend.lower():
    for word in message:
      print("checking if", f, "is in", word.lower())
      if f in word.lower():
        print("Adding", f, " to name list")
        name.append(f)
        break
      else:
        other.append(f)
        continue

joinedResult = ''.join(name)

for person in friends:
  if person.lower() in joinedResult:
    print(person, "could have written this.")

It works for the first example, but for the second one it prints all three names:

James could have written this.
Nicky could have written this.
Jake could have written this.

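I understand that the code does not check whether the letters of a name appear in consecutive words; it only checks whether each letter of the name is contained in any word. How can I fix this?

【Comments】:

What would the output of the second example be if there were one more word before join, for example 'zzz'?

【Answer 1】: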

You can do this with zip() and all():

friends = input("Who are your friends? ").split()
message = input("What is the message? ").lower().split()

for friend in friends:
  if len(friend) <= len(message):
    if all(x in y for x, y in zip(friend.lower(), message)):
        print(friend, "could have written this.")

Demo:

>>> 
Who are your friends? Fred Bill Sue Simone
What is the message? Should you make tea?
Sue could have written this.
>>> 
Who are your friends? James Nicky Jake
What is the message? join and make enough cash today!
James could have written this.
Jake could have written this.
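
The length check in the code above matters because zip() stops at the shorter of its two inputs, so without it a name longer than the message would only have its first few letters tested. Below is a minimal sketch of that behaviour. The helper name `could_have_written` and the three-word message are made up for illustration, and itertools.zip_longest is used here only as one possible alternative to the explicit length check.

```python
from itertools import zip_longest


def could_have_written(friend, words):
    """Return True if the nth letter of friend is in the nth word of words."""
    # zip_longest pads the shorter input with "", so a name longer than the
    # message ends up testing `letter in ""`, which is always False.
    # Words beyond the length of the name are ignored.
    pairs = zip_longest(friend.lower(), words[:len(friend)], fillvalue="")
    return all(letter in word for letter, word in pairs)


words = "join and make".split()

# Plain zip() only compares j/join, a/and, m/make and never looks at "e", "s".
print(all(letter in word for letter, word in zip("james", words)))  # True
print(could_have_written("James", words))  # False
print(could_have_written("Jam", words))    # True
```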

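【Comments】:

If friends = "Anastasia" and message = "In another story, ants activated super ions!", the output is: Anastasia could have written this. The program should not output anything, because the letters of "Anastasia" do not appear in all the words.

@user3036519 I see what you mean. In that case zip() is not enough; go for itertools.zip_longest.

With friends = Bob and message = Bob constructed Balloon Town, your code outputs nothing. It is supposed to output: Bob could have written this

@user3036519 How about now?

Very nice, but in some cases it does not work: say the first word of the message does not match the first letter of any name, for example 'dummyword join and make enough cash today!' in your second example.

【Answer 2】: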

Using a regular expression might be a bit easier:

import re

friends = input("Who are your friends? ").split()
message = input("What is the message? ").lower()

name = []

for friend in friends:
    # One \w*<letter>\w* chunk per letter of the name, separated by whitespace,
    # after an optional leading word.
    regStr = r'\w*\s?' + r'\s'.join(r'\w*' + f + r'\w*' for f in friend.lower())
    if re.match(regStr, message):
        name.append(friend)

for friend in name:
    print(friend + " could have written this.")

For the friend Sue, the regex pattern looks like: \w*\s?(s)\w*\s\w*(u)\w*\s\w*(e)\w*

Test cases:

Shoulde i? [no match] (the letters of Sue are found, but not in consecutive words => [S]ho[u]ld[e] i?)

Should I make tea? [no match]

Should u make tea? [Sue]
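
The three test cases can be checked with a small script. This is only a sketch, not the answer's exact code: the helper names `name_pattern` and `matches` are made up here, `re.escape` is added as a precaution, and the optional leading `\w*\s?` is left out for simplicity.

```python
import re


def name_pattern(name):
    # One \w*<letter>\w* chunk per letter, chunks separated by one whitespace.
    chunks = (r"\w*" + re.escape(ch) + r"\w*" for ch in name.lower())
    return re.compile(r"\s".join(chunks))


def matches(name, message):
    # re.match anchors at the start, so letter 1 must be in word 1.
    return name_pattern(name).match(message.lower()) is not None


print(matches("Sue", "Shoulde i?"))          # False
print(matches("Sue", "Should I make tea?"))  # False
print(matches("Sue", "Should u make tea?"))  # True
```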

【Comments】:

【Answer 3】:
def find_char(string, char):
    start_index = 0
    while True:
        yield string.lower().find(char, start_index)  # checks for the char in the string
        start_index += 1  # increments the index to find further in the word, 
        # eg: 
        # Bob constructed[index 0]
        # ob constructed[index 1]
        # b constructed[index 2]


def find_it(friends, message):
    friends = friends.split()
    for friend in friends:
        sequence_check = []
        for char in friend.lower():
            gen = find_char(message, char)  # creates the find_char generator
            for _ in message:  # limits the search to the length of the word
                char_index = next(gen) # try to find the index
                if char_index not in sequence_check: # if not in the sequence
                    sequence_check.append(char_index) # add it to it
                    break
        if -1 in sequence_check: # this check if every character of the name is in the word
            continue
        if sorted(sequence_check) == sequence_check: # this part check if it's in a sequence.
            print (friend + ' could have written ' + message)


find_it('James Nicky Jake', "join and make enough cash today!")
find_it('Fred Bill Sue Simone', "Should you make tea?")
find_it("Bob", "Bob constructed Balloon Town")

Output:

James could have written join and make enough cash today!
Jake could have written join and make enough cash today!
Sue could have written Should you make tea?
Bob could have written Bob constructed Balloon Town

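Completely redone, and much cleaner now.

Most of the work is done in the find_char function, a generator that shrinks its search space on every iteration, so it finds the positions of 'Bob' as [0, 1, 2], in sequence, rather than [0, 1, 0].

Feel free to ask if you have any questions.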
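
To see what the generator yields, here is a short trace. The generator is copied from the code above; the use of itertools.islice to take the first few values is an addition for illustration. Note that the values are character positions in the whole message, not word numbers.

```python
from itertools import islice


def find_char(string, char):
    # Same generator as in the answer: every step searches from one
    # position further to the right.
    start_index = 0
    while True:
        yield string.lower().find(char, start_index)
        start_index += 1


message = "Bob constructed Balloon Town"

# Searching for "b" from positions 0, 1, 2 and 3 of the lowercased message.
print(list(islice(find_char(message, "b"), 4)))  # [0, 2, 2, 16]
```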

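【Comments】:

If friends = "Fred Bill Sue Simone" and message = "Should you make tea?", the output is: Bill could have written this Sue could have written this. Desired output: Sue could have written this

I forgot to check whether the letters of the name are in the input. Hold on, I'll fix it.

if sorted(sequence_check) == sequence_check "".join(letters) == friend.lower(): gives a syntax error. An "and" is missing near "".join(letters). ;)

In Python 3.x.x use next(gen) instead of gen.next()

【Answer 4】: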

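Note that, as I understand it, you mean that the nth letter of their name must appear in the nth word of the message. Maybe I am wrong; feel free to clarify.

You need to pair each letter of their name with a word in the message and then check for containment. You can use zip to do this: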

friends = 'James Nicky Jake'.split()
message = 'join and make enough cash today!'.split()

names = []
others = []

for friend in friends:
    match = True
    length = len(friend)

    for letter, word in zip(friend.lower(), message):
        if not letter in word.lower():
            match = False
            break

    if match:
        names.append(friend)
    else:
        others.append(friend)

for person in names:
        print(person, "could have written this.")

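【Comments】:

If friends = "Anastasia" and message = "In another story, ants activated super ions!", the output is: Anastasia could have written this. The program should not output anything, because the letters of "Anastasia" do not appear in all the words.

【Answer 5】: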
friends = ["James", "Nicky", "Jake"]
words = ["James could have written this", "Jake could have written this"]
for friend in friends:
    for word in words:
        for name in word.split():
            # Compare whole words, ignoring case
            if friend.lower() == name.lower():
                print(friend, "yes")
            else:
                print(friend, "no")

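You can use this simple code instead of comparing letter by letter, which is also error-prone, because the letters can be anywhere in the string and are not necessarily consecutive.

【Comments】:

The input is meant to be: join and make enough cash today!. If I replace what you stored in words with that input, it does not work.

@user3036519 Convert your input to the format I used. Then use the code.

I think you misunderstood what the OP is looking for. As I understand it, X could be the author if the nth letter of X's name appears in the nth word of the message (for all n).

【Answer 6】: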

Code

def whosdoneit(names,message):
    good_names = []
    l_m = len(message)
    for name in names:
        if len(name) > l_m: continue
        if all(c.lower() in word.lower() for c, word in zip(name, message)):
            good_names.append(name)
    return good_names

print(whosdoneit('Fred Bill Sue Simone'.split(),
                 'Should you make tea?'.split()))

print(whosdoneit('James Nicky Jake'.split(),
                 'join and make enough cash today!'.split()))

Output

['Sue']
['James', 'Jake']

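Commentary

The function returns a list of good names, the people who could have written a para-acrostic containing their name, so

we start by initializing the list to an empty list.

Next we observe that no name longer than the number of words in the message can satisfy the requirement, so, for later use,

we compute and save the length of the message (in words).

Now,

we loop over the list names to verify whether each name obeys the rule:

if the name is too long, it is not processed any further;

using zip, we build pairs of a character c from name and a word from message, and with a generator expression we build a sequence of booleans;

if all the booleans are true (all and any are useful built-ins!), name is appended to the good_names list.

Finally, we return the list of good names to the caller.

I have also included a couple of function calls that mimic the OP's examples.

The possibly long-awaited one-liner

s is for suspects...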

def s(n,m):return [g for l in [len(m)] for g in n if len(g)<=l and all([c.lower() in w.lower() for c,w in zip(g,m)])]
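
A quick way to try the one-liner on the two examples from the question. The function is the same as above, only wrapped over two lines and using a generator expression inside all(); the calls are added here for illustration.

```python
def s(n, m):
    # n: list of names, m: list of words in the message
    return [g for l in [len(m)] for g in n
            if len(g) <= l and all(c.lower() in w.lower() for c, w in zip(g, m))]


print(s('Fred Bill Sue Simone'.split(), 'Should you make tea?'.split()))
# ['Sue']
print(s('James Nicky Jake'.split(), 'join and make enough cash today!'.split()))
# ['James', 'Jake']
```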

【Comments】:
