Counting occurrences of multiple substrings in a string at once
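For school, I'm writing a simple Python script that rates the strength of a password with a points system: it adds or removes points depending on whether the password contains upper-case letters, lower-case letters, numbers and symbols.
One of the requirements is that it checks for 3 letters or numbers that are consecutive from left to right on a UK QWERTY keyboard and takes away 5 points for each instance. For example, the password 'qwer123' would lose 15 points for 'qwe', 'wer' and '123'. How can this be done? My current code is below.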
def check():
    user_password_score=0
    password_capitals=False
    password_lowers=False
    password_numbers=False
    password_symbols=False
    password_explanation_check=False
    ascii_codes=[]
    password_explanation=[]
    print("The only characters allowed in the passwords are upper and lower case letters, numbers and these symbols; !, $, %, ^, &, *, (, ), _, -, = and +.\n")
    user_password=str(input("Enter the password you would like to get checked: "))
    print("")
    if len(user_password)>24 or len(user_password)<8:
        print("That password is not between 8 and 24 characters and so the Password Checker can't evaluate it.")
        menu()
    for i in user_password:
        ascii_code=ord(i)
        #print(ascii_code)
        ascii_codes.append(ascii_code)
    #print(ascii_codes)
    for i in range(len(ascii_codes)):
        if ascii_codes[i]>64 and ascii_codes[i]<91:  # 'A' (65) to 'Z' (90)
            password_capitals=True
        elif ascii_codes[i]>96 and ascii_codes[i]<123:  # 'a' (97) to 'z' (122)
            password_lowers=True
        elif ascii_codes[i]>47 and ascii_codes[i]<58:  # '0' (48) to '9' (57)
            password_numbers=True
        elif ascii_codes[i] in (33,36,37,94,38,42,40,41,45,95,61,43):
            password_symbols=True
        else:
            print("Your password contains characters that aren't allowed.\n")
            menu()
    if password_capitals==True:
        user_password_score+=5
    if password_lowers==True:
        user_password_score+=5
    if password_numbers==True:
        user_password_score+=5
    if password_symbols==True:
        user_password_score+=5
    if password_capitals==True and password_lowers==True and password_numbers==True and password_symbols==True:
        user_password_score+=10
    if password_numbers==False and password_symbols==False:
        user_password_score-=5
    if password_capitals==False and password_lowers==False and password_symbols==False:
        user_password_score-=5
    if password_capitals==False and password_lowers==False and password_numbers==False:
        user_password_score-=5
    #print(user_password_score)
    if user_password_score>20:
        print("Your password is strong.\n")
    else:
        print("That password is weak.\n")
    #don't forget you still need to add the thing that checks for 'qwe' and other stuff.
    menu()
You can store the forbidden sequences in a set of strings and reduce the score each time the password uses part of one of them.
password = "qwert123"
score = 42 # initial score
sequences = { # all in lowercase because of the `lower()` in the loop
    "qwertyuiopasdfghjklzxcvbnm",
    "azertyuiopqsdfghjklmwxcvbn",
    "abcdefghijklmnopqrstuvwxyz",
    "01234567890"
}
match_length = 3 # length threshold for the sanction
sequences.update({s[::-1] for s in sequences}) # do we allow reverse ?
for c in range(len(password)-match_length+1):
    for seq in sequences:
        if password[c:c+match_length].lower() in seq:
            score-=5
            print(f"'{password[c:c+match_length]}' => -5 !")
            break # Don't flag the same letters more than once
print(score) # 22 (42-4*5)
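The same loop can be wrapped in a function, which makes it easier to reuse and to test. This is only a minimal sketch: the helper name `find_sequences` and its parameters are invented for illustration and are not part of the answer above. It also leaves out the reversed sequences, on the assumption that only left-to-right runs should be penalised.

```python
def find_sequences(password, sequences, match_length=3):
    """Return every window of `match_length` characters found in a sequence.

    `find_sequences` is a hypothetical helper name used for illustration.
    """
    found = []
    lowered = password.lower()
    for start in range(len(lowered) - match_length + 1):
        window = lowered[start:start + match_length]
        # any() stops at the first sequence that contains the window,
        # so each window is reported at most once.
        if any(window in seq for seq in sequences):
            found.append(window)
    return found


sequences = {
    "qwertyuiopasdfghjklzxcvbnm",
    "azertyuiopqsdfghjklmwxcvbn",
    "abcdefghijklmnopqrstuvwxyz",
    "01234567890",
}
hits = find_sequences("qwert123", sequences)
score = 42 - 5 * len(hits)
print(hits, score)  # ['qwe', 'wer', 'ert', '123'] 22
```

Returning the matched windows rather than only a count keeps the penalty calculation separate from the detection, so the caller can also tell the user which parts of the password were flagged.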
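The simplest way is to brute-force through all the possible sequences.
Create 4 strings, "1234567890", "qwertyuiop", "asdfghjkl" and "zxcvbnm", and loop over user_password three characters at a time.
You can initialise this list at the start of the check function: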
sequences = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]
Then add the following inside the for i in range(len(ascii_codes)) loop:
if i < len(ascii_codes) - 2:  # since we will be checking for characters up to i+2 in our loop
    for s in sequences:  # loop through each of the 4 keyboard sequences
        if s.find(user_password[i:i+3].lower()) != -1:
            user_password_score -= 5
            break  # this window is counted; stop checking rows and carry on with the outer loop
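As a variation on the brute-force idea, the triplets from the four rows can be precomputed into a set once, so each three-character window of the password is checked with a single lookup. This is a sketch under assumptions: the names `KEYBOARD_ROWS`, `row_triplets` and `keyboard_penalty` are made up here, and the four rows are taken from the answer above.

```python
KEYBOARD_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]


def row_triplets(rows, n=3):
    """Collect every run of n adjacent keys within each row (hypothetical helper)."""
    return {row[i:i + n] for row in rows for i in range(len(row) - n + 1)}


def keyboard_penalty(password, triplets, n=3, points=5):
    """Return the points to subtract: `points` per window found in `triplets`."""
    lowered = password.lower()
    hits = sum(
        1
        for i in range(len(lowered) - n + 1)
        if lowered[i:i + n] in triplets
    )
    return points * hits


TRIPLETS = row_triplets(KEYBOARD_ROWS)
print(keyboard_penalty("qwer123", TRIPLETS))  # 15
print(keyboard_penalty("Opa4seL", TRIPLETS))  # 0
```

Because the rows are kept as separate strings, a run such as 'opa' that would wrap from the end of one row to the start of the next is not penalised.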
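I would create a list of adjacent-key sequences, as described above. Then I would write a sliding window function to generate all the sequences of length 3 and match each of them against the password: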
from itertools import islice

keyboard_rows = ['1234567890', 'qwertyuiop', 'asdfghjkl', 'zxcvbnm']

def window(seq, n=3):
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# `password` and `user_password_score` are assumed to be defined already
for row in keyboard_rows:
    for seq in window(row, n=3):
        if "".join(seq) in password:
            user_password_score -= 5  # 5 points per sequence found
    # scan other direction <--
    for seq in window(row[::-1], n=3):
        if "".join(seq) in password:
            user_password_score -= 5
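A membership test such as `"qwe" in password` is true only once, however many times the run appears. If every instance should be penalised, the window can slide over the password instead of over the keyboard rows. The sketch below assumes that reading of the requirement; `windows` and `count_row_runs` are illustrative names, not part of the answer.

```python
keyboard_rows = ['1234567890', 'qwertyuiop', 'asdfghjkl', 'zxcvbnm']


def windows(text, n=3):
    """Yield each run of n consecutive characters in text as a string."""
    # zip over n staggered copies of the text; zip stops at the shortest one,
    # so nothing is yielded when the text is shorter than n.
    for chars in zip(*(text[i:] for i in range(n))):
        yield "".join(chars)


def count_row_runs(password, rows=keyboard_rows, n=3):
    """Count every window of the password that is a left-to-right keyboard run."""
    row_runs = {run for row in rows for run in windows(row, n)}
    return sum(1 for run in windows(password.lower(), n) if run in row_runs)


print(count_row_runs("qweqwe"))   # 2
print(count_row_runs("qwer123"))  # 3
```

Multiplying the returned count by 5 gives the penalty described in the question.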
If regular expressions are allowed, you can do this in one line:
import re
user_password_score = 42
pwd = 'qwer123'
user_password_score += (lambda z : -5 * len([match.group(1) for match in re.compile('(?=({0}))'.format('|'.join(["({0})".format(w) for w in [x for y in [[s[i:i+3] for i in range(0,len(s)-2)] for s in ["qwertyuiopasdfghjklzxcvbnm", "azertyuiopqsdfghjklmwxcvbn", "abcdefghijklmnopqrstuvwxyz", "01234567890"]] for x in y]]))).finditer(z) ]))(pwd)
This code is equivalent:
import re
user_password_score = 42
pwd = 'qwer123'
seqs = ["qwertyuiopasdfghjklzxcvbnm", "azertyuiopqsdfghjklmwxcvbn", "abcdefghijklmnopqrstuvwxyz", "01234567890"]
pattern = re.compile('(?=({0}))'.format('|'.join(["({0})".format(w) for w in [x for y in [[s[i:i+3] for i in range(0,len(s)-2)] for s in seqs] for x in y]])))
penalty = -5 * len([match.group(1) for match in pattern.finditer(pwd) ])
user_password_score += penalty
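The following code is also equivalent (and hopefully human-readable as well). It prints each step so that it is easier to see what it is doing.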
import re

def build_pattern(sequences):
    all_triplets = []
    triplets = []
    for seq in sequences:
        for i in range(0, len(seq) - 2):
            triplets.append(seq[i:i+3])
        all_triplets.append(triplets)
        triplets = []
    expanded_triplets = [ x for y in all_triplets for x in y ]
    print("Plain list of triplets: " + str(expanded_triplets))
    string_pattern = '|'.join( [ "({0})".format(x) for x in expanded_triplets ] )
    lookahead_pattern = '(?=({0}))'.format(string_pattern)
    print("Regex expression: " + lookahead_pattern)
    return re.compile(lookahead_pattern)

password = 'qwer123'
user_password_score = 42
print("User password score: " + str(user_password_score))
sequences = ["qwertyuiopasdfghjklzxcvbnm",
             "azertyuiopqsdfghjklmwxcvbn",
             "abcdefghijklmnopqrstuvwxyz",
             "01234567890"]
pattern = build_pattern(sequences)
matches = [ match.group(1) for match in pattern.finditer(password) ]
print("Matches : " + str(matches))
matches_count = len(matches)
penalty = -5 * matches_count
print("Penalty: " + str(penalty))
user_password_score += penalty
print("Final score: " + str(user_password_score))
This is the output:
User password score: 42
Plain list of triplets: ['qwe', 'wer', 'ert', 'rty', 'tyu', 'yui', 'uio', 'iop', 'opa', 'pas', 'asd', 'sdf', 'dfg', 'fgh', 'ghj', 'hjk', 'jkl', 'klz', 'lzx', 'zxc', 'xcv', 'cvb', 'vbn', 'bnm', 'aze', 'zer', 'ert', 'rty', 'tyu', 'yui', 'uio', 'iop', 'opq', 'pqs', 'qsd', 'sdf', 'dfg', 'fgh', 'ghj', 'hjk', 'jkl', 'klm', 'lmw', 'mwx', 'wxc', 'xcv', 'cvb', 'vbn', 'abc', 'bcd', 'cde', 'def', 'efg', 'fgh', 'ghi', 'hij', 'ijk', 'jkl', 'klm', 'lmn', 'mno', 'nop', 'opq', 'pqr', 'qrs', 'rst', 'stu', 'tuv', 'uvw', 'vwx', 'wxy', 'xyz', '012', '123', '234', '345', '456', '567', '678', '789', '890']
Regex expression: (?=((qwe)|(wer)|(ert)|(rty)|(tyu)|(yui)|(uio)|(iop)|(opa)|(pas)|(asd)|(sdf)|(dfg)|(fgh)|(ghj)|(hjk)|(jkl)|(klz)|(lzx)|(zxc)|(xcv)|(cvb)|(vbn)|(bnm)|(aze)|(zer)|(ert)|(rty)|(tyu)|(yui)|(uio)|(iop)|(opq)|(pqs)|(qsd)|(sdf)|(dfg)|(fgh)|(ghj)|(hjk)|(jkl)|(klm)|(lmw)|(mwx)|(wxc)|(xcv)|(cvb)|(vbn)|(abc)|(bcd)|(cde)|(def)|(efg)|(fgh)|(ghi)|(hij)|(ijk)|(jkl)|(klm)|(lmn)|(mno)|(nop)|(opq)|(pqr)|(qrs)|(rst)|(stu)|(tuv)|(uvw)|(vwx)|(wxy)|(xyz)|(012)|(123)|(234)|(345)|(456)|(567)|(678)|(789)|(890)))
Matches : ['qwe', 'wer', '123']
Penalty: -15
Final score: 27
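In the build_pattern function, [ x for y in all_triplets for x in y ] is a trick for expanding a list of lists into a plain list. A regex pattern like (lmw)|(mwx)|(wxc), used with finditer(), says that we want to find all matches of lmw, mwx and wxc. When we wrap this pattern in a lookahead ((?=())), we tell re that it should also include overlapping matches in the result: a lookahead consumes no characters, so the scan moves forward one position at a time.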
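The effect of the lookahead is easiest to see on a small input. The snippet below is a minimal illustration using only the standard `re` and `itertools` modules; the variable names are arbitrary. It also shows `itertools.chain.from_iterable`, a standard-library alternative to the nested comprehension for flattening a list of lists.

```python
import re
from itertools import chain

text = "qwer"

# A plain alternation consumes the characters it matches, so after 'qwe'
# only 'r' is left and 'wer' is never found.
plain = re.findall(r"qwe|wer", text)

# A lookahead is zero-width: the captured group is reported, nothing is
# consumed, and the next attempt starts one character further on.
overlapping = re.findall(r"(?=(qwe|wer))", text)

print(plain)        # ['qwe']
print(overlapping)  # ['qwe', 'wer']

# Flattening a list of lists without a nested comprehension.
nested = [["qwe", "wer"], ["123"]]
flat = list(chain.from_iterable(nested))
print(flat)  # ['qwe', 'wer', '123']
```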