Counting the number of matches from two lists, given a condition to split the original list
Posted: 2017-08-12 15:36:25

Question: I have a list of floats that encodes some hidden "level" information in the magnitude of each float, and I can split the floats into "levels" like this:
import math
import numpy as np
all_scores = [1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23,
6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23,
1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24, 9603539.08653573,
17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801,
31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014,
4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0]
easy, med, hard = [], [], []
for i in all_scores:
    if i > math.exp(50):
        easy.append(i)
    elif i > math.exp(10):
        med.append(i)
    else:
        hard.append(i)
print([easy, med, hard])
[out]:
[[1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23, 6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23, 1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24], [9603539.08653573, 17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801, 31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014], [4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0]]
I also have another list that corresponds position by position to the all_scores list:
input_scores = [0.0, 2.7997409854370188e+23, 0.0, 6.7401171871631936e+22, 0.0, 0.0, 8.6584582327404103e+23, 0.0, 2.3071939602212471e+24, 0.0, 0.0, 17489013.841076534, 11806185.6660164, 0.0, 8546268.728385007, 0.0, 31828243.073495708, 51740168.152000979, 0.0, 22334836.315934014, 4354.0, 7474.0, 4354.0, 4030.0, 0.0, 8635.0, 0.0, 0.0, 0.0, 8479.0]
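I need to check how many of the easy, medium and hard scores are matched, and I can get a boolean for whether each position in the flattened all_scores list has a match like this: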
matches = [i == j for i, j in zip(input_scores, all_scores)]
print(matches)
[out]:
[False, True, False, True, False, False, True, False, True, False, False, True, True, False, True, False, True, True, False, True, True, True, True, True, False, True, False, False, False, True]
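Is there a way to find out how many of the matches are easy/medium/hard, and the sum of the matched scores for each level?
I have tried this, and it does work: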
matches = [int(i == j) for i, j in zip(input_scores, all_scores)]
print(sum(matches[:len(easy)]), len(easy), sum(np.array(easy) * matches[:len(easy)]))
print(sum(matches[len(easy):len(easy)+len(med)]), len(med), sum(np.array(med) * matches[len(easy):len(easy)+len(med)]))
print(sum(matches[len(easy)+len(med):]), len(hard), sum(np.array(hard) * matches[len(easy)+len(med):]))
[out]:
4 10 3.52041505391e+24
6 10 143744715.777
6 10 37326.0
But there must be a less verbose way to achieve the same output.
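For comparison, here is a minimal pure-Python sketch that does the split and the positional match in a single pass over zip(all_scores, input_scores). The function name level_stats and its hi/lo parameters are hypothetical; the thresholds are the question's e**50 and e**10. For each level it returns the same three numbers the slicing code above prints: matches, total entries, and the sum of the matched scores.

```python
import math


def level_stats(all_scores, input_scores, hi=math.exp(50), lo=math.exp(10)):
    """Hypothetical helper: {level: (matches, total, matched_sum)}, comparing by position."""
    stats = {"easy": [0, 0, 0.0], "med": [0, 0, 0.0], "hard": [0, 0, 0.0]}
    for score, inp in zip(all_scores, input_scores):
        # Same thresholds as the question's if/elif/else chain.
        level = "easy" if score > hi else "med" if score > lo else "hard"
        entry = stats[level]
        entry[1] += 1          # every score counts toward its level's total
        if score == inp:       # positional match between the two lists
            entry[0] += 1
            entry[2] += score
    return {name: tuple(values) for name, values in stats.items()}


print(level_stats([1e30, 5e6, 100.0, 200.0], [1e30, 0.0, 100.0, 0.0]))
# {'easy': (1, 1, 1e+30), 'med': (0, 1, 0.0), 'hard': (1, 2, 100.0)}
```

Because the level is decided per element, this does not rely on all_scores being sorted into contiguous easy/med/hard blocks, which the slicing version does.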
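Answer 1: This looks like a job for... Counter!
If you haven't come across it yet, Counter is like a dict, but instead of replacing old values with new ones in methods like .update(), it adds the new values to the old ones. So: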
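A small side-by-side check of that difference, using only the standard library: the same update call replaces the value in a plain dict but adds to it in a Counter, and a Counter also reports 0 for keys it has never seen.

```python
from collections import Counter

plain = {"a": 2}
plain.update({"a": 3})       # dict.update replaces the old value
counter = Counter({"a": 2})
counter.update({"a": 3})     # Counter.update adds to the old count

print(plain["a"], counter["a"])  # 3 5
print(counter["b"])              # 0: missing keys count as zero
```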
>>> from collections import Counter
>>> counter = Counter({'a': 2})
>>> counter.update({'a': 3})
>>> counter['a']
5
So you can get the results above with the following code:
import math
from collections import Counter

matches, counts, scores = [
    Counter({'easy': 0, 'med': 0, 'hard': 0}) for _ in range(3)
]
for score, inp in zip(all_scores, input_scores):
    category = (
        'easy' if score > math.exp(50) else
        'med' if score > math.exp(10) else
        'hard'
    )
    matches.update({category: score == inp})   # True adds 1, False adds 0
    counts.update({category: 1})
    scores.update({category: score if score == inp else 0})
for cat in ('easy', 'med', 'hard'):
    print(matches[cat], counts[cat], scores[cat])
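Answer 2: Here is a numpy solution that uses digitize to create the categories and bincount to count and sum the matches. As a free bonus, the same statistics are also produced for the leftovers.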
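To see what right=True does at the boundaries, here is a small check on made-up values (the test values are illustrative, not from the question). With right=True a score exactly equal to a boundary falls into the lower group, which agrees with the question's strict `>` comparisons; the default right=False puts it into the upper group.

```python
import numpy as np

bins = np.exp((10, 50))  # [e**10, e**50]
# Reuse the bin edges themselves so the boundary cases compare exactly equal.
values = np.array([5.0, bins[0], 1e6, bins[1], 1e25])

# right=True: index i satisfies bins[i-1] < x <= bins[i]
print(np.digitize(values, bins, right=True))  # [0 0 1 1 2]
# right=False (default): index i satisfies bins[i-1] <= x < bins[i]
print(np.digitize(values, bins))              # [0 1 1 2 2]
```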
categories = 'hard', 'med', 'easy'
# get group membership by splitting at e^10 and e^50
# the 'right' keyword tells digitize to include right boundaries
cat_map = np.digitize(all_scores, np.exp((10, 50)), right=True)
# cat_map has a zero in all the 'hard' places of all_scores
# a one in the 'med' places and a two in the 'easy' places
# add a fourth group to mark all non-matches
# we have to force at least one np.array for element-by-element
# comparison to work
cat_map[np.asanyarray(all_scores) != input_scores] = 3
# count
numbers = np.bincount(cat_map)
# count again, this time using all_scores as weights
sums = np.bincount(cat_map, all_scores)
# print
for c, n, s in zip(categories + ('unmatched',), numbers, sums):
    print('{:12} {:2d} {:6.4g}'.format(c, n, s))
# output:
#
# hard 6 3.733e+04
# med 6 1.437e+08
# easy 4 3.52e+24
# unmatched 14 5.159e+24
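The extra group 3 is what keeps non-matching positions out of the per-level counts. A toy illustration on made-up data: before the relabelling, bincount simply counts every entry per level; after sending mismatches to bin 3, bins 0 to 2 hold only the matches, and the weighted bincount gives the matched sums.

```python
import numpy as np

all_scores = np.array([1e25, 1e24, 5e6, 7e6, 100.0, 200.0])
input_scores = np.array([1e25, 0.0, 5e6, 0.0, 0.0, 200.0])
cat_map = np.digitize(all_scores, np.exp((10, 50)), right=True)  # 0=hard, 1=med, 2=easy

# Without the extra group, every entry is counted in its level.
before = np.bincount(cat_map)
print(before)  # [2 2 2]

# Move non-matching positions to group 3 so only matches remain in 0..2.
cat_map[all_scores != input_scores] = 3
counts = np.bincount(cat_map, minlength=4)
sums = np.bincount(cat_map, weights=all_scores)
print(counts)  # [1 1 1 3]
print(sums[:3])  # matched sums for hard, med, easy
```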
Comments:
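Cool, I hadn't heard of np.digitize!!
By the way, what is "unmatched"? Why are there unmatched ones?
@alvas I just mean the positions where input_scores and all_scores don't match. They have to be moved into an extra group so that they aren't counted in any of the other three groups.
Ah, that makes sense. Thanks for the explanation!

Answer 3: You can use a dict: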
k = ('easy', 'medium', 'hard')
outlist = []
# Assumes each level occupies exactly 10 consecutive positions in matches
for index, i in enumerate(range(0, len(matches), 10)):
    count = {k[index]: sum(matches[i:i + 10])}
    outlist.append(count)
print(outlist)
[{'easy': 4}, {'medium': 6}, {'hard': 6}]
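The fixed step of 10 only works because each level happens to have 10 entries. A sketch that takes the chunk sizes from the level lengths instead, still assuming (as in the question's data) that all_scores lists all easy scores first, then medium, then hard. The function name matches_per_level is hypothetical; sizes would be [len(easy), len(med), len(hard)].

```python
from itertools import accumulate


def matches_per_level(matches, sizes, names=("easy", "medium", "hard")):
    """Hypothetical helper: sum a flat 0/1 (or bool) match list in consecutive chunks."""
    ends = list(accumulate(sizes))   # e.g. [10, 20, 30]
    starts = [0] + ends[:-1]         # e.g. [0, 10, 20]
    return {name: sum(matches[s:e]) for name, s, e in zip(names, starts, ends)}


flags = [True, False, True, True, False, True]
print(matches_per_level(flags, [2, 3, 1]))  # {'easy': 1, 'medium': 2, 'hard': 1}
```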
Answer 4: You can use a couple of dictionaries as lookup tables:
import math
from collections import defaultdict

scores = {}                # maps each score to its category
values = defaultdict(int)  # counts how many scores fall in each category
for i in all_scores:
    if i > math.exp(50):
        values["easy"] += 1
        scores[i] = "easy"
    elif i > math.exp(10):
        values["medium"] += 1
        scores[i] = "medium"
    else:
        values["hard"] += 1
        scores[i] = "hard"
input_scores = [0.0, 2.7997409854370188e+23, 0.0, 6.7401171871631936e+22, 0.0, 0.0, 8.6584582327404103e+23, 0.0, 2.3071939602212471e+24, 0.0, 0.0, 17489013.841076534, 11806185.6660164, 0.0, 8546268.728385007, 0.0, 31828243.073495708, 51740168.152000979, 0.0, 22334836.315934014, 4354.0, 7474.0, 4354.0, 4030.0, 0.0, 8635.0, 0.0, 0.0, 0.0, 8479.0]
# Find the categories of your inputs
r = [(scores[i], i) for i in input_scores if i in scores]
# Group the matched inputs by category to get the counts
res = defaultdict(list)
for k, v in r:
    res[k].append(v)
for k, v in res.items():
    print(k, len(v), values[k], sum(v))
easy 4 10 3.520415053910622e+24
medium 6 10 143744715.77690864
hard 6 10 37326.0
Answer 5: I'm not sure this approach is any less verbose, but I would use np.in1d to match the scores:
# we need numpy arrays
easy = np.array(easy)
med = np.array(med)
hard = np.array(hard)
for level in [easy, med, hard]:
    matches = level[np.where(np.in1d(level, input_scores))]
    print(len(matches), len(level), np.sum(matches))
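This code produces different output from yours, but I think the data you provided has been corrupted somehow. For example, your hard array contains two copies each of 7474.0 and 4354.0. Is that expected? The easy array also contains two copies of 6.7401171871631936e+22.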
Output of my approach with the current data:
5 10 3.58781622578e+24
6 10 143744715.777
8 10 53435.0
Also, I'm not entirely sure how you are computing the sums, so I just summed all the matched scores (which is why our values differ).
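The duplicate values are exactly where value-based matching and position-based matching disagree. A toy illustration (made-up data, using np.isin, NumPy's element-wise membership test and the recommended replacement for np.in1d in newer releases): a repeated value counts as "found" at every copy, while an element-by-element comparison only counts the position that actually matches.

```python
import numpy as np

all_scores = np.array([7474.0, 4354.0, 7474.0])    # a level with a repeated value
input_scores = np.array([7474.0, 0.0, 0.0])         # only position 0 really matches

# Value-based: both copies of 7474.0 occur somewhere in input_scores.
by_value = all_scores[np.isin(all_scores, input_scores)]
# Position-based: compare the two arrays element by element.
by_position = all_scores[all_scores == input_scores]

print(len(by_value), len(by_position))  # 2 1
```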
Edit: Changed to matching input_scores against all_scores instead. The only change is that we now have to use np.in1d twice, for a double match:
input_scores = np.asarray(input_scores)  # boolean/np.where indexing needs an array, not a list
scores = input_scores[np.where(np.in1d(input_scores, all_scores))]
for level in [easy, med, hard]:
    matches = scores[np.where(np.in1d(scores, level))]
    print(len(matches), len(level), np.sum(matches))
This removes the duplicates problem from before. Output:
4 10 3.52041505391e+24
6 10 143744715.777
6 10 37326.0
Edit 2: I realised my use of np.where was redundant, so I removed it completely.
input_scores = np.asarray(input_scores)  # boolean indexing needs an array, not a list
scores = input_scores[np.in1d(input_scores, all_scores)]
for level in [easy, med, hard]:
    matches = scores[np.in1d(scores, level)]
    print(len(matches), len(level), np.sum(matches))
This produces the same output as the first edit.
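Edit 3: I put everything together into one program. The easy/medium/hard split can also be done conveniently with numpy. It could probably be made more efficient, but this is easy to read: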
import math
import numpy as np
all_scores = np.array([1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23,
6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23,
1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24, 9603539.08653573,
17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801,
31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014,
4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0])
input_scores = np.array([0.0, 2.7997409854370188e+23, 0.0, 6.7401171871631936e+22, 0.0, 0.0, 8.6584582327404103e+23, 0.0, 2.3071939602212471e+24, 0.0, 0.0, 17489013.841076534, 11806185.6660164, 0.0, 8546268.728385007, 0.0, 31828243.073495708, 51740168.152000979, 0.0, 22334836.315934014, 4354.0, 7474.0, 4354.0, 4030.0, 0.0, 8635.0, 0.0, 0.0, 0.0, 8479.0])
easy = all_scores[all_scores > math.exp(50)]
med = all_scores[(all_scores > math.exp(10)) & (all_scores <= math.exp(50))]  # & is element-wise "and"
hard = all_scores[all_scores <= math.exp(10)]
scores = input_scores[np.in1d(input_scores, all_scores)]
for level in [easy, med, hard]:
    matches = scores[np.in1d(scores, level)]
    print(len(matches), len(level), np.sum(matches))
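Newer NumPy releases deprecate np.in1d in favour of np.isin, which accepts the same 1-D inputs. A sketch of the Edit 3 logic wrapped in a function and written with np.isin; the name level_matches is hypothetical, and the boundaries follow the question's `>` comparisons so a score exactly at a threshold is not dropped.

```python
import math

import numpy as np


def level_matches(all_scores, input_scores):
    """Hypothetical helper: [(matches, level_size, matched_sum)] for easy, med, hard."""
    all_scores = np.asarray(all_scores, dtype=float)
    input_scores = np.asarray(input_scores, dtype=float)
    easy = all_scores[all_scores > math.exp(50)]
    med = all_scores[(all_scores > math.exp(10)) & (all_scores <= math.exp(50))]
    hard = all_scores[all_scores <= math.exp(10)]
    # Keep only the input values that occur somewhere in all_scores.
    scores = input_scores[np.isin(input_scores, all_scores)]
    result = []
    for level in (easy, med, hard):
        matched = scores[np.isin(scores, level)]
        result.append((len(matched), len(level), float(np.sum(matched))))
    return result


print(level_matches([1e30, 5e6, 100.0, 200.0], [1e30, 0.0, 100.0, 0.0]))
# [(1, 1, 1e+30), (0, 1, 0.0), (1, 2, 100.0)]
```

Like the original, this matches by value, so the duplicate caveat discussed above still applies.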
Comments:
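The values in all_scores and input_scores are not unique; the only things tying them together are their order and whether their values match.

Answer 6: Although your question has already been answered, I still wanted to have a go at it (for practice). This function gives the expected output, but Paul Panzer's solution is by far the most optimized. :)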
def MatchesLenghtSums(L, lst):
    """
    Compares a list, lst, with a list of lists, L, and finds the elements of lst
    that appear in each list of L.
    Returns the number of matching elements, the length of each list in L and
    the sum of the matching elements.
    Precondition: len(L) == 3"""
    # unpack L
    easy, medium, hard = L
    # traverse lst and collect the elements that also appear in each unpacked list
    easyA = [e for e in lst if e in easy]
    mediumB = [m for m in lst if m in medium]
    hardC = [h for h in lst if h in hard]
    return "(Easy Matches {} Length {} sum {}) (Medium Matches {} Length {} sum {}) (Hard Matches {} Length {} sum {})".format(
        len(easyA), len(easy), sum(easyA), len(mediumB),
        len(medium), sum(mediumB), len(hardC), len(hard), sum(hardC))

L = [easy, med, hard]
lst = input_scores
MatchesLenghtSums(L, lst)
'(Easy Matches 4 Length 10 sum 3.520415053910622e+24) (Medium Matches 6 Length 10 sum 143744715.77690864) (Hard Matches 6 Length 10 sum 37326.0)'