Counting the number of matches from two lists, given a condition that splits the original list

Posted 2023-02-24


Original title: Counting no. of matches from two lists given a condition to split the original list
Asked: 2017-08-12 15:36:25

Question:

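I have a list of floats that encodes some hidden "level" information in the scale of the floats, and I can split the floats into "levels" like this: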

import math
import numpy as np

all_scores = [1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23,
6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23,
1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24, 9603539.08653573,
17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801,
31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014,
4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0]

easy, med, hard = [], [], []

for i in all_scores:
    if i > math.exp(50):
        easy.append(i)
    elif i > math.exp(10):
        med.append(i)
    else:
        hard.append(i)

print ([easy, med, hard])

[out]:

[[1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23, 6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23, 1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24], [9603539.08653573, 17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801, 31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014], [4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0]]
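The if/elif/else chain above can also be wrapped in a small function so that a single score can be classified on its own. This is a minimal sketch; the name `level` is hypothetical, and it uses the same `math.exp(50)` and `math.exp(10)` thresholds as the loop. Because the comparisons are strict, a score exactly equal to a threshold falls into the lower level.

```python
import math

def level(score):
    """Classify one score with the same thresholds as the loop above (hypothetical helper)."""
    if score > math.exp(50):
        return 'easy'
    elif score > math.exp(10):
        return 'med'
    return 'hard'

print(level(1.0369411057174144e+22))  # easy
print(level(9603539.08653573))        # med
print(level(4354.0))                  # hard
print(level(math.exp(10)))            # hard: equal to the threshold, so not "greater than"
```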

I also have another list that corresponds position by position to the all_scores list:

input_scores = [0.0, 2.7997409854370188e+23, 0.0, 6.7401171871631936e+22, 0.0, 0.0, 8.6584582327404103e+23, 0.0, 2.3071939602212471e+24, 0.0, 0.0, 17489013.841076534, 11806185.6660164, 0.0, 8546268.728385007, 0.0, 31828243.073495708, 51740168.152000979, 0.0, 22334836.315934014, 4354.0, 7474.0, 4354.0, 4030.0, 0.0, 8635.0, 0.0, 0.0, 0.0, 8479.0]

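I need to check how many of the easy, medium and hard scores match, and I can get a boolean for whether each position in the flattened all_scores list matches like this: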

matches = [i == j for i, j in zip(input_scores, all_scores)]
print(matches)

[out]:

[False, True, False, True, False, False, True, False, True, False, False, True, True, False, True, False, True, True, False, True, True, True, True, True, False, True, False, False, False, True]
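NumPy can do the same element-wise comparison without an explicit `zip`, and the number of matches is then just the sum of the boolean array. A minimal sketch with small made-up lists (not the question's data):

```python
import numpy as np

# Made-up example data: positions 0, 2 and 3 match.
reference = np.array([5.0, 7.0, 9.0, 11.0])
observed = np.array([5.0, 0.0, 9.0, 11.0])

matches = reference == observed      # element-wise boolean array
print(matches.tolist())              # [True, False, True, True]
print(int(matches.sum()))            # 3, because True counts as 1
```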

Is there a way to find out how many of the matches are easy/medium/hard, and the sum of the matches for each level?

I've tried this, and it does work:

matches = [int(i == j) for i, j in zip(input_scores, all_scores)]

print(sum(matches[:len(easy)]) , len(easy), sum(np.array(easy) * matches[:len(easy)]) )
print(sum(matches[len(easy):len(easy)+len(med)]), len(med), sum(np.array(med) * matches[len(easy):len(easy)+len(med)]) )
print (sum(matches[len(easy)+len(med):]) , len(hard), sum(np.array(hard) * matches[len(easy)+len(med):]) )

[out]:

4 10 3.52041505391e+24
6 10 143744715.777
6 10 37326.0

But there must be a less verbose way to get the same output.


Answer 1:

This looks to me like a job for... Counter!

If you haven't come across it yet, a Counter is like a dict, except that methods like .update() add the new values to the old ones instead of replacing them. So:

from collections import Counter

counter = Counter({'a': 2})
counter.update({'a': 3})
print(counter['a'])  # 5
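To make the difference concrete, here is a small side-by-side sketch of `dict.update` and `Counter.update` on the same input; the variable names are only illustrative.

```python
from collections import Counter

plain = {'a': 2}
plain.update({'a': 3})       # dict.update replaces the old value

counter = Counter({'a': 2})
counter.update({'a': 3})     # Counter.update adds to the old value

print(plain['a'])    # 3
print(counter['a'])  # 5
print(counter['b'])  # 0: a missing key counts as zero instead of raising KeyError
```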

So you can get the results above with the following code:

from collections import Counter

matches, counts, scores = [
    Counter({'easy': 0, 'med': 0, 'hard': 0}) for _ in range(3)
]

for score, inp in zip(all_scores, input_scores):
    category = (
        'easy' if score > math.exp(50) else
        'med' if score > math.exp(10) else
        'hard'
    )
    matches.update({category: score == inp})
    counts.update({category: 1})
    scores.update({category: score if score == inp else 0})

for cat in ('easy', 'med', 'hard'):
    print(matches[cat], counts[cat], scores[cat])
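The counters do not have to be pre-seeded with zeros: a `Counter` can be built directly from a generator of category labels, and missing categories simply read as 0. A minimal sketch under the same thresholds; `category` and `summarize` are hypothetical helpers, and the three-element data is made up.

```python
import math
from collections import Counter

def category(score):
    # Hypothetical helper: same thresholds as the question's loop.
    if score > math.exp(50):
        return 'easy'
    if score > math.exp(10):
        return 'med'
    return 'hard'

def summarize(all_scores, input_scores):
    """Return (matches, counts, sums) Counters keyed by category, comparing by position."""
    pairs = list(zip(all_scores, input_scores))
    counts = Counter(category(s) for s, _ in pairs)
    matches = Counter(category(s) for s, i in pairs if s == i)
    sums = Counter()
    for s, i in pairs:
        if s == i:
            sums[category(s)] += s
    return matches, counts, sums

# Made-up data: one easy match, one med non-match, one hard match.
toy_all = [1e22, 1e7, 5000.0]
toy_input = [1e22, 0.0, 5000.0]
m, c, s = summarize(toy_all, toy_input)
for cat in ('easy', 'med', 'hard'):
    print(cat, m[cat], c[cat], s[cat])
```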


Answer 2:

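Here is a numpy solution that uses digitize to create the categories and bincount to count and sum the matches. As a free bonus, the same statistics are also produced for the leftovers (the unmatched entries).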

categories = 'hard', 'med', 'easy'

# get group membership by splitting at e^10 and e^50
# the 'right' keyword tells digitize to include right boundaries
cat_map = np.digitize(all_scores, np.exp((10, 50)), right=True)
# cat_map has a zero in all the 'hard' places of all_scores
# a one in the 'med' places and a two in the 'easy' places

# add a fourth group to mark all non-matches
# we have to force at least one np.array for element-by-element
# comparison to work
cat_map[np.asanyarray(all_scores) != input_scores] = 3

# count
numbers = np.bincount(cat_map)
# count again, this time using all_scores as weights
sums = np.bincount(cat_map, all_scores)

# print
for c, n, s in zip(categories + ('unmatched',), numbers, sums):
    print('{:12}  {:2d}  {:6.4g}'.format(c, n, s))

# output:
#
# hard           6  3.733e+04
# med            6  1.437e+08
# easy           4  3.52e+24
# unmatched     14  5.159e+24
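The two NumPy calls doing the work above can be tried in isolation on toy data. This sketch (made-up thresholds 10 and 50) shows that `right=True` puts a value equal to a threshold into the lower bin, which matches the question's strict `>` tests, and that `bincount` with `weights` sums the weights per label instead of counting.

```python
import numpy as np

# Toy thresholds and values (made up), chosen to hit the boundaries.
bins = np.array([10.0, 50.0])
values = np.array([5.0, 10.0, 30.0, 50.0, 70.0])

# right=True: label i means bins[i-1] < x <= bins[i], so 10.0 -> 0 and 50.0 -> 1.
print(np.digitize(values, bins, right=True).tolist())   # [0, 0, 1, 1, 2]
# right=False (the default): bins[i-1] <= x < bins[i], so 10.0 -> 1 and 50.0 -> 2.
print(np.digitize(values, bins).tolist())               # [0, 1, 1, 2, 2]

labels = np.digitize(values, bins, right=True)
# bincount counts how often each label occurs...
print(np.bincount(labels).tolist())                     # [2, 2, 1]
# ...and with weights it sums the weights belonging to each label.
print(np.bincount(labels, weights=values).tolist())     # [15.0, 80.0, 70.0]
```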

Comments:

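Cool, I hadn't heard of np.digitize!! By the way, what does "unmatched" mean? Why are there unmatched ones? – @alvas I just mean the entries where input_scores and all_scores don't match. They have to be moved into an extra group so that they aren't counted in any of the other three groups. – Ah, that makes sense. Thanks for the explanation!

Answer 3: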

You can use a dict:

k = ('easy', 'medium', 'hard')
outlist = []
# assumes every level has exactly 10 entries and all_scores is ordered easy, medium, hard
for index, i in enumerate(range(0, len(matches), 10)):
    count = {k[index]: sum(matches[i:i + 10])}
    outlist.append(count)

print(outlist)
[{'easy': 4}, {'medium': 6}, {'hard': 6}]
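The loop above hard-codes a chunk size of 10, which only works because each level happens to contain exactly 10 scores. A sketch that derives the slice boundaries from the actual level lengths (for example `len(easy)`, `len(med)`, `len(hard)`) with `itertools.accumulate`; like the original, it assumes the flat match list is ordered easy, then medium, then hard. The function name `matches_per_level` is hypothetical and the data is made up.

```python
from itertools import accumulate

def matches_per_level(matches, level_lengths, names):
    """Sum a flat bool/0-1 match list over consecutive slices of the given lengths."""
    ends = list(accumulate(level_lengths))   # running totals: slice end positions
    starts = [0] + ends[:-1]                 # each slice starts where the previous ended
    return {name: sum(matches[a:b]) for name, a, b in zip(names, starts, ends)}

# Made-up data: levels of length 2, 3 and 1.
flags = [True, False, True, True, False, True]
print(matches_per_level(flags, [2, 3, 1], ('easy', 'medium', 'hard')))
# {'easy': 1, 'medium': 2, 'hard': 1}
```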


Answer 4:

You can use a set of dictionaries as lookup tables:

import math
from collections import defaultdict

scores = {}                # Maps each score to its category
values = defaultdict(int)  # Counts how many scores each category has
for i in all_scores:
    if i > math.exp(50):
        values["easy"] += 1
        scores[i] = "easy"
    elif i > math.exp(10):
        values["medium"] += 1
        scores[i] = "medium"
    else:
        values["hard"] += 1
        scores[i] = "hard"

input_scores = [0.0, 2.7997409854370188e+23, 0.0, 6.7401171871631936e+22, 0.0, 0.0, 8.6584582327404103e+23, 0.0, 2.3071939602212471e+24, 0.0, 0.0, 17489013.841076534, 11806185.6660164, 0.0, 8546268.728385007, 0.0, 31828243.073495708, 51740168.152000979, 0.0, 22334836.315934014, 4354.0, 7474.0, 4354.0, 4030.0, 0.0, 8635.0, 0.0, 0.0, 0.0, 8479.0]

# Find the categories of your inputs
r = [(scores[i], i) for i in input_scores if i in scores]

# Join your categories to get the counts
res = defaultdict(list)
for k, v in r:
    res[k].append(v)

for k, v in res.items():
    print(k, len(v), values[k], sum(v))



easy 4 10 3.520415053910622e+24
medium 6 10 143744715.77690864
hard 6 10 37326.0
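Because the lookup table above is keyed by value, an input value counts as a match if it equals any score in all_scores, even at a different position. If matches should be counted only position by position, as in the question's `zip` comparison, a variant could look like this sketch; `level_stats` is a hypothetical name and the data is made up.

```python
import math
from collections import defaultdict

def level_stats(all_scores, input_scores):
    """Per level: (number of matches, level size, sum of matches), comparing by position."""
    stats = defaultdict(lambda: [0, 0, 0.0])
    for score, inp in zip(all_scores, input_scores):
        if score > math.exp(50):
            name = 'easy'
        elif score > math.exp(10):
            name = 'medium'
        else:
            name = 'hard'
        entry = stats[name]
        entry[1] += 1            # level size
        if score == inp:         # a match only if the values agree at the same position
            entry[0] += 1
            entry[2] += score
    return {k: tuple(v) for k, v in stats.items()}

# Made-up data: the values are swapped, so a value-keyed lookup would report 2 matches.
print(level_stats([4354.0, 7474.0], [7474.0, 4354.0]))   # {'hard': (0, 2, 0.0)}
```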


Answer 5:

I'm not sure this approach is any less verbose, but I would use np.in1d to match the scores:

# we need numpy arrays
easy = np.array(easy)
med = np.array(med)
hard = np.array(hard)

for level in [easy, med, hard]:
    matches = level[np.where(np.in1d(level, input_scores))]
    print(len(matches), len(level), np.sum(matches))

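This code produces different output from yours, but I think the data you provided is corrupted somehow. For example, your hard array contains two copies each of 7474.0 and 4354.0. Is that intended? There are also two copies of 6.7401171871631936e+22 in the easy array.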

Output of my method with the current data:

5 10 3.58781622578e+24
6 10 143744715.777
8 10 53435.0

Also, I'm not entirely sure how you are summing, so I just summed all the matched scores (hence our values differ).
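The duplicates explain the discrepancy: membership tests such as np.in1d (or np.isin, which newer NumPy releases recommend instead) match by value, so every copy of a value is marked as found, while the question's comparison is position by position. A minimal sketch with made-up arrays:

```python
import numpy as np

level = np.array([4354.0, 7474.0, 4354.0])   # made-up "hard" level with a duplicate value
inputs = np.array([4354.0, 0.0, 0.0])        # only position 0 actually matches

# Value-based membership: both copies of 4354.0 in `level` are reported as found.
by_value = np.isin(level, inputs)
print(by_value.tolist())                     # [True, False, True]

# Position-based comparison: only index 0 matches.
by_position = level == inputs
print(by_position.tolist())                  # [True, False, False]
```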

Edit: Changed to matching input_scores against all_scores instead. The only change is that we now have to match twice with np.in1d:

input_scores = np.array(input_scores)  # indexing with an index array needs an ndarray, not a list
scores = input_scores[np.where(np.in1d(input_scores, all_scores))]
for level in [easy, med, hard]:
    matches = scores[np.where(np.in1d(scores, level))]
    print(len(matches), len(level), np.sum(matches))

This removes the earlier duplicate problem. Output:

4 10 3.52041505391e+24
6 10 143744715.777
6 10 37326.0

Edit 2: I realized that my use of np.where is redundant, so it can be removed entirely:

input_scores = np.array(input_scores)  # boolean-mask indexing needs an ndarray, not a list
scores = input_scores[np.in1d(input_scores, all_scores)]
for level in [easy, med, hard]:
    matches = scores[np.in1d(scores, level)]
    print(len(matches), len(level), np.sum(matches))

This produces the same output as the first edit.
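Why np.where is redundant here: indexing an array with a boolean mask selects the same elements as indexing it with the integer positions that np.where returns for that mask. A small check with made-up values:

```python
import numpy as np

a = np.array([3.0, 0.0, 5.0, 0.0])
mask = a > 1.0

direct = a[mask]               # boolean-mask indexing
via_where = a[np.where(mask)]  # integer indexing with the positions of the True entries

print(direct.tolist(), via_where.tolist())   # [3.0, 5.0] [3.0, 5.0]
print(np.array_equal(direct, via_where))     # True
```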

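Edit 3: I put everything together in one program. The easy/medium/hard split can also be done conveniently with numpy. It could probably be made more efficient, but this is easy to read: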

import math
import numpy as np

all_scores = np.array([1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23,
6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23,
1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24, 9603539.08653573,
17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801,
31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014,
4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0])

input_scores = np.array([0.0, 2.7997409854370188e+23, 0.0, 6.7401171871631936e+22, 0.0, 0.0, 8.6584582327404103e+23, 0.0, 2.3071939602212471e+24, 0.0, 0.0, 17489013.841076534, 11806185.6660164, 0.0, 8546268.728385007, 0.0, 31828243.073495708, 51740168.152000979, 0.0, 22334836.315934014, 4354.0, 7474.0, 4354.0, 4030.0, 0.0, 8635.0, 0.0, 0.0, 0.0, 8479.0])

easy = all_scores[math.exp(50) < all_scores]
med = all_scores[(math.exp(10) < all_scores) & (all_scores <= math.exp(50))]  # & is element-wise logical `and`
hard = all_scores[all_scores <= math.exp(10)]

scores = input_scores[np.in1d(input_scores, all_scores)]
for level in [easy, med, hard]:
    matches = scores[np.in1d(scores, level)]
    print(len(matches), len(level), np.sum(matches))
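Using `>` for the lower bound and `<=` for the upper bound of each level mirrors the question's if/elif/else, so a value exactly equal to a threshold still lands in exactly one level. A small check with made-up values, including both thresholds themselves:

```python
import math
import numpy as np

scores = np.array([1e22, math.exp(50), 1e7, math.exp(10), 4354.0])
lo, hi = math.exp(10), math.exp(50)

easy = scores > hi
med = (scores > lo) & (scores <= hi)   # & is element-wise logical AND on boolean arrays
hard = scores <= lo

# Each position should be covered by exactly one of the three masks.
coverage = easy.astype(int) + med.astype(int) + hard.astype(int)
print(coverage.tolist())                                       # [1, 1, 1, 1, 1]
print(scores[easy].size, scores[med].size, scores[hard].size)  # 1 2 2
```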

Comments:

The values of all_scores and input_scores are not unique; the only things tying them together are their order and whether their values match.

Answer 6:

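Although your question has already been answered, I still wanted to give it a try (for practice). This function gives the expected output, but Paul Panzer's solution is the most optimized one so far. :)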

def MatchesLenghtSums(L, lst):
    """
    Compares a list, lst, with a list of lists, L, by checking which elements
    of lst occur in each sublist of L (membership is by value, not by position).
    Returns, for each sublist, the number of matching elements of lst,
    the length of the sublist, and the sum of the matching elements.
    Precondition: len(L) == 3"""

    # unpack L
    easy, medium, hard = L
    # traverse lst and collect the elements that also occur in each
    # unpacked list
    easyA = [e for e in lst if e in easy]
    mediumB = [m for m in lst if m in medium]
    hardC = [h for h in lst if h in hard]

    return "(Easy Matches {} Length {} sum {}) (Medium Matches {} Length {} sum {}) (Hard Matches {} Length {} sum {})".format(
        len(easyA), len(easy), sum(easyA), len(mediumB),
        len(medium), sum(mediumB), len(hardC), len(hard), sum(hardC))

L = [easy, med, hard]
lst = input_scores
print(MatchesLenghtSums(L, lst))

(Easy Matches 4 Length 10 sum 3.520415053910622e+24) (Medium Matches 6 Length 10 sum 143744715.77690864) (Hard Matches 6 Length 10 sum 37326.0)

