Python statistical analysis

【Title】: python statistical analysis 【Posted】: 2010-10-12 18:48:08 【Question】:

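Given 15 players (2 goalkeepers, 5 defenders, 5 midfielders and 3 strikers), each with a value and a score, I want to work out the highest-scoring team I can afford with my money. Each team must consist of 1 GK plus a formation, e.g. 4:4:2, 4:3:3, etc. I started with sample data like this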

player role points cost

Then I do the following to evaluate all combinations

Read each line into a list (one list per role), then use itertools in nested loops to get all combinations

if line[1] == "G": G.append(line[0])
if line[1] == "D": D.append(line[0])
if line[1] == "M": M.append(line[0])
if line[1] == "S": S.append(line[0])

for gk in itertools.combinations(G,1):
    for de in itertools.combinations(D,4):
        for mi in itertools.combinations(M,4):
            for st in itertools.combinations(S,2):
                teams[str(count)]= " ".join(gk)+" "+" ".join(de)+" "+" ".join(mi)+" "+" ".join(st)
                count +=1

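Once I have the teams, I compute their points total and team cost. If the cost is below the threshold, I print the team. But if I now scale this up to 20 goalkeepers, 150 defenders, 150 midfielders and 100 strikers, I understandably run out of memory. What can I do to perform this analysis? Is it a generator that I need, rather than a recursive function?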

Many thanks

【Answer 1】:

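You may be able to solve this with recursion. The following shows the basic outline, but ignores details such as a team having to consist of a certain number of players of each type.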

players=[{'name':'A','score':5,'cost':10},
         {'name':'B','score':10,'cost':3},
         {'name':'C','score':6,'cost':8}]

def player_cost(player):
    return player['cost']
def player_score(player):
    return player['score']
def total_score(players):
    return sum(player['score'] for player in players)

def finance_team_recurse(budget, available_players):
    affordable_players=[]
    for player in available_players:
        if player_cost(player)<=budget:
            # Since we've ordered available players, the first player appended
            # will be the one with the highest score.
            affordable_players.append(player)
    result=[]
    if affordable_players:
        candidate_player=affordable_players[0]
        other_players=affordable_players[1:]
        # if you include candidate_player on your team
        team_with_candidate=finance_team_recurse(budget-player_cost(candidate_player),
                                                 other_players)
        team_with_candidate.append(candidate_player)
        score_of_team_with_candidate=total_score(team_with_candidate)
        if score_of_team_with_candidate>total_score(other_players):
            result=team_with_candidate
        else:
            # if you exclude candidate_player from your team
            team_without_candidate=finance_team_recurse(budget, other_players)
            score_of_team_without_candidate=total_score(team_without_candidate)
            if score_of_team_with_candidate>score_of_team_without_candidate:
                result=team_with_candidate
            else:
                result=team_without_candidate
    return result

def finance_team(budget, available_players):
    tmp=available_players[:]
    # Sort so player with highest score is first. (Greedy algorithm?)
    tmp.sort(key=player_score, reverse=True)
    return finance_team_recurse(budget,tmp)

print(finance_team(20,players))
# [{'score': 6, 'cost': 8, 'name': 'C'}, {'score': 10, 'cost': 3, 'name': 'B'}]

20 choose 1 = 20 combinations
150 choose 4 = 20260275 combinations
100 choose 2 = 4950 combinations

So in total there would be 20*20260275*20260275*4950 = 40637395564486875000L items in the teams dict. That takes a lot of memory.
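The count above can be reproduced with a few lines. This is only a sketch of the arithmetic, assuming Python 3.8+ for `math.comb`; the variable names are made up for illustration.

```python
import math

# Squad sizes from the question's larger scenario, with a 4:4:2 formation.
goalkeeper_choices = math.comb(20, 1)
defender_choices = math.comb(150, 4)
midfielder_choices = math.comb(150, 4)
striker_choices = math.comb(100, 2)

# Every choice per role can be paired with every choice of the other roles.
total_teams = (goalkeeper_choices * defender_choices
               * midfielder_choices * striker_choices)
print(total_teams)  # 40637395564486875000
```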

for gk in itertools.combinations(G,1):
    for de in itertools.combinations(D,4):
        for mi in itertools.combinations(M,4):
            for st in itertools.combinations(S,2):
                # Don't collect the results into a dict;
                # that's what's killing you (memory-wise).
                # Just compute the cost and print the result here.
                pass
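As a rough sketch of what "compute it right there" could look like: the version below keeps only the best team seen so far, so memory use stays flat however many combinations are scanned. The `Player` record, the `best_affordable_team` helper and the sample squad are all invented for illustration; they are not from the question's data.

```python
import itertools
from collections import namedtuple

# Hypothetical record type; the fields mirror the question's columns.
Player = namedtuple("Player", "name role points cost")

def best_affordable_team(G, D, M, S, budget):
    """Scan every 4:4:2 team, remembering only the best one within budget."""
    best = None  # (points, cost, names) of the best team so far
    for gk in itertools.combinations(G, 1):
        for de in itertools.combinations(D, 4):
            for mi in itertools.combinations(M, 4):
                for st in itertools.combinations(S, 2):
                    team = gk + de + mi + st
                    cost = sum(p.cost for p in team)
                    if cost > budget:
                        continue  # too expensive, nothing is stored
                    points = sum(p.points for p in team)
                    if best is None or points > best[0]:
                        best = (points, cost, [p.name for p in team])
    return best

# Made-up squad of 15 players: 2 G, 5 D, 5 M, 3 S.
squad = (
    [Player("gk%d" % i, "G", 3 + i, 4 + i) for i in range(2)]
    + [Player("de%d" % i, "D", 4 + i, 3 + i) for i in range(5)]
    + [Player("mi%d" % i, "M", 5 + i, 4 + i) for i in range(5)]
    + [Player("st%d" % i, "S", 6 + i, 5 + i) for i in range(3)]
)
by_role = {r: [p for p in squad if p.role == r] for r in "GDMS"}

best = best_affordable_team(by_role["G"], by_role["D"],
                            by_role["M"], by_role["S"], budget=60)
print(best)
```

This fixes the memory problem only. The number of combinations visited is unchanged, so the running time is just as bad as before.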

PS. 40637395564486875000L is on the order of 10**19. Assuming your program can process 10**6 combinations per second, it would take about 1.3 million years to finish...
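For anyone who wants to check that estimate, the arithmetic is short. The 10**6 combinations per second is the assumption stated above, and 365.25 days per year is a convention chosen here.

```python
total_teams = 40637395564486875000  # team count from this answer
combinations_per_second = 10**6     # assumed processing rate
seconds_per_year = 365.25 * 24 * 3600

years = total_teams / combinations_per_second / seconds_per_year
print("about %.1f million years" % (years / 1e6))
```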

【Comments】:

In a thousand years, computers will be faster!

+1: Proper use of combinations. This is a textbook example of computability, big-O complexity and what not to do. Outstanding. I wish I could upvote more than once.

@user317225: You really, really, really need to think about overall complexity and basic computability before writing something like this.

@florin: In a thousand years your great-great-great-great-great-great-great-great-great-great-great-great-great-great-granddaughter will need to update the script with new players. Of course, since she'll be using Python102.6, that will be easy. :)

【Answer 2】:

Functions and generators help a lot:

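import itertools

def make_teams(G, D, M, S):
    """ lazily yields all possible teams, each as one flat tuple of players """
    for gk in itertools.combinations(G,1):
        for de in itertools.combinations(D,4):
            for mi in itertools.combinations(M,4):
                for st in itertools.combinations(S,2):
                    # concatenate so the helpers below can iterate over players
                    yield gk + de + mi + st

def get_cost( team ):
    return sum( member.cost for member in team )

def get_score( team ):
    return sum( member.score for member in team )

def good_teams( min_score=0):
    # G, D, M, S are the per-role player lists built earlier
    for team in make_teams(G, D, M, S):
        if get_score( team ) > min_score:
            yield team

for team in good_teams(min_score=100):
    print(team)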

It still generates every possible combination, so you will probably now run out of time rather than memory.

What you are doing looks like a variant of the knapsack problem. You can do better than trying all possible combinations, but not by much.
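To give an idea of what "better than trying all combinations" might look like, here is a dynamic-programming sketch in the usual knapsack style. It is not from the original answer, and it assumes that costs are non-negative integers and that only the best achievable score is wanted, not the team itself. `Player`, `best_by_cost` and `best_team_score` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical record type; integer costs are an assumption of this sketch.
Player = namedtuple("Player", "name role score cost")

NEG = float("-inf")  # marks "no such selection exists"

def best_by_cost(players, k, budget):
    """Return table where table[c] is the best score from exactly k players
    whose costs add up to exactly c (NEG if impossible)."""
    dp = [[NEG] * (budget + 1) for _ in range(k + 1)]
    dp[0][0] = 0
    for p in players:
        # Go through j downwards so each player is counted at most once.
        for j in range(k, 0, -1):
            for c in range(budget, p.cost - 1, -1):
                prev = dp[j - 1][c - p.cost]
                if prev != NEG and prev + p.score > dp[j][c]:
                    dp[j][c] = prev + p.score
    return dp[k]

def best_team_score(players, formation, budget):
    """formation maps role -> required count, e.g. {"G": 1, "D": 4, ...}."""
    total = [NEG] * (budget + 1)
    total[0] = 0
    for role, k in formation.items():
        table = best_by_cost([p for p in players if p.role == role], k, budget)
        merged = [NEG] * (budget + 1)
        for c1, s1 in enumerate(total):
            if s1 == NEG:
                continue
            # Only combine with role costs that still fit in the budget.
            for c2, s2 in enumerate(table[: budget - c1 + 1]):
                if s2 != NEG and s1 + s2 > merged[c1 + c2]:
                    merged[c1 + c2] = s1 + s2
        total = merged
    return max(total)  # NEG means no team fits the budget

# Tiny made-up example: 1 goalkeeper, 2 defenders, 1 striker, budget 12.
players = [
    Player("g1", "G", 5, 4), Player("g2", "G", 3, 2),
    Player("d1", "D", 6, 5), Player("d2", "D", 4, 3), Player("d3", "D", 2, 1),
    Player("s1", "S", 8, 6), Player("s2", "S", 5, 3),
]
print(best_team_score(players, {"G": 1, "D": 2, "S": 1}, 12))  # 17
```

The work grows with the number of players times the squad size times the budget, rather than with the number of combinations. The price is the integer-cost assumption and some extra bookkeeping if you also want to recover which players were picked.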

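A quick way to get good solutions is to sort the players by their score per unit of cost. You should get the highest-scoring teams first, but there is no guarantee that you get the optimal solution. Wikipedia calls this the "greedy approximation algorithm".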

import itertools

def score_per_cost( player ):
    # float() avoids integer division on Python 2
    return float(player.score) / player.cost

def sorted_combinations(seq, n):
    # combinations of the best-value players come out first
    return itertools.combinations(
        sorted(seq, key=score_per_cost, reverse=True),n)

def make_teams(G, D, M, S):
    """ yields all possible teams as flat tuples, best-value players first """
    for gk in sorted_combinations(G,1):
        for de in sorted_combinations(D,4):
            for mi in sorted_combinations(M,4):
                for st in sorted_combinations(S,2):
                    yield gk + de + mi + st

def get_cost( team ):
    return sum( member.cost for member in team )

def top_teams(n):
    # G, D, M, S are the per-role player lists built earlier
    return itertools.islice(make_teams(G, D, M, S),n)

for team in top_teams(100):
    print(team)

I'll leave adding the "cost per team" check to the reader (it's a one-liner in make_teams :p).
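One possible reading of that exercise, as a self-contained sketch: `make_teams` takes an extra `budget` argument and skips any team whose total cost exceeds it. The `budget` parameter, the `Player` record and the sample squad are assumptions added here for illustration, not part of the answer.

```python
import itertools
from collections import namedtuple

# Hypothetical record; the answer's code only relies on .score and .cost.
Player = namedtuple("Player", "name score cost")

def score_per_cost(player):
    return player.score / player.cost

def sorted_combinations(seq, n):
    return itertools.combinations(
        sorted(seq, key=score_per_cost, reverse=True), n)

def make_teams(G, D, M, S, budget):
    """Yield 4:4:2 teams in greedy order, skipping those over budget."""
    for gk in sorted_combinations(G, 1):
        for de in sorted_combinations(D, 4):
            for mi in sorted_combinations(M, 4):
                for st in sorted_combinations(S, 2):
                    team = gk + de + mi + st
                    if sum(p.cost for p in team) <= budget:  # the cost check
                        yield team

# Made-up squad: 2 goalkeepers, 5 defenders, 5 midfielders, 3 strikers.
G = [Player("g%d" % i, 4 + i, 2 + i) for i in range(2)]
D = [Player("d%d" % i, 5 + i, 2 + i) for i in range(5)]
M = [Player("m%d" % i, 6 + i, 3 + i) for i in range(5)]
S = [Player("s%d" % i, 7 + i, 4 + i) for i in range(3)]

# Print the first three affordable teams in greedy order.
for team in itertools.islice(make_teams(G, D, M, S, budget=50), 3):
    print(sum(p.score for p in team), sum(p.cost for p in team),
          [p.name for p in team])
```

As noted above, the first teams produced this way tend to be good value, but nothing guarantees that the best team overall is among them.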

【Comments】:

I nearly got sick when I saw the knapsack problem!
