Invalid results from a Colley ranking algorithm implementation in Python
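I implemented the Colley ranking algorithm for sports matches, following this article: https://towardsdatascience.com/generate-sports-rankings-with-data-science-4dd1979571da
However, I am getting invalid results. The rating r is supposed to be a probability of winning, so it should lie between 0 and 1. With a large input, though, I get negative values and values as high as 1.4.
Is this a known problem with the Colley algorithm? Is there a fix, or an alternative algorithm that handles large amounts of data correctly?
My code: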
import json
import numpy as np

c = None
b = None


def iter_game(t1, t2, r1, r2):
    global c, b
    # Update vector b based on the result of each game (team IDs are 1-based)
    if r1 > r2:
        b[t1 - 1] += 1
        b[t2 - 1] -= 1
    elif r1 < r2:
        b[t1 - 1] -= 1
        b[t2 - 1] += 1
    else:
        # Draws are skipped entirely: neither b nor c is updated
        return
    c[t1 - 1, t1 - 1] += 1  # Update diagonal element
    c[t2 - 1, t2 - 1] += 1  # Update diagonal element
    c[t1 - 1, t2 - 1] -= 1  # Update off-diagonal element
    c[t2 - 1, t1 - 1] -= 1  # Update off-diagonal element


def main():
    global c, b
    num_players = 2537
    # Initialize Colley matrix 'c' and vector 'b'
    c = np.zeros([num_players, num_players])
    b = np.zeros(num_players)
    with open("colley_games.json") as f:
        games = json.load(f)
    for game in games:
        iter_game(game["home"], game["away"], game["score_home"], game["score_away"])
    # Add 2 to the diagonal elements (total number of games) of the Colley matrix
    diag = c.diagonal() + 2
    np.fill_diagonal(c, diag)
    # Divide vector b by 2 and add one
    b = 1 + b / 2
    # Solve the N-variable linear system
    r = np.linalg.solve(c, b)
    # Display the ratings of the top 10 teams
    top_teams = r.argsort()[-10:][::-1]
    for i in top_teams:
        print(f"{r[i]} {i}")
    print("----------------------------")
    # Display the ratings of the bottom 10 teams
    bottom_teams = r.argsort()[:10][::-1]
    for i in bottom_teams:
        print(f"{r[i]} {i}")


main()
Output when run with the colley_games.json input:
# python3 rank_colley_ask.py
1.409508465374069 2135
1.1358580974322448 1759
1.134486271801534 2126
1.1314563266569193 1763
1.0930304236523831 2134
1.0809741214865278 1243
1.0633760655215825 2143
1.049467222041803 1748
1.0391031285438894 1470
1.0288821673935697 1453
----------------------------
-0.1304799893162797 1954
-0.15012844703440156 1929
-0.19462901224272772 2121
-0.20745023863077341 1930
-0.21188300405221577 890
-0.24910253479694192 968
-0.25265547797693333 2155
-0.34306068196974493 930
-0.3468485876254179 913
-0.3792348796324475 2151
A fiddle is available here: https://pyfiddle.io/fiddle/ffc0e2d7-3d47-4ee9-b9f8-9c28e6e3b500/?i=true (I had to gzip the input because the maximum upload size is 1 MB).
Answer
This is a known issue, caused by some sleight of hand in the algebra Colley uses.
https://www.jellyjuke.com/the-problem-with-rpi-elo-and-the-colley-matrix.html
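The effect does not seem to depend on the size of the input. As a minimal sketch, assuming the same matrix construction as in the question (2 plus games played on the diagonal, minus the head-to-head count off the diagonal, and 1 + (wins - losses) / 2 in b), a small hand-built schedule already produces ratings outside [0, 1]. The function name `colley_ratings` and the five-team "chain" schedule are hypothetical and chosen only for illustration.

```python
import numpy as np


def colley_ratings(num_teams, games):
    """Solve the Colley system for a list of (winner, loser) pairs (0-indexed)."""
    c = 2.0 * np.eye(num_teams)   # diagonal starts at 2
    b = np.ones(num_teams)        # b starts at 1
    for winner, loser in games:
        c[winner, winner] += 1    # one more game played by each side
        c[loser, loser] += 1
        c[winner, loser] -= 1     # one more head-to-head meeting
        c[loser, winner] -= 1
        b[winner] += 0.5          # (wins - losses) / 2
        b[loser] -= 0.5
    return np.linalg.solve(c, b)


# Hypothetical schedule: team 0 beats team 1 ten times, team 1 beats team 2
# ten times, and so on down a chain of five teams.
games = [(i, i + 1) for i in range(4) for _ in range(10)]
ratings = colley_ratings(5, games)
print(ratings)
```

On this schedule the top team comes out at about 1.17 and the bottom team below 0, while the ratings still average 0.5. Values outside [0, 1] are therefore not by themselves evidence of a coding error; they suggest that r is better read as a relative rating than as a literal win probability.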