2048 game - AI can't score more than 256 on average

Posted: 2017-12-04 10:31:10

I'm trying to implement an AI for 2048 using MiniMax with alpha-beta pruning, based on the snake strategy (see this paper), which seems to be the best single heuristic.
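
The snake heuristic scores a board as the weighted sum of its tiles against a weight matrix whose largest weights follow a zig-zag path away from one corner, keeping the best of the matrix's eight rotations and reflections (the question's code builds these eight poses by hand). A minimal standalone sketch on a plain list-of-lists board; the names SNAKE, rotate, transpose, symmetries and snake_score are hypothetical, and the weights are copied from the question's code:

```python
# Hypothetical standalone sketch of the snake heuristic (not the framework's API).
# Weights copied from the question: 2**15 in the top-left corner, then a
# zig-zag ("snake") path of decreasing powers of two.
SNAKE = [
    [2 ** 15, 2 ** 14, 2 ** 13, 2 ** 12],
    [2 ** 8, 2 ** 9, 2 ** 10, 2 ** 11],
    [2 ** 7, 2 ** 6, 2 ** 5, 2 ** 4],
    [2 ** 0, 2 ** 1, 2 ** 2, 2 ** 3],
]


def rotate(m):
    """Rotate a square matrix 90 degrees clockwise."""
    return [list(row) for row in zip(*m[::-1])]


def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def symmetries(m):
    """The 8 rotations/reflections of a square matrix."""
    out = []
    for base in (m, transpose(m)):
        cur = base
        for _ in range(4):
            out.append(cur)
            cur = rotate(cur)
    return out


def snake_score(board):
    """Best weighted tile sum over all 8 orientations of SNAKE."""
    n = len(board)
    return max(
        sum(board[i][j] * w[i][j] for i in range(n) for j in range(n))
        for w in symmetries(SNAKE)
    )
```

Generating the poses from four rotations of the matrix and of its transpose yields the same eight orientations as the hand-built list in the question, without writing each reversed variant out.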

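Unfortunately, in most games the AI only gets to 256, which is not much better than the empty-cells heuristic. I've already read the related threads here, but couldn't find a solution on my own.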
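
For comparison, the empty-cells heuristic mentioned above scores a board simply by how many free cells it has. A one-function sketch (the name empty_cells is hypothetical):

```python
# Hypothetical sketch of the empty-cells heuristic: more free cells means
# more room to manoeuvre before the board fills up.
def empty_cells(board):
    """Count the cells that hold no tile (value 0)."""
    return sum(1 for row in board for value in row if value == 0)
```

On its own this heuristic says nothing about where the large tiles sit, which is what the snake weights are meant to add.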

Here is the code:

import math
from BaseAI_3 import BaseAI

INF_P = math.inf

class PlayerAI(BaseAI):
    move_str = {
        0: "UP",
        1: "DOWN",
        2: "LEFT",
        3: "RIGHT",
    }

    def __init__(self):
        super().__init__()
        self.depth_max = 4

    def getMove(self, grid):
        # decision() returns (move, child_state, utility); only the move code is needed
        move_direction, state, utility = self.decision(grid)
        return move_direction  # None when no move is available

    def get_children(self, grid):
        grid.children = []
        for move_direction in grid.getAvailableMoves():
            gridCopy = grid.clone()
            gridCopy.path = grid.path[:]
            gridCopy.path.append(PlayerAI.move_str[move_direction])
            gridCopy.move(move_direction)
            gridCopy.depth_current = grid.depth_current + 1
            grid.children.append((move_direction, gridCopy))
        return grid.children

    def utility(self, state):

        def snake():
            poses = [
                [
                    [2 ** 15, 2 ** 14, 2 ** 13, 2 ** 12],
                    [2 ** 8, 2 ** 9, 2 ** 10, 2 ** 11],
                    [2 ** 7, 2 ** 6, 2 ** 5, 2 ** 4],
                    [2 ** 0, 2 ** 1, 2 ** 2, 2 ** 3]
                ]
                ,
                [
                   [2 ** 15, 2 ** 8, 2 ** 7, 2 ** 0],
                   [2 ** 14, 2 ** 9, 2 ** 6, 2 ** 1],
                   [2 ** 13, 2 ** 10, 2 ** 5, 2 ** 2],
                   [2 ** 12, 2 ** 11, 2 ** 4, 2 ** 3]
                ]
            ]

            poses.append([item for item in reversed(poses[0])])
            poses.append([list(reversed(item)) for item in reversed(poses[0])])
            poses.append([list(reversed(item)) for item in poses[0]])

            poses.append([item for item in reversed(poses[1])])
            poses.append([list(reversed(item)) for item in reversed(poses[1])])
            poses.append([list(reversed(item)) for item in poses[1]])

            max_value = -INF_P
            for pos in poses:
                value = 0
                for i in range(state.size):
                    for j in range(state.size):
                        value += state.map[i][j] * pos[i][j]

                if value > max_value:
                    max_value = value

            return max_value

        weight_snake = 1 / (2 ** 13)

        value = (
            weight_snake * snake(),
        )

        return value

    def decision(self, state):
        state.depth_current = 1
        state.path = []
        return self.maximize(state, -INF_P, INF_P)

    def terminal_state(self, state):
        return state.depth_current >= self.depth_max

    def maximize(self, state, alpha, beta):
        # terminal-state check
        if self.terminal_state(state):
            return (None, state, self.utility(state))

        max_move_direction, max_child, max_utility = None, None, (-INF_P, )
        for move_direction, child in self.get_children(state):
            _, state2, utility = self.minimize(child, alpha, beta)
            child.utility = utility

            if sum(utility) > sum(max_utility):
                max_move_direction, max_child, max_utility = move_direction, child, utility

            if sum(max_utility) >= beta:
                break

            if sum(max_utility) > alpha:
                alpha = sum(max_utility)

        state.utility = max_utility
        state.alpha = alpha
        state.beta = beta

        return max_move_direction, max_child, max_utility

    def minimize(self, state, alpha, beta):
        # terminal-state check
        if self.terminal_state(state):
            return (None, state, self.utility(state))

        min_move_direction, min_child, min_utility = None, None, (INF_P, )
        for move_direction, child in self.get_children(state):
            _, state2, utility = self.maximize(child, alpha, beta)
            child.utility = utility

            if sum(utility) < sum(min_utility):
                min_move_direction, min_child, min_utility = move_direction, child, utility

            if sum(min_utility) <= alpha:
                break

            if sum(min_utility) < beta:
                beta = sum(min_utility)

        state.utility = min_utility
        state.alpha = alpha
        state.beta = beta

        return min_move_direction, min_child, min_utility

grid is an object, and grid.map is a 2D array (a list of lists).
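
The question does not show the Grid class (it comes from the course framework that provides BaseAI_3). To exercise PlayerAI outside that framework, here is a hypothetical minimal stand-in that provides only what the code above touches: size, map (a list of lists), clone(), move() and getAvailableMoves(), with direction codes taken from move_str. The sliding and merging rules are the standard 2048 ones; the real class may differ in details such as return values.

```python
class Grid:
    """Hypothetical stand-in for the framework's Grid class.

    Directions follow move_str: 0=UP, 1=DOWN, 2=LEFT, 3=RIGHT.
    """

    def __init__(self, size=4):
        self.size = size
        self.map = [[0] * size for _ in range(size)]

    def clone(self):
        """Deep-copy the tiles; extra attributes (path, ...) are not copied."""
        g = Grid(self.size)
        g.map = [row[:] for row in self.map]
        return g

    @staticmethod
    def _merge_left(row):
        """Slide one row left, merging each pair of equal tiles at most once."""
        tiles = [v for v in row if v]
        merged, i = [], 0
        while i < len(tiles):
            if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
                merged.append(tiles[i] * 2)
                i += 2
            else:
                merged.append(tiles[i])
                i += 1
        return merged + [0] * (len(row) - len(merged))

    def _moved_map(self, direction):
        """Return the tiles after a move, without modifying self.map."""
        m = self.map
        if direction in (0, 1):  # vertical moves: work on columns
            m = [list(col) for col in zip(*m)]
        if direction in (1, 3):  # DOWN/RIGHT: reverse so every move is "left"
            m = [row[::-1] for row in m]
        m = [self._merge_left(row) for row in m]
        if direction in (1, 3):
            m = [row[::-1] for row in m]
        if direction in (0, 1):
            m = [list(col) for col in zip(*m)]
        return m

    def move(self, direction):
        """Apply a move in place; return True if any tile moved."""
        new_map = self._moved_map(direction)
        changed = new_map != self.map
        self.map = new_map
        return changed

    def getAvailableMoves(self):
        """Direction codes that would change the board."""
        return [d for d in range(4) if self._moved_map(d) != self.map]
```

Every move is reduced to a left slide by transposing and/or reversing rows, so only one merge routine is needed. PlayerAI attaches path and depth_current to grid objects itself, which works because they are ordinary Python objects.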

Am I making a mistake somewhere? How can I improve the code?

Added a game log: https://pastebin.com/eyzgU2dN

Comments:

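When I play the game manually, I mostly swipe up and left to keep the tiles sorted, with the maximum in the top-left corner. I find that swiping down or right tends to scramble the tiles faster, which leads to a lower score. I only swipe right when I can't swipe up or left, and if none of those three is possible, I swipe down and then immediately up. ...That's just my strategy.

Code Review might be a better place for this, if the code works and you just want to improve it.

@depperm, I think the code is fine; I've checked it several times.

@GlennFerrie, I know your strategy; the snake strategy is the problem. Could I have implemented it wrong?

@depperm - I completely understand your reasoning, but I suspect it might not fit on Code Review; this may be one of those questions that could reasonably fit on several sites (e.g. gamedev, mentioned in the 2048 tag). The case for it being a Stack Overflow question is that perhaps the code does not work, in the sense that it doesn't behave as intended: the algorithm may not be implemented correctly, or may be misapplied.

Answer 1: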

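Over the weekend I realized that the algorithm was not implemented correctly. The bug was in minimize(): I was generating the wrong children. The minimizing player is the game, which places a new 2 or 4 tile on an empty cell, not another player move, so the children should be generated like this: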

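def get_opponent_children(self, grid):
    # MIN layer: the game (not the player) moves by placing a 2 or a 4
    # on an empty cell, so every (empty cell, tile value) pair is a child
    grid.children = []
    for x in range(grid.size):
        for y in range(grid.size):
            if grid.map[x][y] == 0:
                for c in (2, 4):
                    gridCopy = grid.clone()
                    gridCopy.path = grid.path[:]
                    gridCopy.depth_current = grid.depth_current + 1
                    gridCopy.map[x][y] = c
                    # the opponent has no move direction, hence None
                    grid.children.append((None, gridCopy))

    return grid.children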

And the corresponding change to the loop in minimize():

for move_direction, child in self.get_opponent_children(state):

Now it reaches 1024 and 2048 most of the time.
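
To see the corrected MIN layer in context, here is a self-contained sketch of depth-limited minimax with alpha-beta pruning on plain list-of-lists boards, in which the MIN player enumerates tile placements the way get_opponent_children() does. It is not the poster's code: all function names are hypothetical, the utility uses a single snake orientation for brevity, game-over positions are scored by the heuristic alone, and the move codes follow move_str.

```python
# Hypothetical standalone sketch; not the framework's API.
import math

UP, DOWN, LEFT, RIGHT = range(4)  # same codes as move_str in the question

# One fixed snake orientation (top-left corner), for brevity.
WEIGHTS = [[2 ** 15, 2 ** 14, 2 ** 13, 2 ** 12],
           [2 ** 8, 2 ** 9, 2 ** 10, 2 ** 11],
           [2 ** 7, 2 ** 6, 2 ** 5, 2 ** 4],
           [2 ** 0, 2 ** 1, 2 ** 2, 2 ** 3]]


def merge_left(row):
    """Slide one row left, merging each pair of equal tiles at most once."""
    tiles = [v for v in row if v]
    out, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out))


def slide(board, d):
    """Return a new board after move d; every move is reduced to LEFT."""
    m = board
    if d in (UP, DOWN):
        m = [list(c) for c in zip(*m)]
    if d in (DOWN, RIGHT):
        m = [r[::-1] for r in m]
    m = [merge_left(r) for r in m]
    if d in (DOWN, RIGHT):
        m = [r[::-1] for r in m]
    if d in (UP, DOWN):
        m = [list(c) for c in zip(*m)]
    return m


def opponent_children(board):
    """MIN layer: the game places a 2 or a 4 on one empty cell."""
    for x, row in enumerate(board):
        for y, v in enumerate(row):
            if v == 0:
                for c in (2, 4):
                    child = [r[:] for r in board]
                    child[x][y] = c
                    yield child


def utility(board):
    return sum(board[i][j] * WEIGHTS[i][j] for i in range(4) for j in range(4))


def maximize(board, depth, alpha, beta):
    """MAX layer: the player's legal moves. Returns (move, value)."""
    moves = [(d, slide(board, d)) for d in range(4)]
    moves = [(d, b) for d, b in moves if b != board]
    if depth == 0 or not moves:  # game-over boards are just scored here
        return None, utility(board)
    best_move, best = None, -math.inf
    for d, child in moves:
        value = minimize(child, depth - 1, alpha, beta)
        if value > best:
            best_move, best = d, value
        if best >= beta:  # MIN will never let play reach this branch
            break
        alpha = max(alpha, best)
    return best_move, best


def minimize(board, depth, alpha, beta):
    """MIN layer: the lowest value the tile placements can force."""
    children = list(opponent_children(board))
    if depth == 0 or not children:
        return utility(board)
    best = math.inf
    for child in children:
        _, value = maximize(child, depth - 1, alpha, beta)
        best = min(best, value)
        if best <= alpha:  # MAX already has a better option elsewhere
            break
        beta = min(beta, best)
    return best


def decide(board, depth=4):
    """Pick the player's move with depth-limited alpha-beta minimax."""
    move, _ = maximize(board, depth, -math.inf, math.inf)
    return move
```

decide(board) returns UP, DOWN, LEFT or RIGHT, or None when no move is legal. Here depth counts plies (one player move or one tile placement each), so depth=4 looks two player moves ahead. The MIN layer has twice as many children as there are empty cells, so depth is the main cost of the search.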
