Finding the best move using MinMax with Alpha-Beta pruning

Posted 2023-04-13

【Asked】: 2015-02-16 02:09:27 【Question】:

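I'm developing an AI for a game, and I'd like to use the MinMax algorithm with Alpha-Beta pruning.

I have a rough idea of how it works, but I'm still not able to write the code from scratch, so I've spent the last two days looking for some kind of pseudocode online.

My problem is that every pseudocode I've found online seems to be based on finding the value of the best move, while I need to return the best move itself and not a number.

My current code is based on this pseudocode (source)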

minimax(level, player, alpha, beta) {  // player may be "computer" or "opponent"
    if (gameover || level == 0)
       return score
    children = all valid moves for this "player"
    if (player is computer, i.e., max's turn) {
       // Find max and store in alpha
       for each child {
          score = minimax(level - 1, opponent, alpha, beta)
          if (score > alpha) alpha = score
          if (alpha >= beta) break;  // beta cut-off
       }
       return alpha
    } else (player is opponent, i.e., min's turn) {
       // Find min and store in beta
       for each child {
          score = minimax(level - 1, computer, alpha, beta)
          if (score < beta) beta = score
          if (alpha >= beta) break;  // alpha cut-off
       }
       return beta
    }
}

// Initial call with alpha=-inf and beta=inf
minimax(2, computer, -inf, +inf)

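As you can see, this code returns a number, and I guess this is needed to make everything work (since the returned number is used during the recursion).

So I thought I might use an external variable to store the best move, and this is how I changed the previous code: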

minimax(level, player, alpha, beta) {  // player may be "computer" or "opponent"
    if (gameover || level == 0)
       return score
    children = all valid moves for this "player"
    if (player is computer, i.e., max's turn) {
       // Find max and store in alpha
       for each child {
          score = minimax(level - 1, opponent, alpha, beta)
          if (score > alpha) {
              alpha = score
              bestMove = current child // ROW THAT I ADDED TO UPDATE THE BEST MOVE
          }
          if (alpha >= beta) break;  // beta cut-off
       }
       return alpha
    } else (player is opponent, i.e., min's turn) {
       // Find min and store in beta
       for each child {
          score = minimax(level - 1, computer, alpha, beta)
          if (score < beta) beta = score
          if (alpha >= beta) break;  // alpha cut-off
       }
       return beta
    }
}

// Initial call with alpha=-inf and beta=inf
minimax(2, computer, -inf, +inf)

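Now, this makes sense to me, because we only need to update the best move when it's the player's turn and the move is better than the previous one.

So, while I think this one is correct (even if I'm not 100% sure), the source also has a Java implementation that updates bestMove even in the score < beta case, and I don't understand why.

Trying that implementation led my code to choose a move of the opposing player as the best move, which doesn't seem correct (assuming I'm the black player, I'm looking for the best move that I can make, so I'm expecting a "black" move, not a "white" one).

I don't know whether my pseudocode (the second one) is the correct way to find the best move using MinMax with alpha-beta pruning, or whether I need to update the best move even in the score < beta case.

Please feel free to suggest any new and better pseudocode if you prefer; I'm not bound to anything, and I don't mind rewriting some code if it's better than mine.

EDIT:

Since I can't understand the replies, I guess the question may not ask what I actually want to know, so I'll try to phrase it better here.

Suppose I want the best move for one player only, and that this player, the maximizer, is passed to the MinMax function every time I need a new move (so that minmax(2, black, a, b) returns the best move for the black player, while minmax(2, white, a, b) returns the best move for the white player). How would you change the first pseudocode (or the Java implementation in the source) to store this best move somewhere?

EDIT 2:

Let's see if we can make it work.

This is my implementation; can you please tell me whether it's correct?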

//PlayerType is an enum with just White and Black values, opponent() returns the opposite player type
protected int minMax(int alpha, int beta, int maxDepth, PlayerType player) {
    if (!canContinue()) {
        return 0;
    }
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    int value = 0;
    boolean isMaximizer = (player.equals(playerType)); // playerType is the player used by the AI
    if (maxDepth == 0 || board.isGameOver()) {
        value = evaluateBoard();
        return value;
    }
    while (movesIterator.hasNext()) {
        Move currentMove = movesIterator.next();
        board.applyMove(currentMove);
        value = minMax(alpha, beta, maxDepth - 1, player.opponent());
        board.undoLastMove();
        if (isMaximizer) {
            if (value > alpha) {
                selectedMove = currentMove;
                alpha = value;
            }
        } else {
            if (value < beta) {
                beta = value;
            }
        }
        if (alpha >= beta) {
            break;
        }
    }
    return (isMaximizer) ? alpha : beta;
}

EDIT 3:

New implementation based on @Codor's answer/comments

private class MoveValue {
    public Move move;
    public int value;

    public MoveValue() {
        move = null;
        value = 0;
    }

    public MoveValue(Move move, int value) {
        this.move = move;
        this.value = value;
    }

    @Override
    public String toString() {
        return "MoveValue{" + "move=" + move + ", value=" + value + '}';
    }
}

protected MoveValue minMax(int alpha, int beta, int maxDepth, PlayerType player) {
    if (!canContinue()) {
        return new MoveValue();
    }
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    MoveValue moveValue = new MoveValue();
    boolean isMaximizer = (player.equals(playerType));
    if (maxDepth == 0 || board.isGameOver()) {
        moveValue.value = evaluateBoard();
        return moveValue;
    }
    while (movesIterator.hasNext()) {
        Move currentMove = movesIterator.next();
        board.applyMove(currentMove);
        moveValue = minMax(alpha, beta, maxDepth - 1, player.opponent());
        board.undoLastMove();
        if (isMaximizer) {
            if (moveValue.value > alpha) {
                selectedMove = currentMove;
                alpha = moveValue.value;
            }
        } else {
            if (moveValue.value < beta) {
                beta = moveValue.value;
                selectedMove = currentMove;
            }
        }
        if (alpha >= beta) {
            break;
        }
    }
    return (isMaximizer) ? new MoveValue(selectedMove, alpha) : new MoveValue(selectedMove, beta);
}

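I don't know whether I got it right or did something wrong, but I'm back to the problem I had when I posted the question:

Calling minMax(Integer.MIN_VALUE, Integer.MAX_VALUE, 1, PlayerType.Black) returns a move that can only be made by the white player, and this is not what I need.

I need the best move for the given player, not the best move for the whole board.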

【Answer 1】:

After some research and a lot of time wasted on this problem, I came up with this solution, which seems to work.

private class MoveValue {

    public double returnValue;
    public Move returnMove;

    public MoveValue() {
        returnValue = 0;
    }

    public MoveValue(double returnValue) {
        this.returnValue = returnValue;
    }

    public MoveValue(double returnValue, Move returnMove) {
        this.returnValue = returnValue;
        this.returnMove = returnMove;
    }
}

protected MoveValue minMax(double alpha, double beta, int maxDepth, MarbleType player) {
    if (!canContinue()) {
        return new MoveValue();
    }
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    double value = 0;
    boolean isMaximizer = (player.equals(playerType));
    if (maxDepth == 0 || board.isGameOver()) {
        // Leaf: return the static evaluation with no move attached
        value = evaluateBoard();
        return new MoveValue(value);
    }
    MoveValue returnMove;
    MoveValue bestMove = null;
    if (isMaximizer) {
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if ((bestMove == null) || (bestMove.returnValue < returnMove.returnValue)) {
                bestMove = returnMove;
                // Replace the child's move with the move played at this level
                bestMove.returnMove = currentMove;
            }
            if (returnMove.returnValue > alpha) {
                alpha = returnMove.returnValue;
                bestMove = returnMove;
            }
            if (beta <= alpha) {
                bestMove.returnValue = beta;
                bestMove.returnMove = null;
                return bestMove; // pruning
            }
        }
        return bestMove;
    } else {
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if ((bestMove == null) || (bestMove.returnValue > returnMove.returnValue)) {
                bestMove = returnMove;
                // Replace the child's move with the move played at this level
                bestMove.returnMove = currentMove;
            }
            if (returnMove.returnValue < beta) {
                beta = returnMove.returnValue;
                bestMove = returnMove;
            }
            if (beta <= alpha) {
                bestMove.returnValue = alpha;
                bestMove.returnMove = null;
                return bestMove; // pruning
            }
        }
        return bestMove;
    }
}
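
As a hedged, self-contained illustration of the idea in this answer (every call returns a value paired with the move played at its own level, so the root call yields a move of the player it was called for), here is a minimal sketch on a toy game. The game (one pile, take 1 to 3 stones, taking the last stone wins) and the names NimAlphaBeta, MoveValue and search are assumptions made for the example, not part of the answer's code.

```java
/** Toy game, assumed for illustration: one pile, take 1-3 stones, taking the last stone wins. */
final class NimAlphaBeta {
    /** A move (stones to take; 0 means "no move") paired with its value for the maximizer. */
    static final class MoveValue {
        final int move;
        final int value;
        MoveValue(int move, int value) { this.move = move; this.value = value; }
    }

    /** Alpha-beta search; values are +1 if the maximizer wins, -1 if it loses. */
    static MoveValue search(int stones, boolean maximizerToMove, int alpha, int beta) {
        if (stones == 0) {
            // The player who just moved took the last stone and won.
            return new MoveValue(0, maximizerToMove ? -1 : 1);
        }
        int bestMove = 0;
        int bestValue = maximizerToMove ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (int take = 1; take <= 3 && take <= stones; take++) {
            // Only the child's value is used; the child's move belongs to the other player.
            int value = search(stones - take, !maximizerToMove, alpha, beta).value;
            if (maximizerToMove ? value > bestValue : value < bestValue) {
                bestValue = value;
                bestMove = take; // always a move of the player to move at THIS node
            }
            if (maximizerToMove) alpha = Math.max(alpha, bestValue);
            else beta = Math.min(beta, bestValue);
            if (alpha >= beta) break; // cut-off: the remaining moves cannot change the result
        }
        return new MoveValue(bestMove, bestValue);
    }

    public static void main(String[] args) {
        MoveValue best = search(7, true, Integer.MIN_VALUE, Integer.MAX_VALUE);
        System.out.println("take " + best.move + ", value " + best.value); // take 3, value 1
    }
}
```

Because the best move lives in a local variable rather than in a shared field, deeper recursive calls cannot overwrite the root's choice.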

【Answer 2】:

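This is a bit difficult because the given code is not an actual Java implementation; to achieve what you want, there must be concrete types that represent a move and a position in the game tree. Usually the game tree is not encoded explicitly but navigated in a sparse representation: the implementation actually performs the move in question, recursively evaluates the resulting smaller problem, and then undoes the move. This amounts to a depth-first search in which the call stack represents the current path.

To obtain the actual best move, simply return from your method the instance that maximizes the subsequent evaluation. It might help to first implement the Minimax algorithm without alpha-beta pruning, and to add the pruning in a later step once the basic structure works.

The implementation from the link in the question (Section 1.5) actually returns the best move, as indicated by the following comment taken from there.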

/** Recursive minimax at level of depth for either
    maximizing or minimizing player.
    Return int[3] of score, row, col  */

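Here, no user-defined type is used to represent the move. Instead, the method returns three values: the best score found and the coordinates the player would move to in order to actually perform the best move (which the implementation has already performed to obtain the score). These coordinates are the representation of the actual move.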
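
The approach described above (apply a move, evaluate recursively, undo it, and return the score together with the coordinates) could be sketched as follows for tic-tac-toe. This is a plain minimax without pruning, written from scratch for illustration; it is not the code from the linked page, and the class name TicTacToeMinimax, the '.' marker for empty cells and the +1/0/-1 scoring are assumptions.

```java
/** Plain minimax (no pruning) for tic-tac-toe, returning {score, row, col}. */
final class TicTacToeMinimax {
    static final char EMPTY = '.';
    final char[][] cells;

    /** Expects three strings of three characters each: 'X', 'O' or '.'. */
    TicTacToeMinimax(String... rows) {
        cells = new char[3][];
        for (int r = 0; r < 3; r++) cells[r] = rows[r].toCharArray();
    }

    boolean hasWon(char p) {
        for (int i = 0; i < 3; i++) {
            if (cells[i][0] == p && cells[i][1] == p && cells[i][2] == p) return true; // row i
            if (cells[0][i] == p && cells[1][i] == p && cells[2][i] == p) return true; // column i
        }
        return (cells[0][0] == p && cells[1][1] == p && cells[2][2] == p)
            || (cells[0][2] == p && cells[1][1] == p && cells[2][0] == p);
    }

    /** Scores are from X's point of view: +1 X wins, -1 O wins, 0 draw. */
    int[] minimax(char player) {
        if (hasWon('X')) return new int[] {1, -1, -1};
        if (hasWon('O')) return new int[] {-1, -1, -1};
        boolean maximizing = (player == 'X');
        int[] best = null;
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) {
                if (cells[r][c] != EMPTY) continue;
                cells[r][c] = player;                               // apply the move
                int score = minimax(player == 'X' ? 'O' : 'X')[0];  // use only the child's score
                cells[r][c] = EMPTY;                                // undo the move
                if (best == null || (maximizing ? score > best[0] : score < best[0])) {
                    best = new int[] {score, r, c};                 // coordinates of THIS level's move
                }
            }
        }
        return best != null ? best : new int[] {0, -1, -1};         // full board, nobody won: draw
    }

    public static void main(String[] args) {
        int[] result = new TicTacToeMinimax("XX.", "OO.", "...").minimax('X');
        System.out.println(java.util.Arrays.toString(result)); // [1, 0, 2]
    }
}
```

Each level discards the coordinates returned by its child and keeps only the score, so the coordinates that reach the caller always describe a move of the player the method was called for.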

【Comments】:

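How would you change the *** pseudocode to store the best move rather than just the best value? By the way, the source link in the question has a Java implementation that I'm using (I just changed the type of the best move), which you can use as a reference.

The implementation you cite (Section 1.5) actually returns the best move; see the updated answer.

OK. Now, given that this implementation returns the best move regardless of the current player (which is why it updates the best move in both the score > alpha and score < beta cases), in which of the two cases do I have to update the best move to get the best move for the maximizer?

In both cases, because they correspond to the two different players, which are evaluated alternately. More colloquially: the algorithm is called to evaluate for player A; it then asks "what moves can I make?" For each of these moves, it assumes the move is performed and calls itself for player B "to see what he or she would do" (on a board with fewer free positions), stores the value of the move in order to examine it, and undoes the move to evaluate the next one. The alpha and beta values basically just remove the need to check all possible moves once a "good enough" one has been found.

Sorry, I still don't understand. I updated the question; hopefully what I need to know is clearer now.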
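
To make the remark about alpha and beta in the comments concrete, here is a small sketch that runs alpha-beta over an explicit two-level tree and counts how many leaves get evaluated. The tree values and the names PruningDemo, minValue and bestRootMove are made up for the illustration; the point is only that pruning skips work without changing the chosen move.

```java
/** Alpha-beta over an explicit two-level tree; counts how many leaves get evaluated. */
final class PruningDemo {
    static int leavesEvaluated = 0;

    /** Minimizer level: returns the smallest leaf seen, stopping once it cannot beat alpha. */
    static int minValue(int[] leaves, int alpha, int beta) {
        for (int leaf : leaves) {
            leavesEvaluated++;
            beta = Math.min(beta, leaf);
            if (alpha >= beta) break; // the maximizer already has an option at least this good
        }
        return beta;
    }

    /** Maximizer (root) level: returns {index of the best child, its value}. */
    static int[] bestRootMove(int[][] tree) {
        int alpha = Integer.MIN_VALUE;
        int bestIndex = -1;
        for (int i = 0; i < tree.length; i++) {
            int value = minValue(tree[i], alpha, Integer.MAX_VALUE);
            if (value > alpha) {
                alpha = value;
                bestIndex = i; // recorded only here, at the root
            }
        }
        return new int[] {bestIndex, alpha};
    }

    public static void main(String[] args) {
        int[] result = bestRootMove(new int[][] {{3, 5}, {2, 9}});
        // Prints "best child 0, value 3, leaves evaluated 3": the leaf 9 is never looked at.
        System.out.println("best child " + result[0] + ", value " + result[1]
                + ", leaves evaluated " + leavesEvaluated);
    }
}
```

In the second subtree the minimizer finds a 2, which is already worse for the maximizer than the 3 guaranteed by the first subtree, so the remaining leaf is skipped.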