实现一个简单的解释器

Posted 2020-11-28 Xlgd notes

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了实现一个简单的解释器相关的知识，希望对你有一定的参考价值。

译自：https://ruslanspivak.com/lsbasi-part2/
（已获作者授权）

在他们的著作《有效思维的五个要素》(The 5 Elements of Effective Thinking)中，Burger和Starbird分享了一个故事，讲述了他们如何观察国际知名的小号演奏家托尼·普洛(Tony Plog)为有成就的小号演奏家举办大师班。学生们首先演奏复杂的乐句，他们演奏得很好，但是随后他们被要求演奏非常基本、简单的音符时，与以前演奏的复杂乐句相比，这些音符听起来更幼稚(childish)。他们演奏完毕后，大师老师也演奏了相同的音符，但是当他演奏它们时，它们听起来并不幼稚，区别是惊人的。托尼解释说，掌握简单音符的演奏可以使人在更复杂的控制下演奏复杂的乐曲。该课程很明确：要建立真正的技巧，必须将重点放在掌握简单的基本思想上。

故事中的课程显然不仅适用于音乐，还适用于软件开发。这个故事很好地提醒了我们所有人，即使有时感觉就像是退后一步，也不要忘记深入研究简单，基本概念的重要性。精通所使用的工具或框架很重要，但了解其背后的原理也非常重要。正如Ralph Waldo Emerson所说：

“如果你只学习方法，那么你将被束缚在方法上。但是，如果你学习了原理，就可以设计自己的方法。”

关于这一点，让我们再次深入了解解释器和编译器。

今天，我将向您展示第1部分中的计算器的新版本，该版本将能够：

1、在输入字符串中的处理任何地方的空格
2、处理输入中的多位数整数
3、减去两个整数（当前只能加整数）
这是可以执行上述所有操作的新版本计算器的源代码：

# Token types
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, EOF = \'INTEGER\', \'PLUS\', \'MINUS\', \'EOF\'


class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, PLUS, MINUS, or EOF
        self.type = type
        # token value: non-negative integer value, \'+\', \'-\', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS \'+\')
        """
        return \'Token({type}, {value})\'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Interpreter(object):
    def __init__(self, text):
        # client string input, e.g. "3 + 5", "12 - 5", etc
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        # current token instance
        self.current_token = None
        self.current_char = self.text[self.pos]

    def error(self):
        raise Exception(\'Error parsing input\')

    def advance(self):
        """Advance the \'pos\' pointer and set the \'current_char\' variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = \'\'
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == \'+\':
                self.advance()
                return Token(PLUS, \'+\')

            if self.current_char == \'-\':
                self.advance()
                return Token(MINUS, \'-\')

            self.error()

        return Token(EOF, None)

    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.get_next_token()
        else:
            self.error()

    def expr(self):
        """Parser / Interpreter

        expr -> INTEGER PLUS INTEGER
        expr -> INTEGER MINUS INTEGER
        """
        # set current token to the first token taken from the input
        self.current_token = self.get_next_token()

        # we expect the current token to be an integer
        left = self.current_token
        self.eat(INTEGER)

        # we expect the current token to be either a \'+\' or \'-\'
        op = self.current_token
        if op.type == PLUS:
            self.eat(PLUS)
        else:
            self.eat(MINUS)

        # we expect the current token to be an integer
        right = self.current_token
        self.eat(INTEGER)
        # after the above call the self.current_token is set to
        # EOF token

        # at this point either the INTEGER PLUS INTEGER or
        # the INTEGER MINUS INTEGER sequence of tokens
        # has been successfully found and the method can just
        # return the result of adding or subtracting two integers,
        # thus effectively interpreting client input
        if op.type == PLUS:
            result = left.value + right.value
        else:
            result = left.value - right.value
        return result


def main():
    while True:
        try:
            # To run under Python3 replace \'raw_input\' call
            # with \'input\'
            text = raw_input(\'calc> \')
        except EOFError:
            break
        if not text:
            continue
        interpreter = Interpreter(text)
        result = interpreter.expr()
        print(result)


if __name__ == \'__main__\':
    main()

将以上代码保存到calc2.py文件中，或直接从GitHub下载。试试看,了解一下它可以做什么：
它可以处理输入中任何地方的空格；它可以接受多位数整数，也可以减去两个整数，也可以加上两个整数。

这是我在笔记本电脑上的运行效果：

$ python calc2.py
calc> 27 + 3
30
calc> 27 - 7
20
calc>

与第1部分中的版本相比，主要的代码更改是：

1、get_next_token函数被重构了一部分，递增pos指针的逻辑单独放入函数advance中。
2、添加了两个函数：skip_whitespace忽略空白字符，integer处理输入中的多位数整数。
3、修改了expr函数，以识别INTEGER-> MINUS-> INTEGER短语，以及INTEGER-> PLUS-> INTEGER短语。现在，函数可以在成功识别(recognize)相应短语之后来解释加法和减法运算。

在第1部分中，你学习了两个重要的概念，即Token和词法分析器(lexical analyzer)的概念。今天，我想谈谈词素(lexemes)，解析(parsing)和解析器(parser)。

你已经了解Token，但是，为了使我更完整地讨论Token，我需要提及词素。什么是词素？词素是形成Token的一系列字符，在下图中，你可以看到Token和词素的一些示例，希望可以使它们之间的关系更清晰一点：

现在，还记得expr函数吗？我之前说过，这实际上是对算术表达式进行解释的地方。但是，在解释一个表达式之前，首先需要识别它是哪种短语(phrase)，例如，是加还是减，这就是expr函数的本质：它从get_next_token方法获取的Token流中查找结构(structure)，然后解释已识别的短语，从而生成算术表达式的结果。

在Token流中查找结构的过程，或者换句话说，在Token流中识别短语的过程称为解析(parsing)。执行该工作的解释器或编译器部分称为解析器(parser)。

因此，现在您知道expr函数是解释器的一部分，解析和解释都会发生在expr函数中，首先尝试在Token流中识别（解析）INTEGER-> PLUS-> INTEGER或INTEGER-> MINUS-> INTEGER短语，并在成功识别（解析）其中一个短语之后，该方法对其进行解释，将两个整数相加或相减的结果返回给调用函数。

现在该做练习了：

1、扩展计算器以处理两个整数的乘法
2、扩展计算器以处理两个整数的除法
3、修改代码以解释包含任意数量的加法和减法的表达式，例如" 9-5 + 3 + 11"

最后再来复习回忆一下：

1、什么是词素？
2、在Token流中找到结构的过程称为什么，或者换句话说，识别该Token流中的特定短语的过程叫什么？
3、解释器（编译器）中负责解析(parsing)的部分叫什么？

希望您喜欢今天的资料，在下一篇文章中，将扩展计算器以处理更复杂的算术表达式，敬请关注。

以上是关于实现一个简单的解释器的主要内容，如果未能解决你的问题，请参考以下文章