Compiler Principles: Let’s Build A Simple Interpreter. Part 3. (Python/C/C++ versions) (notes)

Posted 2021-08-14 Dontla

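So far you have learned how to interpret arithmetic expressions that add or subtract two integers, such as "7 + 3" or "12 - 9". Today I will talk about how to parse (recognize) and interpret arithmetic expressions that contain any number of plus or minus operators, such as "7 - 3 + 2 - 1".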

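Graphically, the arithmetic expressions in this article can be represented with the following syntax diagram:
[Figure: syntax diagram for arithmetic expressions]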
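What is a syntax diagram? A syntax diagram is a graphical representation of a programming language's syntax rules. Basically, a syntax diagram visually shows you which statements are allowed in your programming language and which are not.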

Syntax diagrams are easy to read: just follow the paths indicated by the arrows. Some paths indicate choices, and some paths indicate loops.

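You can read the syntax diagram above as follows: a term, optionally followed by a plus or minus sign, followed by another term, which in turn is optionally followed by a plus or minus sign, followed by another term, and so on. You get the picture, literally. You might wonder what a "term" is. For the purposes of this article, a "term" is just an integer.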
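Because a term is just an integer here, the diagram describes a simple pattern: a term followed by zero or more (plus or minus, term) pairs. The following is a minimal sketch using Python's `re` module. It assumes terms are non-negative integers and that whitespace may appear around tokens, as in the calculator below. The names `EXPR_PATTERN` and `matches_diagram` are illustrative only. This is not how the article's parser works. It only shows which strings the loop in the diagram accepts.

```python
import re

# term ((PLUS | MINUS) term)*, where a term is an integer;
# whitespace is allowed around tokens (an assumption matching the calculator below)
EXPR_PATTERN = re.compile(r"\s*\d+(\s*[+-]\s*\d+)*\s*")


def matches_diagram(text):
    """Return True if the whole string follows the syntax diagram."""
    return EXPR_PATTERN.fullmatch(text) is not None


for sample in ["3", "3 + 4", "7 - 3 + 2 - 1", "3 +", "+ 3", ""]:
    print(repr(sample), matches_diagram(sample))
```

The `*` after the group plays the role of the diagram's loop. Zero repetitions give a single term, such as "3".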

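Syntax diagrams serve two main purposes:

They graphically represent the specification (grammar) of a programming language.
They can be used to help you write your parser: you can map a diagram to code by following simple rules.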

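You have learned that the process of recognizing a phrase in a stream of tokens is called parsing, and that the part of an interpreter or compiler that performs this job is called a parser. Parsing is also called syntax analysis, and the parser is also aptly called, you guessed it, a syntax analyzer.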
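A parser works on a stream of tokens, not on raw characters. The sketch below shows the token stream a parser would see for "7 - 3 + 2 - 1". The `tokenize` helper and its (type, value) tuples are hypothetical. They are a simplified, list-returning stand-in for the `get_next_token` lexer in the full listing below.

```python
INTEGER, PLUS, MINUS, EOF = "INTEGER", "PLUS", "MINUS", "EOF"


def tokenize(text):
    """Split text into (type, value) pairs, ending with an EOF token."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1  # whitespace separates tokens but is not a token
        elif ch.isdigit():
            start = pos
            while pos < len(text) and text[pos].isdigit():
                pos += 1  # consume a multidigit integer
            tokens.append((INTEGER, int(text[start:pos])))
        elif ch == "+":
            tokens.append((PLUS, "+"))
            pos += 1
        elif ch == "-":
            tokens.append((MINUS, "-"))
            pos += 1
        else:
            raise SyntaxError("Invalid character: %r" % ch)
    tokens.append((EOF, None))
    return tokens


print(tokenize("7 - 3 + 2 - 1"))
# [('INTEGER', 7), ('MINUS', '-'), ('INTEGER', 3), ('PLUS', '+'),
#  ('INTEGER', 2), ('MINUS', '-'), ('INTEGER', 1), ('EOF', None)]
```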

According to the syntax diagram above, all of the following arithmetic expressions are valid:

3
3 + 4
7 - 3 + 2 - 1

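Because the syntax rules for arithmetic expressions are very similar across programming languages, we can use a Python shell to "test" our syntax diagram. Launch your Python shell and see for yourself: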

>>> 3 
3 
>>> 3 + 4 
7 
>>> 7 - 3 + 2 - 1 
5

No surprises here.

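The expression "3 + " is not a valid arithmetic expression because, according to the syntax diagram, a plus sign must be followed by a term (an integer); otherwise it is a syntax error. Once again, try it in a Python shell and see for yourself: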

>>> 3 +
  File "<stdin>", line 1
    3 +
      ^
SyntaxError: invalid syntax
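You can also script this experiment instead of typing it into the shell. The sketch below uses the standard library's `ast.parse` with `mode="eval"`. That call parses a single Python expression and raises `SyntaxError` for input like "3 +". The helper name `is_valid_expression` is only for illustration. Note that it checks Python's grammar, not our interpreter's.

```python
import ast


def is_valid_expression(source):
    """Return True if Python accepts source as a single expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True


for source in ["3", "3 + 4", "7 - 3 + 2 - 1", "3 +"]:
    print(repr(source), is_valid_expression(source))
```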

Being able to test with the Python shell is nice, but let's map the syntax diagram above to code and use our own interpreter for testing, all right?

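You know from the previous articles (Part 1 and Part 2) that the expr method is where both our parser and our interpreter live. Again, the parser just recognizes the structure, making sure that it corresponds to some specification. The interpreter actually evaluates the expression once the parser has successfully recognized (parsed) it.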

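The following code snippet shows the parser code that corresponds to the diagram. The rectangular box in the syntax diagram (term) becomes a term method that parses an integer, and the expr method simply follows the flow of the syntax diagram: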

def term(self):
    self.eat(INTEGER)

def expr(self):
    # set current token to the first token taken from the input
    self.current_token = self.get_next_token()

    self.term()
    while self.current_token.type in (PLUS, MINUS):
        token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            self.term()
        elif token.type == MINUS:
            self.eat(MINUS)
            self.term()

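You can see that expr first calls the term method. Then the expr method has a while loop that can execute zero or more times. Inside the loop, the parser makes a choice based on the current token (a plus sign or a minus sign). Spend some time proving to yourself that the code above really does follow the syntax diagram flow for arithmetic expressions.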
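You can also exercise the same shape outside the Interpreter class. The sketch below assumes the input has already been split into a list of token types. The `Parser` class is a hypothetical stand-in, not the article's code. Each part of the diagram maps to one construct: the term rectangle becomes one `eat(INTEGER)`, the loop becomes `while`, and the choice becomes `if`/`else`. Like the parser above, it returns nothing when the input is recognized, and it raises when a term is missing, as in "3 +".

```python
INTEGER, PLUS, MINUS, EOF = "INTEGER", "PLUS", "MINUS", "EOF"


class Parser:
    """Recognizer for: term ((PLUS | MINUS) term)*. It evaluates nothing."""

    def __init__(self, token_types):
        # Append an EOF marker, as the article's lexer does at end of input
        self.tokens = list(token_types) + [EOF]
        self.pos = 0
        self.current_token = self.tokens[0]

    def eat(self, token_type):
        # Consume the current token if it has the expected type, else fail
        if self.current_token != token_type:
            raise SyntaxError("Invalid syntax: expected %s, got %s"
                              % (token_type, self.current_token))
        self.pos += 1
        self.current_token = self.tokens[self.pos]

    def term(self):
        # The "term" rectangle in the diagram: exactly one integer
        self.eat(INTEGER)

    def expr(self):
        self.term()
        # The loop in the diagram: zero or more (PLUS | MINUS) term pairs
        while self.current_token in (PLUS, MINUS):
            # The choice in the diagram: plus or minus
            if self.current_token == PLUS:
                self.eat(PLUS)
            else:
                self.eat(MINUS)
            self.term()


# "7 - 3 + 2 - 1": recognized, so expr() returns silently
Parser([INTEGER, MINUS, INTEGER, PLUS, INTEGER, MINUS, INTEGER]).expr()

# "3 +": the plus sign is not followed by a term
try:
    Parser([INTEGER, PLUS]).expr()
except SyntaxError as e:
    print(e)  # Invalid syntax: expected INTEGER, got EOF
```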

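The parser itself does not interpret anything: if it recognizes an expression, it stays silent, and if it doesn't, it throws a syntax error. Let's modify the expr method and add the interpreter code: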

def term(self):
    """Return an INTEGER token value"""
    token = self.current_token
    self.eat(INTEGER)
    return token.value

def expr(self):
    """Parser / Interpreter """
    # set current token to the first token taken from the input
    self.current_token = self.get_next_token()

    result = self.term()
    while self.current_token.type in (PLUS, MINUS):
        token = self.current_token
        if token.type == PLUS:
            self.eat(PLUS)
            result = result + self.term()
        elif token.type == MINUS:
            self.eat(MINUS)
            result = result - self.term()

    return result

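Because the interpreter needs to evaluate the expression, the term method is modified to return an integer value. The expr method is modified to perform addition and subtraction at the appropriate places and return the result of interpretation. Even though the code is pretty straightforward, I recommend spending some time studying it.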
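Because `result` is updated on every pass through the loop, the operators are applied strictly left to right. So 7 - 3 + 2 - 1 is computed as ((7 - 3) + 2) - 1 = 5, not 7 - (3 + (2 - 1)) = 3. The sketch below mirrors the loop in `expr` over a pre-tokenized list and records each intermediate result. The `evaluate` helper and its alternating values/operators input format are assumptions for illustration.

```python
def evaluate(items):
    """items alternates integers and '+'/'-', e.g. [7, '-', 3, '+', 2].

    Assumes a well-formed list (odd length, starting with an integer).
    Mirrors expr(): start from the first term, then fold each
    (operator, term) pair into result, left to right.
    Returns the final result and the list of intermediate results.
    """
    result = items[0]
    steps = [result]
    for i in range(1, len(items), 2):
        op, value = items[i], items[i + 1]
        if op == "+":
            result = result + value
        elif op == "-":
            result = result - value
        else:
            raise ValueError("unexpected operator: %r" % op)
        steps.append(result)
    return result, steps


print(evaluate([7, "-", 3, "+", 2, "-", 1]))  # (5, [7, 4, 6, 5])
```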

Now let's get moving and look at the complete code of the interpreter, shall we?

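Here is the source code for the new version of the calculator. It can handle valid arithmetic expressions containing integers and any number of addition and subtraction operators: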

Python code (calc3.py)

# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'


class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, PLUS, MINUS, or EOF
        self.type = type
        # token value: non-negative integer value, '+', '-', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS, '+')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Interpreter(object):
    def __init__(self, text):
        # client string input, e.g. "3 + 5", "12 - 5 + 3", etc
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        # current token instance
        self.current_token = None
        self.current_char = self.text[self.pos]

    ##########################################################
    # Lexer code                                             #
    ##########################################################
    def error(self):
        raise Exception('Invalid syntax')

    def advance(self):
        """Advance the `pos` pointer and set the `current_char` variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens. One token at a time.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == '+':
                self.advance()
                return Token(PLUS, '+')

            if self.current_char == '-':
                self.advance()
                return Token(MINUS, '-')

            self.error()

        return Token(EOF, None)

    ##########################################################
    # Parser / Interpreter code                              #
    ##########################################################
    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.get_next_token()
        else:
            self.error()

    def term(self):
        """Return an INTEGER token value."""
        token = self.current_token
        self.eat(INTEGER)
        return token.value

    def expr(self):
        """Arithmetic expression parser / interpreter."""
        # set current token to the first token taken from the input
        self.current_token = self.get_next_token()

        result = self.term()
        while self.current_token.type in (PLUS, MINUS):
            token = self.current_token
            if token.type == PLUS:
                self.eat(PLUS)
                result = result + self.term()
            elif token.type == MINUS:
                self.eat(MINUS)
                result = result - self.term()

        return result


def main():
    while True:
        try:
            # Python 3 version; under Python 2 use raw_input() instead
            text = input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        interpreter = Interpreter(text)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()

Output:

D:\python_virtualenv\my_flask\Scripts\python.exe C:/Users/Administrator/Desktop/编译原理/python/calc3.py
calc> 3 +  4
7
calc> 3-4
-1
calc>   3+  3   -5   +4  
5

C code (calc3.cpp) (has a bug: there must be no whitespace after the last number, otherwise an error is reported)

#include <stdio.h>
#include <stdlib.h>

#define flag_digital 0
#define flag_plus 1
#define flag_minus 2
#define flag_EOF 3

struct Token
{
	int type;
	int value;
};

struct Interpreter
{
	char* text;
	int pos;
	struct Token current_token;
};

// Build a Token by value (plain C has no C++11-style brace assignment)
struct Token make_token(int type, int value) {
	struct Token token;
	token.type = type;
	token.value = value;
	return token;
}

void error(void) {
	printf("Invalid input!\n");
	exit(EXIT_FAILURE);
}

void skip_whitespace(struct Interpreter* pipt) {
	while (pipt->text[pipt->pos] == ' ') {
		pipt->pos++;
	}
}

// Check whether a character (e.g. the one at the Interpreter's current pos) is a digit
int is_integer(char c) {
	return c >= '0' && c <= '9';
}

void advance(struct Interpreter* pipt) {
	pipt->pos++;
}

char current_char(struct Interpreter* pipt) {
	return pipt->text[pipt->pos];
}

// Get the value of an integer token (convert the run of digit characters to a number)
int integer(struct Interpreter* pipt) {
	int result = 0;
	while (is_integer(current_char(pipt))) {
		// Accumulate in integer arithmetic; pow() works on doubles and needs no temp buffer here
		result = result * 10 + (current_char(pipt) - '0');
		advance(pipt);
	}
	return result;
}

void get_next_token(struct Interpreter* pipt) {
	// End of input. This check runs BEFORE whitespace is skipped, which is the
	// known bug of this version: trailing whitespace ends up in error()
	if (current_char(pipt) == '\0') {
		pipt->current_token = make_token(flag_EOF, 0);
		return;
	}
	if (current_char(pipt) == ' ')
		skip_whitespace(pipt);
	if (is_integer(current_char(pipt))) {
		pipt->current_token = make_token(flag_digital, integer(pipt));
		return;
	}
	if (current_char(pipt) == '+') {
		pipt->current_token = make_token(flag_plus, 0);
		advance(pipt);
		return;
	}
	if (current_char(pipt) == '-') {
		pipt->current_token = make_token(flag_minus, 0);
		advance(pipt);
		return;
	}
	error(); // none of the characters above: report an error and exit
}

// If the current token has the expected type, consume it (read the next token)
// and return the consumed token's value; otherwise report an error
int eat(struct Interpreter* pipt, int type) {
	int current_token_value = pipt->current_token.value;
	if (pipt->current_token.type != type)
		error(); // error() calls exit(), so it never returns
	get_next_token(pipt);
	return current_token_value;
}

int term(struct Interpreter* pipt) {
	return eat(pipt, flag_digital);
}

int expr(char* text) {
	struct Interpreter ipt = { text, 0 };
	get_next_token(&ipt);
	int result = term(&ipt);
	while (1) {
		int token_type = ipt.current_token.type;
		if (token_type == flag_plus) {
			eat(&ipt, flag_plus);
			result = result + term(&ipt);
		}
		else if (token_type == flag_minus) {
			eat(&ipt, flag_minus);
			result = result - term(&ipt);
		}
		else {
			return result;
		}
	}
}

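int main(void) {
	char text[50];
	while (1)
	{
		printf("Enter an expression:\n");
		//scanf_s("%s", text, sizeof(text)); // %s stops at whitespace, so scanf can't read spaces; use getchar() instead
		int i = 0;
		int c;
		while ((c = getchar()) != '\n' && c != EOF) {
			if (i < (int)sizeof(text) - 1) // leave room for '\0'; extra characters are dropped
				text[i++] = (char)c;
		}
		text[i] = '\0';
		if (c == EOF && i == 0)
			break; // end of input
		if (i == 0)
			continue; // skip empty lines, like the Python version
		int result = expr(text);
		printf("= %d\n\n", result);
	}
	return 0;
}

Output (an error is reported if the input ends with whitespace!):

Enter an expression:
4+4
= 8

Enter an expression:
3-4
= -1

Enter an expression:
  3+4  -4
= 3

Enter an expression:
4   +  4   -  4
= 4

Enter an expression: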
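Why does trailing whitespace break this version? In `get_next_token`, the end-of-input check runs before `skip_whitespace`. For input such as "4+4 ", the lexer first sees the final space, so it does not report EOF. It then skips the space and lands on the terminating '\0', which is neither a digit nor an operator, so it calls `error()`. The Python sketch below mimics only that lexing step under both orderings. The functions `next_token_kind` and `token_kinds` and the `skip_first` flag are hypothetical, and the end of the Python string stands in for C's '\0'.

```python
def next_token_kind(text, pos, skip_first):
    """Classify the token starting at pos; return (kind, new_pos).

    skip_first=False mimics the buggy C order: check for end of input,
    then skip spaces. skip_first=True skips spaces first, then checks.
    Raises ValueError where the C code would call error().
    """
    if skip_first:
        while pos < len(text) and text[pos] == " ":
            pos += 1
    if pos >= len(text):
        return "EOF", pos
    if not skip_first:
        while pos < len(text) and text[pos] == " ":
            pos += 1
    if pos < len(text) and text[pos].isdigit():
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        return "INTEGER", pos
    if pos < len(text) and text[pos] in "+-":
        return ("PLUS" if text[pos] == "+" else "MINUS"), pos + 1
    raise ValueError("invalid input")


def token_kinds(text, skip_first):
    """Collect token kinds up to and including EOF."""
    kinds, pos = [], 0
    while True:
        kind, pos = next_token_kind(text, pos, skip_first)
        kinds.append(kind)
        if kind == "EOF":
            return kinds


print(token_kinds("4+4 ", skip_first=True))  # ['INTEGER', 'PLUS', 'INTEGER', 'EOF']
try:
    token_kinds("4+4 ", skip_first=False)
except ValueError as e:
    print("buggy order:", e)  # buggy order: invalid input
```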

C code (fixed version: whitespace is skipped before the end-of-input check, so the user's input may end with whitespace)

#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <string.h>
#include<math.h>

#define flag_digital 0