[Compiler Principles] Let’s Build A Simple Interpreter. Part 1. (Python/C/C++ versions) (notes)
Posted by Dontla
Original article: Let’s Build A Simple Interpreter. Part 1.
Pascal code like the following is what we want to build an interpreter for:
program factorial;

function factorial(n: integer): longint;
begin
    if n = 0 then
        factorial := 1
    else
        factorial := n * factorial(n - 1);
end;

var
    n: integer;

begin
    for n := 0 to 16 do
        writeln(n, '! = ', factorial(n));
end.
We start by implementing one small feature of the Pascal interpreter: adding two numbers. We will use Python, but any other language works just as well. Here is the code:
# -*- coding: utf-8 -*-
"""
@File : 1.py
@Time : 2021/5/19 14:44
@Author : Dontla
@Email : sxana@qq.com
@Software: PyCharm
"""
# Token types
#
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, EOF = 'INTEGER', 'PLUS', 'EOF'
class Token(object):
    def __init__(self, type_, value):
        # token type: INTEGER, PLUS, or EOF
        self.type = type_
        # token value: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, '+', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS, '+')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()
class Interpreter(object):
    def __init__(self, text):
        # client string input, e.g. "3+5"
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        # current token instance
        self.current_token = None

    def error(self):
        raise Exception('Error parsing input')

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens. One token at a time.
        """
        text = self.text

        # is self.pos index past the end of the self.text ?
        # if so, then return EOF token because there is no more
        # input left to convert into tokens
        if self.pos > len(text) - 1:
            return Token(EOF, None)

        # get a character at the position self.pos and decide
        # what token to create based on the single character
        current_char = text[self.pos]

        # if the character is a digit then convert it to
        # integer, create an INTEGER token, increment self.pos
        # index to point to the next character after the digit,
        # and return the INTEGER token
        if current_char.isdigit():  # True when the character is a digit
            token = Token(INTEGER, int(current_char))
            self.pos += 1
            return token

        if current_char == '+':
            token = Token(PLUS, current_char)
            self.pos += 1
            return token

        self.error()

    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.get_next_token()
        else:
            self.error()

    def expr(self):
        """expr -> INTEGER PLUS INTEGER"""
        # set current token to the first token taken from the input
        self.current_token = self.get_next_token()

        # we expect the current token to be a single-digit integer
        left = self.current_token
        self.eat(INTEGER)

        # we expect the current token to be a '+' token
        op = self.current_token
        self.eat(PLUS)

        # we expect the current token to be a single-digit integer
        right = self.current_token
        self.eat(INTEGER)
        # after the above call the self.current_token is set to
        # EOF token

        # at this point INTEGER PLUS INTEGER sequence of tokens
        # has been successfully found and the method can just
        # return the result of adding two integers, thus
        # effectively interpreting client input
        result = left.value + right.value
        return result
def main():
    while True:
        try:
            # To run under Python 3, the 'raw_input' call is replaced with 'input'
            # text = raw_input('calc> ')
            text = input('calc> ')  # read a line from the keyboard; the argument is the prompt
        except EOFError:  # raised by input() at end of input (e.g. Ctrl-D, or Ctrl-Z then Enter on Windows)
            break
        if not text:
            continue
        interpreter = Interpreter(text)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()
Output:
D:\python_virtualenv\my_flask\Scripts\python.exe C:/Users/Administrator/Desktop/新建文件夹/1.py
calc> 1+2
3
calc>
The process of breaking the input string apart into tokens is called lexical analysis. The part that does it, the lexical analyzer (lexer for short), is also known as a scanner or tokenizer.
>>> from calc1 import Interpreter
>>>
>>> interpreter = Interpreter('3+5')
>>> interpreter.get_next_token()
Token(INTEGER, 3)
>>>
>>> interpreter.get_next_token()
Token(PLUS, '+')
>>>
>>> interpreter.get_next_token()
Token(INTEGER, 5)
>>>
>>> interpreter.get_next_token()
Token(EOF, None)
>>>
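The session above calls get_next_token by hand until the EOF token appears. As a minimal sketch, the same loop can be written as a standalone function. The name `tokenize` and the use of plain (type, value) tuples instead of the Token class are assumptions made for this illustration, not part of the original code.

```python
INTEGER, PLUS, EOF = 'INTEGER', 'PLUS', 'EOF'


def tokenize(text):
    """Return the list of (type, value) pairs for text, ending with EOF.

    Hypothetical helper: like get_next_token, it turns every single
    character into one token and rejects anything else.
    """
    tokens = []
    for char in text:
        if char.isdigit():
            tokens.append((INTEGER, int(char)))
        elif char == '+':
            tokens.append((PLUS, char))
        else:
            raise Exception('Error parsing input')
    # No characters left: signal the end of input.
    tokens.append((EOF, None))
    return tokens


print(tokenize('3+5'))
# [('INTEGER', 3), ('PLUS', '+'), ('INTEGER', 5), ('EOF', None)]
```

Collecting the tokens into a list is only convenient for inspection. The interpreter itself asks for one token at a time, so it never needs the whole list.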
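Let's recap how your interpreter evaluates an arithmetic expression:
- The interpreter accepts an input string, say "3+5".
- The interpreter calls the expr method to find a structure in the stream of tokens returned by the lexical analyzer get_next_token. The structure it looks for has the form INTEGER PLUS INTEGER. Once the structure is confirmed, it interprets the input by adding the values of the two INTEGER tokens, because at that point it is clear that all it has to do is add the two integers 3 and 5.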
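The two steps of the recap, recognizing the structure and then interpreting it, can be sketched on a ready-made token stream. The function name `interpret` and the tuple representation of tokens are assumptions for this illustration. Like expr in the code above, the sketch only checks the first three tokens.

```python
INTEGER, PLUS, EOF = 'INTEGER', 'PLUS', 'EOF'


def interpret(tokens):
    """Check that tokens start with INTEGER PLUS INTEGER and return the sum.

    Hypothetical helper: `tokens` is a list of (type, value) pairs.
    """
    # Step 1: recognize the structure in the token stream.
    types = [token_type for token_type, _ in tokens[:3]]
    if types != [INTEGER, PLUS, INTEGER]:
        raise Exception('Error parsing input')
    # Step 2: interpret it by adding the two integer values.
    return tokens[0][1] + tokens[2][1]


# Token stream the lexer produces for "3+5".
stream = [(INTEGER, 3), (PLUS, '+'), (INTEGER, 5), (EOF, None)]
print(interpret(stream))  # 8
```

Working on a list keeps the sketch short. The original expr pulls tokens one by one through eat, which does the same check incrementally.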
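Check your understanding:
What is an interpreter?
What is a compiler?
What is the difference between an interpreter and a compiler?
What is a token?
What is the name of the process that breaks input apart into tokens?
What is the part of the interpreter that does lexical analysis called?
What are the other common names for that part of an interpreter or a compiler?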
Implementation in C/C++
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Token types: 1 = digit, 2 = '+', 3 = EOF
struct Token
{
    int type;
    char value;
};

struct Interpreter
{
    char* text;
    int pos;
    struct Token (*get_next_token)(struct Interpreter*);
};

struct Token get_next_token(struct Interpreter* pipt) {
    // past the end of the input: return the EOF token
    if ((size_t)pipt->pos >= strlen(pipt->text)) {
        struct Token token = {3, '\0'};
        return token;
    }
    char current_char = pipt->text[pipt->pos];
    if (current_char >= '0' && current_char <= '9') {
        struct Token token = {1, current_char};
        pipt->pos++;
        return token;
    }
    if (current_char == '+') {
        struct Token token = {2, current_char};
        pipt->pos++;
        return token;
    }
    printf("Invalid input!\n");
    exit(-1); // any other character: report the error and exit
}

// Check the current token's type, advance to the next token,
// and return the value of the token that was just consumed.
char eat(struct Token* pcurrent_token, struct Interpreter* pipt, int type) {
    char former_token_value = pcurrent_token->value;
    if (pcurrent_token->type == type) {
        *pcurrent_token = pipt->get_next_token(pipt);
    }
    else {
        printf("Invalid input!\n");
        exit(-1);
    }
    return former_token_value;
}

int expr(char* text) {
    struct Interpreter ipt = {text, 0, get_next_token};
    struct Token current_token = ipt.get_next_token(&ipt);
    char temp;
    temp = eat(&current_token, &ipt, 1); // the first character must be a digit
    int left = temp - '0';
    eat(&current_token, &ipt, 2); // the second character must be a plus sign
    temp = eat(&current_token, &ipt, 1); // the third character must be a digit
    int right = temp - '0';
    int result = left + right;
    return result;
}

int main() {
    char text[10];
    while (1)
    {
        printf("Enter an expression:\n");
        // scanf_s is MSVC-specific (C11 Annex K); elsewhere use scanf("%9s", text)
        if (scanf_s("%s", text, (unsigned)sizeof(text)) != 1) {
            break; // stop on end of input or a read error
        }
        int result = expr(text);
        printf("= %d\n\n", result);
    }
    return 0;
}
Output:
Enter an expression:
2+8
= 10

Enter an expression:
1+5
= 6

Enter an expression:
3+4
= 7

Enter an expression: