Customizing EBNF Grammar Parsing with Scala's Token-Based Parsers
1. Preface
Recently, on a project migrating from Oracle to the Spark platform, I ran into the need to translate platform formulas into SparkSQL (on Hive); Spark itself is developed in Scala, its native language. After analyzing the requirements, I planned to describe the formulas with an EBNF grammar, in the style of compiler theory, and parse them token by token. My first instinct was the traditional route of a regular-expression lexer feeding a hand-written EBNF grammar parser, until I discovered StandardTokenParsers, Scala's token-based parser class.
2. The platform formula and the translated SparkSQL
```
if(XX1_m001[D003]="邢おb7肮α?薇"|| XX1_m001[H003]<"2")&& XX1_m001[D005]!="wed" then XX1_m001[H022,COUNT]
```
The field value "邢おb7肮α?薇" is deliberately chosen to test whether characters from various character sets are all matched correctly.
The corresponding SparkSQL looks like this; since we are running Hive on Spark, it reads much like the equivalent Oracle SQL:
```sql
SELECT COUNT(H022) FROM XX1_m001 WHERE (XX1_m001.D003='邢おb7肮α?薇' OR XX1_m001.H003<'2') AND XX1_m001.D005!='wed'
```
Overall this is fairly simple, since all I want to build here is a demo.
3. The EBNF grammar for the platform formula and the lexical design
```
expr-condition ::= tableName "[" valueName "]" comparator Condition
expr-front     ::= expr-condition (("&&" | "||") expr-front)*
expr-back      ::= tableName "[" valueName "," operator "]"
expr           ::= "if" expr-front "then" expr-back
```
The lexical classes are defined as follows:
```
operator             => [SUM, COUNT]
tableName, valueName => ident      # ident is the built-in identifier token
comparator           => ["=", ">=", "<=", ">", "<", "!="]
Condition            => stringLit  # stringLit is the built-in string-literal token
```
4. Parsing the EBNF grammar above with Scala's token-based parser
This is where the StandardTokenParsers class comes in: it provides convenient parser combinators together with a configurable lexer. We can use the lexical.delimiters set to hold the delimiter symbols the grammar translator will encounter, and the lexical.reserved set to hold its keywords.
For example, looking at the platform formula, "=", ">=", "<=", ">", "<", "!=", "&&", "||", "[", "]", ",", "(", ")" are all delimiters. We could arguably treat "=", ">=", "<=", ">", "<", "!=", "&&", "||" as keywords instead, but my habit is to reserve the keyword set for alphabetic words. So the keywords here are "if", "then", "SUM", "COUNT".
In code that looks like this:
```scala
lexical.delimiters += ("=", ">=", "<=", ">", "<", "!=", "&&", "||", "[", "]", ",", "(", ")")
lexical.reserved   += ("if", "then", "SUM", "COUNT")
```
Easy enough. Now let's see how the token-based parser implements the EBNF grammar we designed above. Here is the code first:
```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

class ExprParser extends StandardTokenParsers {
  // Delimiters and keywords used by the platform formula language.
  lexical.delimiters += ("=", ">=", "<=", ">", "<", "!=", "&&", "||", "[", "]", ",", "(", ")")
  lexical.reserved   += ("if", "then", "SUM", "COUNT")

  // expr ::= "if" expr-front "then" expr-back
  // The SELECT part comes from expr-back, the WHERE part from expr-front.
  def expr: Parser[String] = "if" ~ expr_front ~ "then" ~ expr_back ^^ {
    case "if" ~ exp1 ~ "then" ~ exp2 => exp2 + " WHERE " + exp1
  }

  // A condition that may be wrapped in parentheses; the parentheses are carried
  // through into the generated SQL.
  def expr_priority: Parser[String] = opt("(") ~ expr_condition ~ opt(")") ^^ {
    case Some("(") ~ conditions ~ Some(")") => "(" + conditions + ")"
    case Some("(") ~ conditions ~ None      => "(" + conditions
    case None ~ conditions ~ Some(")")      => conditions + ")"
    case None ~ conditions ~ None           => conditions
  }

  // expr-condition ::= tableName "[" valueName "]" comparator Condition
  def expr_condition: Parser[String] =
    ident ~ "[" ~ ident ~ "]" ~ ("=" | ">=" | "<=" | ">" | "<" | "!=") ~ stringLit ^^ {
      case ident1 ~ "[" ~ ident2 ~ "]" ~ "="  ~ stringList => ident1 + "." + ident2 + "='"  + stringList + "'"
      case ident1 ~ "[" ~ ident2 ~ "]" ~ ">=" ~ stringList => ident1 + "." + ident2 + ">='" + stringList + "'"
      case ident1 ~ "[" ~ ident2 ~ "]" ~ "<=" ~ stringList => ident1 + "." + ident2 + "<='" + stringList + "'"
      case ident1 ~ "[" ~ ident2 ~ "]" ~ ">"  ~ stringList => ident1 + "." + ident2 + ">'"  + stringList + "'"
      case ident1 ~ "[" ~ ident2 ~ "]" ~ "<"  ~ stringList => ident1 + "." + ident2 + "<'"  + stringList + "'"
      case ident1 ~ "[" ~ ident2 ~ "]" ~ "!=" ~ stringList => ident1 + "." + ident2 + "!='" + stringList + "'"
    }

  // "&&" / "||" translate to the SQL connectives AND / OR.
  def comparator: Parser[String] = ("&&" | "||") ^^ {
    case "&&" => " AND "
    case "||" => " OR "
  }

  // expr-front ::= expr-priority (("&&" | "||") expr-priority)*
  def expr_front: Parser[String] = expr_priority ~ rep(comparator ~ expr_priority) ^^ {
    case exp1 ~ exp2 => exp1 + exp2.map(x => x._1 + " " + x._2).mkString(" ")
  }

  // expr-back ::= tableName "[" valueName "," operator "]"  -->  SELECT <op>(valueName) FROM tableName
  def expr_back: Parser[String] = ident ~ "[" ~ ident ~ "," ~ ("SUM" | "COUNT") ~ "]" ^^ {
    case ident1 ~ "[" ~ ident2 ~ "," ~ "COUNT" ~ "]" => "SELECT COUNT(" + ident2 + ") FROM " + ident1
    case ident1 ~ "[" ~ ident2 ~ "," ~ "SUM"   ~ "]" => "SELECT SUM("   + ident2 + ") FROM " + ident1
  }

  // Run a parser over the complete input string.
  def parserAll[T](p: Parser[T], input: String) =
    phrase(p)(new lexical.Scanner(input))
}
```
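To round this off, here is a minimal driver (my own addition, not in the original post) that feeds the sample platform formula from section 2 through parserAll and prints the generated SQL:

```scala
// Hypothetical demo object; the parser class itself is the one defined above.
object ExprParserDemo extends App {
  val parser  = new ExprParser
  val formula =
    """if(XX1_m001[D003]="邢おb7肮α?薇"|| XX1_m001[H003]<"2")&& XX1_m001[D005]!="wed" then XX1_m001[H022,COUNT]"""

  parser.parserAll(parser.expr, formula) match {
    // Modulo spacing, this prints: SELECT COUNT(H022) FROM XX1_m001 WHERE (...) AND ...
    case parser.Success(sql, _) => println(sql)
    case failure                => println("Parse failed: " + failure)
  }
}
```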
Original post: http://zhkmxx930.leanote.com/post/%E4%BD%BF%E7%94%A8Scala%E5%9F%BA%E4%BA%8E%E8%AF%8D%E6%B3%95%E5%8D%95%E5%85%83%E7%9A%84%E8%A7%A3%E6%9E%90%E5%99%A8%E5%AE%9A%E5%88%B6EBNF%E8%8C%83%E5%BC%8F%E6%96%87%E6%B3%95%E8%A7%A3%E6%9E%90