Calcite分析 - Rule
Posted fxjwind
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Calcite分析 - Rule相关的知识,希望对你有一定的参考价值。
Calcite源码分析,参考:
http://matt33.com/2019/03/07/apache-calcite-process-flow/
https://matt33.com/2019/03/17/apache-calcite-planner/
Rule作为Calcite查询优化的核心,
具体看几个有代表性的Rule,看看是如何实现的
最简单的例子,Join结合律,JoinAssociateRule
首先所有的Rule都继承RelOptRule类
/** * Planner rule that changes a join based on the associativity rule. * * <p>((a JOIN b) JOIN c) → (a JOIN (b JOIN c))</p> * * <p>We do not need a rule to convert (a JOIN (b JOIN c)) → * ((a JOIN b) JOIN c) because we have * @link JoinCommuteRule. * * @see JoinCommuteRule */ public class JoinAssociateRule extends RelOptRule
RelOptRule用于transform expression
它维护的Operand tree表明,该rule可以适用于何种树结构,具体看下面的例子
/** * A <code>RelOptRule</code> transforms an expression into another. It has a * list of @link RelOptRuleOperands, which determine whether the rule can be * applied to a particular section of the tree. * * <p>The optimizer figures out which rules are applicable, then calls * @link #onMatch on each of them.</p> */ public abstract class RelOptRule /** * Root of operand tree. */ private final RelOptRuleOperand operand; /** Factory for a builder for relational expressions. * * <p>The actual builder is available via @link RelOptRuleCall#builder(). */ public final RelBuilderFactory relBuilderFactory; /** * Flattened list of operands. */ public final List<RelOptRuleOperand> operands; //~ Constructors ----------------------------------------------------------- /** * Creates a rule with an explicit description. * * @param operand root operand, must not be null * @param description Description, or null to guess description * @param relBuilderFactory Builder for relational expressions */ public RelOptRule(RelOptRuleOperand operand, RelBuilderFactory relBuilderFactory, String description) this.operand = Objects.requireNonNull(operand); this.relBuilderFactory = Objects.requireNonNull(relBuilderFactory);this.description = description; this.operands = flattenOperands(operand); assignSolveOrder();
比如对于Join结合律,调用super,即RelOptRule的构造函数
/** * Creates a JoinAssociateRule. */ public JoinAssociateRule(RelBuilderFactory relBuilderFactory) super( operand(Join.class, operand(Join.class, any()), operand(RelSubset.class, any())), relBuilderFactory, null);
operand也是一个树形结构,Top的Operand的类型是Join,他有两个children,其中一个也是join,另一个是RelSubset
他们的children是any()
/** * Creates a list of child operands that signifies that the operand matches * any number of child relational expressions. * * @return List of child operands that signifies that the operand matches * any number of child relational expressions */ public static RelOptRuleOperandChildren any() return RelOptRuleOperandChildren.ANY_CHILDREN;
然后要理解Rule如果被使用的,来看看HepPlaner里面的代码
因为HepPlaner逻辑比较简单,就是遍历所有的HepRelVertex,看看RelOptRule是否可以匹配,如果匹配就应用Rule
所以会调用到applyRule,输入包含Rule和Vertex
private HepRelVertex applyRule( RelOptRule rule, HepRelVertex vertex, boolean forceConversions) final List<RelNode> bindings = new ArrayList<>(); final Map<RelNode, List<RelNode>> nodeChildren = new HashMap<>(); boolean match = matchOperands( rule.getOperand(), vertex.getCurrentRel(), bindings, nodeChildren); if (!match) return null; HepRuleCall call = new HepRuleCall( this, rule.getOperand(), bindings.toArray(new RelNode[0]), nodeChildren, parents); // Allow the rule to apply its own side-conditions. if (!rule.matches(call)) return null; fireRule(call); if (!call.getResults().isEmpty()) return applyTransformationResults( vertex, call, parentTrait); return null;
1. 首先调用,matchOperands,看看是否match,
private boolean matchOperands( RelOptRuleOperand operand, //Rule中的Operand RelNode rel, //vertex中的RelNode List<RelNode> bindings, Map<RelNode, List<RelNode>> nodeChildren) if (!operand.matches(rel)) //先比较top node和operand是否match return false; bindings.add(rel); //如果match,把top relnode加入bindings //接着来比较child是否match //child有几种类型: //Any,随意,直接返回true //Unordered,无序,对于每个operand只要有任何一个child RelNode可满足即可 //Default,Some,Operand和RelNode严格有序匹配 List<HepRelVertex> childRels = (List) rel.getInputs(); switch (operand.childPolicy) case ANY: return true; case UNORDERED: // For each operand, at least one child must match. If // matchAnyChildren, usually there‘s just one operand. for (RelOptRuleOperand childOperand : operand.getChildOperands()) boolean match = false; for (HepRelVertex childRel : childRels) match = matchOperands( //递归调用matchOperands childOperand, childRel.getCurrentRel(), bindings, nodeChildren); if (match) break; if (!match) return false; final List<RelNode> children = new ArrayList<>(childRels.size()); for (HepRelVertex childRel : childRels) children.add(childRel.getCurrentRel()); nodeChildren.put(rel, children); return true; default: int n = operand.getChildOperands().size(); if (childRels.size() < n) return false; //一一按顺序对应match for (Pair<HepRelVertex, RelOptRuleOperand> pair : Pair.zip(childRels, operand.getChildOperands())) boolean match = matchOperands( pair.right, pair.left.getCurrentRel(), bindings, nodeChildren); if (!match) return false; return true;
逻辑是先比较自身,然后再递归比较children
比较函数,
可以看出,从类型,Trait,Predicate上来比较,是否匹配
/** * Returns whether a relational expression matches this operand. It must be * of the right class and trait. */ public boolean matches(RelNode rel) if (!clazz.isInstance(rel)) return false; if ((trait != null) && !rel.getTraitSet().contains(trait)) return false; return predicate.test(rel);
child的类型分为,
/** * Policy by which operands will be matched by relational expressions with * any number of children. */ public enum RelOptRuleOperandChildPolicy /** * Signifies that operand can have any number of children. */ ANY, /** * Signifies that operand has no children. Therefore it matches a * leaf node, such as a table scan or VALUES operator. * * <p>@code RelOptRuleOperand(Foo.class, NONE) is equivalent to * @code RelOptRuleOperand(Foo.class) but we prefer the former because * it is more explicit.</p> */ LEAF, /** * Signifies that the operand‘s children must precisely match its * child operands, in order. */ SOME, /** * Signifies that the rule matches any one of its parents‘ children. * The parent may have one or more children. */ UNORDERED,
不同的类型按照注释中的规则进行match
2. 如果match,继续会把当前结果封装成HepRuleCall,继承自RelOptRuleCall
是RelOptRule的一次invocation,即会记录调用中用到的上下文数据和结果
/** * A <code>RelOptRuleCall</code> is an invocation of a @link RelOptRule with a * set of @link RelNode relational expressions as arguments. */ public abstract class RelOptRuleCall /** * Generator for @link #id values. */ private static int nextId = 0; //~ Instance fields -------------------------------------------------------- public final int id; protected final RelOptRuleOperand operand0; //Rule的Operand树的root protected Map<RelNode, List<RelNode>> nodeInputs; //所有的RelNode和他们的inputs public final RelOptRule rule; public final RelNode[] rels; //match到的所有RelNodes private final RelOptPlanner planner; private final List<RelNode> parents; //对于Top RelNodes而言的parents
3. 调用rule.matches(call)
默认实现是,return true,即不检查,这里允许Rule加一下特定的match检测
4. 调用fireRule(call)
而其中主要是调用,ruleCall.getRule().onMatch(ruleCall);
所以做的事情是需要在Rule的onMatch里面定义的
下面看下JoinAssociateRule的实现,
public void onMatch(final RelOptRuleCall call) final Join topJoin = call.rel(0); final Join bottomJoin = call.rel(1); final RelNode relA = bottomJoin.getLeft(); final RelNode relB = bottomJoin.getRight(); final RelSubset relC = call.rel(2); final RelOptCluster cluster = topJoin.getCluster(); final RexBuilder rexBuilder = cluster.getRexBuilder(); if (relC.getConvention() != relA.getConvention()) // relC could have any trait-set. But if we‘re matching say // EnumerableConvention, we‘re only interested in enumerable subsets. return; // topJoin // / // bottomJoin C // / // A B final int aCount = relA.getRowType().getFieldCount(); final int bCount = relB.getRowType().getFieldCount(); final int cCount = relC.getRowType().getFieldCount(); final ImmutableBitSet aBitSet = ImmutableBitSet.range(0, aCount); final ImmutableBitSet bBitSet = ImmutableBitSet.range(aCount, aCount + bCount);
这部分都是在初始化,上面好理解,下面算这些count是干啥?
其实是在算RexInputRef,
RexNode和RelNode的共同点是,他们都代表一个树
差别是他们代表的东西不同,RelNode代表关系代数算子组成的数,而RexNode表示表达式树
比如RexLiteral表示常量,RexVariable表示变量,RexCall表示操作来连接Literal和Variable
RexVariable往往表示输入的某个field,为了效率,这里只会记录field的id,即这个RexInputRef
/** * Variable which references a field of an input relational expression. * * <p>Fields of the input are 0-based. If there is more than one input, they are * numbered consecutively. For example, if the inputs to a join are</p> * * <ul> * <li>Input #0: EMP(EMPNO, ENAME, DEPTNO) and</li> * <li>Input #1: DEPT(DEPTNO AS DEPTNO2, DNAME)</li> * </ul> * * <p>then the fields are:</p> * * <ul> * <li>Field #0: EMPNO</li> * <li>Field #1: ENAME</li> * <li>Field #2: DEPTNO (from EMP)</li> * <li>Field #3: DEPTNO2 (from DEPT)</li> * <li>Field #4: DNAME</li> * </ul> * * <p>So <code>RexInputRef(3, Integer)</code> is the correct reference for the * field DEPTNO2.</p> */ public class RexInputRef extends RexSlot
而RexInputRef的index不是固定的,是按顺序排的,
所以按上面把fieldcount都算出来,就可以排出对应的index
接着做的一堆都是为了调整conditions的位置和相对应的index
// Goal is to transform to // // newTopJoin // / // A newBottomJoin // / // B C // Split the condition of topJoin and bottomJoin into a conjunctions. A // condition can be pushed down if it does not use columns from A. final List<RexNode> top = new ArrayList<>(); final List<RexNode> bottom = new ArrayList<>(); JoinPushThroughJoinRule.split(topJoin.getCondition(), aBitSet, top, bottom); JoinPushThroughJoinRule.split(bottomJoin.getCondition(), aBitSet, top, bottom); // Mapping for moving conditions from topJoin or bottomJoin to // newBottomJoin. // target: | B | C | // source: | A | B | C | final Mappings.TargetMapping bottomMapping = Mappings.createShiftMapping( aCount + bCount + cCount, 0, aCount, bCount, bCount, aCount + bCount, cCount); final List<RexNode> newBottomList = new ArrayList<>(); new RexPermuteInputsShuttle(bottomMapping, relB, relC) .visitList(bottom, newBottomList); RexNode newBottomCondition = RexUtil.composeConjunction(rexBuilder, newBottomList);
1. 把原先TopJoin和BottomJoin里面的conditions,分成和A相关的和无关的
因为调整完以后,和A相关的conditions要放到TopJoin,而和A无关的需要放到BottomJoin
aBitSet里面记录了所有A的fields的index,
/** * Splits a condition into conjunctions that do or do not intersect with * a given bit set. */ static void split( RexNode condition, ImmutableBitSet bitSet, List<RexNode> intersecting, List<RexNode> nonIntersecting) for (RexNode node : RelOptUtil.conjunctions(condition)) //把conjunction的条件拆分开 ImmutableBitSet inputBitSet = RelOptUtil.InputFinder.bits(node); //找出condition所用到的fields if (bitSet.intersects(inputBitSet)) intersecting.add(node); else nonIntersecting.add(node);
如何找出condition所用到的fields?
如下,在递归的过程做回把所有碰到的inputRef记录下来
/** * Returns a bit set describing the inputs used by an expression. */ public static ImmutableBitSet bits(RexNode node) return analyze(node).inputBitSet.build(); /** Returns an input finder that has analyzed a given expression. */ public static InputFinder analyze(RexNode node) final InputFinder inputFinder = new InputFinder(); node.accept(inputFinder); //Visitor模式 return inputFinder; //如果RexNode是InputRef,就记录下该Ref的index public Void visitInputRef(RexInputRef inputRef) inputBitSet.set(inputRef.getIndex()); return null; //如果RexNode是Call,继续递归accept该Call的相关Operands public R visitCall(RexCall call) if (!deep) return null; R r = null; for (RexNode operand : call.operands) r = operand.accept(this); return r;
2. 由于树结构变了,会导致整个field的index发生改变
对于原先的,bottomJoin,A的field从0开始,B是从aCount开始
而对于变换后的,BottomJoin,B的field从0开始,C是从bCount开始
所以需要完成Ref的修正,
createShiftMapping,生成的mapping,包含原先的index,和当前新的index
第一个参数是index的范围,一共aCount + bCount + cCount,所以index需要小于这个值
然后,只有B,C的index需要调整,A的index不需要调整
其中,B的原先是从aCount开始,当前从0开始,一共bCount个fields;C的原先是从aCount + bCount开始,当前从bCount开始,一共cCount个fields
接着,RexPermuteInputsShuttle做具体的更新
/** * Shuttle which applies a permutation to its input fields. * * @see RexPermutationShuttle * @see RexUtil#apply(org.apache.calcite.util.mapping.Mappings.TargetMapping, RexNode) */ public class RexPermuteInputsShuttle extends RexShuttle //~ Instance fields -------------------------------------------------------- private final Mappings.TargetMapping mapping; private final ImmutableList<RelDataTypeField> fields; //对于每个InputRef,根据Mapping更新index @Override public RexNode visitInputRef(RexInputRef local) final int index = local.getIndex(); int target = mapping.getTarget(index); return new RexInputRef( target, local.getType());
最后,构建新的join,
RexNode newBottomCondition = RexUtil.composeConjunction(rexBuilder, newBottomList); final Join newBottomJoin = bottomJoin.copy(bottomJoin.getTraitSet(), newBottomCondition, relB, relC, JoinRelType.INNER, false); // Condition for newTopJoin consists of pieces from bottomJoin and topJoin. // Field ordinals do not need to be changed. RexNode newTopCondition = RexUtil.composeConjunction(rexBuilder, top); @SuppressWarnings("SuspiciousNameCombination") final Join newTopJoin = topJoin.copy(topJoin.getTraitSet(), newTopCondition, relA, newBottomJoin, JoinRelType.INNER, false); call.transformTo(newTopJoin);
以上是关于Calcite分析 - Rule的主要内容,如果未能解决你的问题,请参考以下文章
95-910-330-源码-FlinkSQL-Calcite-Flink结合Calcite