Calcite分析 - Rule

Posted fxjwind

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Calcite分析 - Rule相关的知识,希望对你有一定的参考价值。

Calcite源码分析,参考:

http://matt33.com/2019/03/07/apache-calcite-process-flow/

https://matt33.com/2019/03/17/apache-calcite-planner/

 

Rule作为Calcite查询优化的核心,

具体看几个有代表性的Rule,看看是如何实现的

 

最简单的例子,Join结合律,JoinAssociateRule

首先所有的Rule都继承RelOptRule类

/**
 * Planner rule that changes a join based on the associativity rule.
 *
 * <p>((a JOIN b) JOIN c) &rarr; (a JOIN (b JOIN c))</p>
 *
 * <p>We do not need a rule to convert (a JOIN (b JOIN c)) &rarr;
 * ((a JOIN b) JOIN c) because we have
 * @link JoinCommuteRule.
 *
 * @see JoinCommuteRule
 */
public class JoinAssociateRule extends RelOptRule 

RelOptRule用于transform expression

它维护的Operand tree表明,该rule可以适用于何种树结构,具体看下面的例子

/**
 * A <code>RelOptRule</code> transforms an expression into another. It has a
 * list of @link RelOptRuleOperands, which determine whether the rule can be
 * applied to a particular section of the tree.
 *
 * <p>The optimizer figures out which rules are applicable, then calls
 * @link #onMatch on each of them.</p>
 */
public abstract class RelOptRule 
 
  /**
   * Root of operand tree.
   */
  private final RelOptRuleOperand operand;

  /** Factory for a builder for relational expressions.
   *
   * <p>The actual builder is available via @link RelOptRuleCall#builder(). */
  public final RelBuilderFactory relBuilderFactory;

  /**
   * Flattened list of operands.
   */
  public final List<RelOptRuleOperand> operands;

  //~ Constructors -----------------------------------------------------------

  /**
   * Creates a rule with an explicit description.
   *
   * @param operand     root operand, must not be null
   * @param description Description, or null to guess description
   * @param relBuilderFactory Builder for relational expressions
   */
  public RelOptRule(RelOptRuleOperand operand,
      RelBuilderFactory relBuilderFactory, String description) 
    this.operand = Objects.requireNonNull(operand);
    this.relBuilderFactory = Objects.requireNonNull(relBuilderFactory);this.description = description;
    this.operands = flattenOperands(operand);
    assignSolveOrder();
  

比如对于Join结合律,调用super,即RelOptRule的构造函数

  /**
   * Creates a JoinAssociateRule.
   */
  public JoinAssociateRule(RelBuilderFactory relBuilderFactory) 
    super(
        operand(Join.class,
            operand(Join.class, any()),
            operand(RelSubset.class, any())),
        relBuilderFactory, null);
  

operand也是一个树形结构,Top的Operand的类型是Join,他有两个children,其中一个也是join,另一个是RelSubset

他们的children是any()

  /**
   * Creates a list of child operands that signifies that the operand matches
   * any number of child relational expressions.
   *
   * @return List of child operands that signifies that the operand matches
   *   any number of child relational expressions
   */
  public static RelOptRuleOperandChildren any() 
    return RelOptRuleOperandChildren.ANY_CHILDREN;
  

 

然后要理解Rule如果被使用的,来看看HepPlaner里面的代码

因为HepPlaner逻辑比较简单,就是遍历所有的HepRelVertex,看看RelOptRule是否可以匹配,如果匹配就应用Rule

所以会调用到applyRule,输入包含Rule和Vertex

  private HepRelVertex applyRule(
      RelOptRule rule,
      HepRelVertex vertex,
      boolean forceConversions) 

    final List<RelNode> bindings = new ArrayList<>();
    final Map<RelNode, List<RelNode>> nodeChildren = new HashMap<>();
    boolean match =
        matchOperands(
            rule.getOperand(),
            vertex.getCurrentRel(),
            bindings,
            nodeChildren);

    if (!match) 
      return null;
    

    HepRuleCall call =
        new HepRuleCall(
            this,
            rule.getOperand(),
            bindings.toArray(new RelNode[0]),
            nodeChildren,
            parents);

    // Allow the rule to apply its own side-conditions.
    if (!rule.matches(call)) 
      return null;
    

    fireRule(call);

    if (!call.getResults().isEmpty()) 
      return applyTransformationResults(
          vertex,
          call,
          parentTrait);
    
    return null;
  

1. 首先调用,matchOperands,看看是否match,

  private boolean matchOperands(
      RelOptRuleOperand operand,  //Rule中的Operand
      RelNode rel,  //vertex中的RelNode
      List<RelNode> bindings,
      Map<RelNode, List<RelNode>> nodeChildren) 
    if (!operand.matches(rel))  //先比较top node和operand是否match
      return false;
    
    bindings.add(rel); //如果match,把top relnode加入bindings
    //接着来比较child是否match
    //child有几种类型:
    //Any,随意,直接返回true
    //Unordered,无序,对于每个operand只要有任何一个child RelNode可满足即可
    //Default,Some,Operand和RelNode严格有序匹配
    List<HepRelVertex> childRels = (List) rel.getInputs();
    switch (operand.childPolicy) 
    case ANY:
      return true;
    case UNORDERED:
      // For each operand, at least one child must match. If
      // matchAnyChildren, usually there‘s just one operand.
      for (RelOptRuleOperand childOperand : operand.getChildOperands()) 
        boolean match = false;
        for (HepRelVertex childRel : childRels) 
          match =
              matchOperands( //递归调用matchOperands
                  childOperand,
                  childRel.getCurrentRel(),
                  bindings,
                  nodeChildren);
          if (match) 
            break;
          
        
        if (!match) 
          return false;
        
      
      final List<RelNode> children = new ArrayList<>(childRels.size());
      for (HepRelVertex childRel : childRels) 
        children.add(childRel.getCurrentRel());
      
      nodeChildren.put(rel, children);
      return true;
    default:
      int n = operand.getChildOperands().size();
      if (childRels.size() < n) 
        return false;
      
      //一一按顺序对应match
      for (Pair<HepRelVertex, RelOptRuleOperand> pair
          : Pair.zip(childRels, operand.getChildOperands())) 
        boolean match =
            matchOperands(
                pair.right,
                pair.left.getCurrentRel(),
                bindings,
                nodeChildren);
        if (!match) 
          return false;
        
      
      return true;
    
  

逻辑是先比较自身,然后再递归比较children

比较函数,

可以看出,从类型,Trait,Predicate上来比较,是否匹配

 /**
   * Returns whether a relational expression matches this operand. It must be
   * of the right class and trait.
   */
  public boolean matches(RelNode rel) 
    if (!clazz.isInstance(rel)) 
      return false;
    
    if ((trait != null) && !rel.getTraitSet().contains(trait)) 
      return false;
    
    return predicate.test(rel);
  

child的类型分为,

/**
 * Policy by which operands will be matched by relational expressions with
 * any number of children.
 */
public enum RelOptRuleOperandChildPolicy 
  /**
   * Signifies that operand can have any number of children.
   */
  ANY,

  /**
   * Signifies that operand has no children. Therefore it matches a
   * leaf node, such as a table scan or VALUES operator.
   *
   * <p>@code RelOptRuleOperand(Foo.class, NONE) is equivalent to
   * @code RelOptRuleOperand(Foo.class) but we prefer the former because
   * it is more explicit.</p>
   */
  LEAF,

  /**
   * Signifies that the operand‘s children must precisely match its
   * child operands, in order.
   */
  SOME,

  /**
   * Signifies that the rule matches any one of its parents‘ children.
   * The parent may have one or more children.
   */
  UNORDERED,

不同的类型按照注释中的规则进行match

2. 如果match,继续会把当前结果封装成HepRuleCall,继承自RelOptRuleCall

是RelOptRule的一次invocation,即会记录调用中用到的上下文数据和结果

/**
 * A <code>RelOptRuleCall</code> is an invocation of a @link RelOptRule with a
 * set of @link RelNode relational expressions as arguments.
 */
public abstract class RelOptRuleCall 

  /**
   * Generator for @link #id values.
   */
  private static int nextId = 0;

  //~ Instance fields --------------------------------------------------------

  public final int id;
  protected final RelOptRuleOperand operand0; //Rule的Operand树的root
  protected Map<RelNode, List<RelNode>> nodeInputs; //所有的RelNode和他们的inputs
  public final RelOptRule rule;
  public final RelNode[] rels; //match到的所有RelNodes
  private final RelOptPlanner planner;
  private final List<RelNode> parents; //对于Top RelNodes而言的parents

 

3. 调用rule.matches(call)

默认实现是,return true,即不检查,这里允许Rule加一下特定的match检测

 

4. 调用fireRule(call)

而其中主要是调用,ruleCall.getRule().onMatch(ruleCall);

所以做的事情是需要在Rule的onMatch里面定义的

 

下面看下JoinAssociateRule的实现,

  public void onMatch(final RelOptRuleCall call) 
    final Join topJoin = call.rel(0);
    final Join bottomJoin = call.rel(1);
    final RelNode relA = bottomJoin.getLeft();
    final RelNode relB = bottomJoin.getRight();
    final RelSubset relC = call.rel(2);
    final RelOptCluster cluster = topJoin.getCluster();
    final RexBuilder rexBuilder = cluster.getRexBuilder();

    if (relC.getConvention() != relA.getConvention()) 
      // relC could have any trait-set. But if we‘re matching say
      // EnumerableConvention, we‘re only interested in enumerable subsets.
      return;
    

    //        topJoin
    //        /         //   bottomJoin  C
    //    /        //   A      B

    final int aCount = relA.getRowType().getFieldCount();
    final int bCount = relB.getRowType().getFieldCount();
    final int cCount = relC.getRowType().getFieldCount();
    final ImmutableBitSet aBitSet = ImmutableBitSet.range(0, aCount);
    final ImmutableBitSet bBitSet =
        ImmutableBitSet.range(aCount, aCount + bCount);

这部分都是在初始化,上面好理解,下面算这些count是干啥?

其实是在算RexInputRef,

RexNode和RelNode的共同点是,他们都代表一个树

差别是他们代表的东西不同,RelNode代表关系代数算子组成的数,而RexNode表示表达式树

比如RexLiteral表示常量,RexVariable表示变量,RexCall表示操作来连接Literal和Variable

RexVariable往往表示输入的某个field,为了效率,这里只会记录field的id,即这个RexInputRef

/**
 * Variable which references a field of an input relational expression.
 *
 * <p>Fields of the input are 0-based. If there is more than one input, they are
 * numbered consecutively. For example, if the inputs to a join are</p>
 *
 * <ul>
 * <li>Input #0: EMP(EMPNO, ENAME, DEPTNO) and</li>
 * <li>Input #1: DEPT(DEPTNO AS DEPTNO2, DNAME)</li>
 * </ul>
 *
 * <p>then the fields are:</p>
 *
 * <ul>
 * <li>Field #0: EMPNO</li>
 * <li>Field #1: ENAME</li>
 * <li>Field #2: DEPTNO (from EMP)</li>
 * <li>Field #3: DEPTNO2 (from DEPT)</li>
 * <li>Field #4: DNAME</li>
 * </ul>
 *
 * <p>So <code>RexInputRef(3, Integer)</code> is the correct reference for the
 * field DEPTNO2.</p>
 */
public class RexInputRef extends RexSlot 

而RexInputRef的index不是固定的,是按顺序排的,

所以按上面把fieldcount都算出来,就可以排出对应的index 

 

接着做的一堆都是为了调整conditions的位置和相对应的index

    // Goal is to transform to
    //
    //       newTopJoin
    //        /         //       A   newBottomJoin
    //               /        //              B      C

    // Split the condition of topJoin and bottomJoin into a conjunctions. A
    // condition can be pushed down if it does not use columns from A.
    final List<RexNode> top = new ArrayList<>();
    final List<RexNode> bottom = new ArrayList<>();
    JoinPushThroughJoinRule.split(topJoin.getCondition(), aBitSet, top, bottom);
    JoinPushThroughJoinRule.split(bottomJoin.getCondition(), aBitSet, top,
        bottom);

    // Mapping for moving conditions from topJoin or bottomJoin to
    // newBottomJoin.
    // target: | B | C      |
    // source: | A       | B | C      |
    final Mappings.TargetMapping bottomMapping =
        Mappings.createShiftMapping(
            aCount + bCount + cCount,
            0, aCount, bCount,
            bCount, aCount + bCount, cCount);
    final List<RexNode> newBottomList = new ArrayList<>();
    new RexPermuteInputsShuttle(bottomMapping, relB, relC)
        .visitList(bottom, newBottomList);
    RexNode newBottomCondition =
        RexUtil.composeConjunction(rexBuilder, newBottomList);

1. 把原先TopJoin和BottomJoin里面的conditions,分成和A相关的和无关的

因为调整完以后,和A相关的conditions要放到TopJoin,而和A无关的需要放到BottomJoin

aBitSet里面记录了所有A的fields的index,

  /**
   * Splits a condition into conjunctions that do or do not intersect with
   * a given bit set.
   */
  static void split(
      RexNode condition,
      ImmutableBitSet bitSet,
      List<RexNode> intersecting,
      List<RexNode> nonIntersecting) 
    for (RexNode node : RelOptUtil.conjunctions(condition)) //把conjunction的条件拆分开
      ImmutableBitSet inputBitSet = RelOptUtil.InputFinder.bits(node); //找出condition所用到的fields
      if (bitSet.intersects(inputBitSet)) 
        intersecting.add(node);
       else 
        nonIntersecting.add(node);
      
    
  

如何找出condition所用到的fields?

如下,在递归的过程做回把所有碰到的inputRef记录下来

    /**
     * Returns a bit set describing the inputs used by an expression.
     */
    public static ImmutableBitSet bits(RexNode node) 
      return analyze(node).inputBitSet.build();
    

  /** Returns an input finder that has analyzed a given expression. */
    public static InputFinder analyze(RexNode node) 
      final InputFinder inputFinder = new InputFinder();
      node.accept(inputFinder); //Visitor模式
      return inputFinder;
    

   //如果RexNode是InputRef,就记录下该Ref的index
    public Void visitInputRef(RexInputRef inputRef) 
      inputBitSet.set(inputRef.getIndex());
      return null;
    
  
  //如果RexNode是Call,继续递归accept该Call的相关Operands
  public R visitCall(RexCall call) 
    if (!deep) 
      return null;
    

    R r = null;
    for (RexNode operand : call.operands) 
      r = operand.accept(this);
    
    return r;
  

 

2. 由于树结构变了,会导致整个field的index发生改变

对于原先的,bottomJoin,A的field从0开始,B是从aCount开始
而对于变换后的,BottomJoin,B的field从0开始,C是从bCount开始

所以需要完成Ref的修正,

createShiftMapping,生成的mapping,包含原先的index,和当前新的index

第一个参数是index的范围,一共aCount + bCount + cCount,所以index需要小于这个值

然后,只有B,C的index需要调整,A的index不需要调整

其中,B的原先是从aCount开始,当前从0开始,一共bCount个fields;C的原先是从aCount + bCount开始,当前从bCount开始,一共cCount个fields

接着,RexPermuteInputsShuttle做具体的更新

/**
 * Shuttle which applies a permutation to its input fields.
 *
 * @see RexPermutationShuttle
 * @see RexUtil#apply(org.apache.calcite.util.mapping.Mappings.TargetMapping, RexNode)
 */
public class RexPermuteInputsShuttle extends RexShuttle 
  //~ Instance fields --------------------------------------------------------

  private final Mappings.TargetMapping mapping;
  private final ImmutableList<RelDataTypeField> fields;

  //对于每个InputRef,根据Mapping更新index
  @Override public RexNode visitInputRef(RexInputRef local) 
    final int index = local.getIndex();
    int target = mapping.getTarget(index);
    return new RexInputRef(
        target,
        local.getType());
  

 

最后,构建新的join,

    RexNode newBottomCondition =
        RexUtil.composeConjunction(rexBuilder, newBottomList);

    final Join newBottomJoin =
        bottomJoin.copy(bottomJoin.getTraitSet(), newBottomCondition, relB,
            relC, JoinRelType.INNER, false);

    // Condition for newTopJoin consists of pieces from bottomJoin and topJoin.
    // Field ordinals do not need to be changed.
    RexNode newTopCondition = RexUtil.composeConjunction(rexBuilder, top);
    @SuppressWarnings("SuspiciousNameCombination")
    final Join newTopJoin =
        topJoin.copy(topJoin.getTraitSet(), newTopCondition, relA,
            newBottomJoin, JoinRelType.INNER, false);

    call.transformTo(newTopJoin);

 

以上是关于Calcite分析 - Rule的主要内容,如果未能解决你的问题,请参考以下文章

95-910-330-源码-FlinkSQL-Calcite-Flink结合Calcite

看这篇就够了丨基于Calcite框架的SQL语法扩展探索

[源码分析] 带你梳理 Flink SQL / Table API内部执行流程

Calcite - StreamingSQL

Apache Calcite 学习

calcite自定义数据类型