The Beeline Execution Flow in Hive 3.1.2

Posted by 虎鲸不是鱼


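Preface

Because Alibaba Cloud's DataPhin platform cannot recognize tables that were not created through DataPhin, I had no choice but to use beeline, the SQL client, to load data from ordinary Hive tables into DataPhin's Hive tables: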

beeline -u "jdbc:hive2://<hive-host>:10000/default;principal=hive/<host>@<REALM>" -e "
insert overwrite table db1.tb1
select
	col1
from
	db2.tb2
;
"

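Partitioned tables are supported as well, of course. Because this approach fails often, I dug into the source to follow beeline's execution flow, looking for ways to optimize and, while at it, for tunable parameters.

For Beeline usage, see the official Confluence wiki: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

Hive configuration properties are also documented there in detail: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-RestrictedListandWhitelist

So are the Hive on Tez properties used by CDP 7: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez

The official docs tell you which parameters to tune in which situation, so there is no need to go around asking others the way a shallow "SQL Boy" would.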

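Reading the Source

Apache Hive 3.1.2 is used here. The IDE is IntelliJ IDEA; whether Maven is set up hardly matters, since we will not look closely at how Calcite handles the AST.

Entry Point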

package org.apache.hive.beeline;

/**
 * A console SQL shell with command completion.
 * <p>
 * TODO:
 * <ul>
 * <li>User-friendly connection prompts</li>
 * <li>Page results</li>
 * <li>Handle binary data (blob fields)</li>
 * <li>Implement command aliases</li>
 * <li>Stored procedure execution</li>
 * <li>Binding parameters to prepared statements</li>
 * <li>Scripting language</li>
 * <li>XA transactions</li>
 * </ul>
 *
 */
@SuppressWarnings("static-access")
public class BeeLine implements Closeable {
  /**
   * Starts the program.
   */
  public static void main(String[] args) throws IOException {
    mainWithInputRedirection(args, null);
  }
  // ... rest of the class omitted
}

The BeeLine class sits right in the beeline module and starts straight from its main method, plain and simple. main itself contains a single call:

  /**
   * Starts the program with redirected input. For redirected output,
   * setOutputStream() and setErrorStream can be used.
   * Exits with 0 on success, 1 on invalid arguments, and 2 on any other error
   *
   * @param args
   *          same as main()
   *
   * @param inputStream
   *          redirected input, or null to use standard input
   */
  public static void mainWithInputRedirection(String[] args, InputStream inputStream)
      throws IOException {
    BeeLine beeLine = new BeeLine();
    try {
      int status = beeLine.begin(args, inputStream);

      if (!Boolean.getBoolean(BeeLineOpts.PROPERTY_NAME_EXIT)) {
        System.exit(status);
      }
    } finally {
      beeLine.close();
    }
  }

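This matches the usual convention of exit code 0 for success and non-zero for failure (per the Javadoc, 1 for invalid arguments and 2 for any other error). Since main passes null as the input stream, input comes from standard input, as the comment states.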
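The guard around System.exit can be tried in isolation. This is a minimal sketch that assumes only what the excerpts show: the property name is "beeline." + "system.exit", and Boolean.getBoolean reads a JVM system property of that name. The class and method names are made up for illustration.

```java
class ExitGuardSketch {
  // Same value as BeeLineOpts.PROPERTY_NAME_EXIT in the excerpts: "beeline." + "system.exit".
  static final String PROPERTY_NAME_EXIT = "beeline.system.exit";

  /** Returns true when the launcher would go on to call System.exit(status). */
  static boolean wouldCallSystemExit() {
    // Boolean.getBoolean is true only if the system property exists and equals "true".
    return !Boolean.getBoolean(PROPERTY_NAME_EXIT);
  }

  public static void main(String[] args) {
    System.out.println(wouldCallSystemExit());
    System.setProperty(PROPERTY_NAME_EXIT, "true");
    System.out.println(wouldCallSystemExit());
  }
}
```

Read this way, a program that embeds BeeLine could presumably set -Dbeeline.system.exit=true to get the status back without the JVM being terminated.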

Initializing the BeeLine Object

public BeeLine() {
  this(true);
}

This delegates to the following constructor:

public BeeLine(boolean isBeeLine) {
  this.isBeeLine = isBeeLine;
  this.signalHandler = new SunSignalHandler(this);
  this.shutdownHook = new Runnable() {
    @Override
    public void run() {
      try {
        if (history != null) {
          history.setMaxSize(getOpts().getMaxHistoryRows());
          history.flush();
        }
      } catch (IOException e) {
        error(e);
      } finally {
        close();
      }
    }
  };
}

The BeeLine class holds these private fields:

import jline.console.history.FileHistory;

  private FileHistory history;
  // Indicates if this instance of beeline is running in compatibility mode, or beeline mode
  private boolean isBeeLine = true;

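Evidently this Runnable is the shutdown hook: on exit it caps the command history at the configured size, flushes it to its file, and then closes BeeLine. It needs no further attention.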
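The pattern of "trim the history, flush it, then clean up at exit" can be sketched with the plain JDK. This sketch makes several assumptions: the in-memory lists stand in for jline's FileHistory, the names are invented, and registration uses Runtime.addShutdownHook directly, which is not necessarily how BeeLine's addBeelineShutdownHook() does it (its body is not shown here).

```java
import java.util.ArrayList;
import java.util.List;

class ShutdownHookSketch {
  // Stand-ins for a file-backed history: entries waiting in memory, and entries "written out".
  static final List<String> pending = new ArrayList<>();
  static final List<String> flushed = new ArrayList<>();

  /** Builds the cleanup task: keep at most maxRows of the newest entries, then flush them. */
  static Runnable cleanupTask(int maxRows) {
    return () -> {
      int from = Math.max(0, pending.size() - maxRows);
      flushed.addAll(pending.subList(from, pending.size()));
      pending.clear();
    };
  }

  public static void main(String[] args) {
    pending.add("show tables;");
    pending.add("select 1;");
    // A registered hook runs when the JVM shuts down, e.g. after main returns or on Ctrl-C.
    Runtime.getRuntime().addShutdownHook(new Thread(cleanupTask(500)));
    System.out.println("main is done; the hook runs during JVM shutdown");
  }
}
```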

The Real Start

Step into the begin method:

/**
 * Start accepting input from stdin, and dispatch it
 * to the appropriate {@link CommandHandler} until the
 * global variable <code>exit</code> is true.
 */
public int begin(String[] args, InputStream inputStream) throws IOException {
  try {
    // load the options first, so we can override on the command line
    getOpts().load();
  } catch (Exception e) {
    // nothing
  }

  setupHistory();

  //add shutdown hook to cleanup the beeline for smooth exit
  addBeelineShutdownHook();

  //this method also initializes the consoleReader which is
  //needed by initArgs for certain execution paths
  ConsoleReader reader = initializeConsoleReader(inputStream);
  if (isBeeLine) {
    int code = initArgs(args);
    if (code != 0) {
      return code;
    }
  } else {
    int code = initArgsFromCliVars(args);
    if (code != 0 || exit) {
      return code;
    }
    defaultConnect(false);
  }

  if (getOpts().isHelpAsked()) {
    return 0;
  }
  if (getOpts().getScriptFile() != null) {
    return executeFile(getOpts().getScriptFile());
  }
  try {
    info(getApplicationTitle());
  } catch (Exception e) {
    // ignore
  }
  return execute(reader, false);
}

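begin is where execution really starts. It loads the saved options, sets up history, initializes the console reader and the arguments, connects, and then either runs a script file or executes commands read from the input.

Loading the Configuration: load

Jumping into BeeLineOpts.java shows: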

public void load() throws IOException {
  try (InputStream in = new FileInputStream(rcFile)) {
    load(in);
  }
}

One more jump:

public void load(InputStream fin) throws IOException {
  Properties p = new Properties();
  p.load(fin);
  loadProperties(p);
}

And another:

// PROPERTY_PREFIX must be declared first: PROPERTY_NAME_EXIT refers to it
public static final String PROPERTY_PREFIX = "beeline.";
public static final String PROPERTY_NAME_EXIT = PROPERTY_PREFIX + "system.exit";

public void loadProperties(Properties props) {
  for (Object element : props.keySet()) {
    String key = element.toString();
    if (key.equals(PROPERTY_NAME_EXIT)) {
      // fix for sf.net bug 879422
      continue;
    }
    if (key.startsWith(PROPERTY_PREFIX)) {
      set(key.substring(PROPERTY_PREFIX.length()),
          props.getProperty(key));
    }
  }
}

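This method simply skips the iteration when the key equals "beeline.system.exit". Otherwise, for keys that start with "beeline.", it strips that prefix to form the new key, looks up the value under the original key, and passes both to set: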
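The prefix handling is easy to reproduce on its own. The following is a minimal sketch rather than BeeLine's code: it collects the stripped keys into a map instead of calling set, and the sample keys showheader and other.key are only illustrative input.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class PrefixedPropsSketch {
  static final String PROPERTY_PREFIX = "beeline.";
  static final String PROPERTY_NAME_EXIT = PROPERTY_PREFIX + "system.exit";

  /** Returns the options that would be handed to set(key, value), keyed without the prefix. */
  static Map<String, String> effectiveOptions(Properties props) {
    Map<String, String> out = new TreeMap<>();
    for (String key : props.stringPropertyNames()) {
      if (key.equals(PROPERTY_NAME_EXIT)) {
        continue; // never loaded from the file
      }
      if (key.startsWith(PROPERTY_PREFIX)) {
        out.put(key.substring(PROPERTY_PREFIX.length()), props.getProperty(key));
      }
      // keys without the prefix are silently ignored
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    Properties p = new Properties();
    p.load(new StringReader(
        "beeline.showheader=false\nbeeline.system.exit=true\nother.key=1\n"));
    System.out.println(effectiveOptions(p)); // prints {showheader=false}
  }
}
```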

public void set(String key, String value) {
  set(key, value, false);
}

One more jump:

public boolean set(String key, String value, boolean quiet) {
  try {
    beeLine.getReflector().invoke(this, "set" + key, new Object[] {value});
    return true;
  } catch (Exception e) {
    if (!quiet) {
      beeLine.error(beeLine.loc("error-setting", new Object[] {key, e}));
    }
    return false;
  }
}

The call is delegated to Reflector:

package org.apache.hive.beeline;

class Reflector {
  // imports, the beeLine field, the constructor and convert(...) are omitted from this excerpt

  public Object invoke(Object on, String method, Object[] args)
      throws InvocationTargetException, IllegalAccessException,
      ClassNotFoundException {
    return invoke(on, method, Arrays.asList(args));
  }

  public Object invoke(Object on, String method, List args)
      throws InvocationTargetException, IllegalAccessException,
      ClassNotFoundException {
    return invoke(on, on == null ? null : on.getClass(), method, args);
  }

  public Object invoke(Object on, Class defClass,
      String method, List args)
      throws InvocationTargetException, IllegalAccessException,
      ClassNotFoundException {
    Class c = defClass != null ? defClass : on.getClass();
    List<Method> candidateMethods = new LinkedList<Method>();

    // collect every public method whose name matches, ignoring case
    Method[] m = c.getMethods();
    for (int i = 0; i < m.length; i++) {
      if (m[i].getName().equalsIgnoreCase(method)) {
        candidateMethods.add(m[i]);
      }
    }

    if (candidateMethods.size() == 0) {
      throw new IllegalArgumentException(beeLine.loc("no-method",
          new Object[] {method, c.getName()}));
    }

    // invoke the first candidate whose parameters fit the converted arguments
    for (Iterator<Method> i = candidateMethods.iterator(); i.hasNext();) {
      Method meth = i.next();
      Class[] ptypes = meth.getParameterTypes();
      if (!(ptypes.length == args.size())) {
        continue;
      }

      Object[] converted = convert(args, ptypes);
      if (converted == null) {
        continue;
      }

      if (!Modifier.isPublic(meth.getModifiers())) {
        continue;
      }
      return meth.invoke(on, converted);
    }
    return null;
  }
}

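Here reflection collects into a list every public method whose name matches "set" + key (ignoring case), then invokes the first candidate whose parameter count matches and whose arguments can be converted. That call lands in the JDK's Method.java: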

@CallerSensitive
public Object invoke(Object obj, Object... args)
    throws IllegalAccessException, IllegalArgumentException,
       InvocationTargetException
{
    if (!override) {
        if (!Reflection.quickCheckMemberAccess(clazz, modifiers)) {
            Class<?> caller = Reflection.getCallerClass();
            checkAccess(caller, clazz, obj, modifiers);
        }
    }
    MethodAccessor ma = methodAccessor;             // read volatile
    if (ma == null) {
        ma = acquireMethodAccessor();
    }
    return ma.invoke(obj, args);
}

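In other words, the loop walks the candidates and invokes the matching public setter on the object passed in as on (the BeeLineOpts instance here, since set passes this).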
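The whole "option name to setter" dispatch fits in a few lines of plain reflection. This is a simplified sketch under stated assumptions: Opts is a made-up stand-in for BeeLineOpts, only single-argument setters are handled, and convert supports just String, boolean and int, whereas Reflector's real convert is not shown in the excerpt.

```java
import java.lang.reflect.Method;

class SetterDispatchSketch {
  /** Hypothetical options holder standing in for BeeLineOpts. */
  public static class Opts {
    private boolean showHeader = true;
    private int maxWidth = 80;
    public void setShowHeader(boolean v) { showHeader = v; }
    public void setMaxWidth(int v) { maxWidth = v; }
    public boolean getShowHeader() { return showHeader; }
    public int getMaxWidth() { return maxWidth; }
  }

  /** Converts the raw string to the setter's parameter type, or returns null if unsupported. */
  static Object convert(String value, Class<?> type) {
    if (type == String.class) return value;
    if (type == boolean.class || type == Boolean.class) return Boolean.parseBoolean(value);
    if (type == int.class || type == Integer.class) return Integer.parseInt(value);
    return null;
  }

  /** Finds a public method named "set" + key (case-insensitive) and invokes it. */
  static boolean set(Object target, String key, String value) {
    String wanted = "set" + key;
    for (Method m : target.getClass().getMethods()) {
      if (!m.getName().equalsIgnoreCase(wanted)) continue;
      Class<?>[] ptypes = m.getParameterTypes();
      if (ptypes.length != 1) continue;
      try {
        Object converted = convert(value, ptypes[0]);
        if (converted == null) continue;
        m.invoke(target, converted);
        return true;
      } catch (NumberFormatException | ReflectiveOperationException e) {
        return false; // value not convertible, or the setter itself failed
      }
    }
    return false; // no matching setter
  }

  public static void main(String[] args) {
    Opts opts = new Opts();
    System.out.println(set(opts, "showheader", "false") + " " + opts.getShowHeader());
    System.out.println(set(opts, "maxwidth", "120") + " " + opts.getMaxWidth());
    System.out.println(set(opts, "nosuchoption", "x"));
  }
}
```

The case-insensitive match is what lets a lowercase key from the properties file reach a camel-case setter.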

For example, the BeeLine class itself calls some of these setters directly:

Properties confProps = commandLine.getOptionProperties("hiveconf");
for (String propKey : confProps.stringPropertyNames()) {
  setHiveConfVar(propKey, confProps.getProperty(propKey));
}

getOpts().setScriptFile(commandLine.getOptionValue("f"));

if (commandLine.getOptionValues("i") != null) {
  getOpts().setInitFiles(commandLine.getOptionValues("i"));
}

dbName = commandLine.getOptionValue("database");
getOpts().setVerbose(Boolean.parseBoolean(commandLine.getOptionValue("verbose")));
getOpts().setSilent(Boolean.parseBoolean(commandLine.getOptionValue("silent")));

int code = 0;
if (cl.getOptionValues('e') != null) {
  commands = Arrays.asList(cl.getOptionValues('e'));
  opts.setAllowMultiLineCommand(false); //When using -e, command is always a single line
}

if (cl.hasOption("help")) {
  usage();
  getOpts().setHelpAsked(true);
  return true;
}

Properties hiveConfs = cl.getOptionProperties("hiveconf");
for (String key : hiveConfs.stringPropertyNames()) {
  setHiveConfVar(key, hiveConfs.getProperty(key));
}

driver = cl.getOptionValue("d");
auth = cl.getOptionValue("a");
user = cl.getOptionValue("n");
getOpts().setAuthType(auth);
if (cl.hasOption("w")) {
  pass = obtainPasswordFromFile(cl.getOptionValue("w"));
} else {
  if (beelineParser.isPasswordOptionSet) {
    pass = cl.getOptionValue("p");
  }
}

url = cl.getOptionValue("u");
if ((url == null) && cl.hasOption("reconnect")) {
  // If url was not specified with -u, but -r was present, use that.
  url = getOpts().getLastConnectedUrl();
}
getOpts().setInitFiles(cl.getOptionValues("i"));
getOpts().setScriptFile(cl.getOptionValue("f"));

public void updateOptsForCli() {
  getOpts().updateBeeLineOptsFromConf();
  getOpts().setShowHeader(false);
  getOpts().setEscapeCRLF(false);
  getOpts().setOutputFormat("dsv");
  getOpts().setDelimiterForDSV(' ');
  getOpts().setNullEmptyString(true);
}

setupHistory();

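Reading these getOptionValue calls also shows, along the way, why -u passes the connection URL, -n the user name, and -p the password: the letters are simply hard-coded in the source. No need to wonder.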
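To make the "hard-coded letters" point concrete, here is a small hand-rolled stand-in. It is not BeeLine's parser (the excerpts read values from a parsed command-line object via cl.getOptionValue); the flag table only mirrors the letters seen above, and the class name and the meaning labels are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ShortOptionSketch {
  // Fixed flag-to-meaning table, mirroring the letters read in the excerpt.
  static final Map<String, String> FLAGS = new HashMap<>();
  static {
    FLAGS.put("-u", "url");
    FLAGS.put("-n", "user");
    FLAGS.put("-p", "password");
    FLAGS.put("-e", "query");
  }

  /** Maps each known flag to the argument that follows it; unknown tokens are skipped. */
  static Map<String, String> parse(String[] args) {
    Map<String, String> out = new HashMap<>();
    for (int i = 0; i + 1 < args.length; i++) {
      String name = FLAGS.get(args[i]);
      if (name != null) {
        out.put(name, args[++i]); // consume the flag's value
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> parsed = parse(new String[] {
        "-u", "jdbc:hive2://example-host:10000/default", "-n", "alice", "-e", "select 1"});
    System.out.println(parsed.get("url"));
    System.out.println(parsed.get("user"));
    System.out.println(parsed.get("query"));
  }
}
```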

Setting Up History: setupHistory

  private void setupHistory() throws IOExc
