How to use subprocess.run() to run a Hive query?

Question

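So I'm trying to execute a Hive query using the subprocess module and save the output to a file data.txt, with the logs going to log.txt, but I seem to be running into some trouble. I've looked at this gist and this SO question, but neither seems to give me what I need.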

Here's what I'm running:

import subprocess
query = "select user, sum(revenue) as revenue from my_table where user = 'dave' group by user;"
outfile = "data.txt"
logfile = "log.txt"

log_buff = open("log.txt", "a")
data_buff = open("data.txt", "w")

# note - "hive -e [query]" would normally just print all the results 
# to the console after finishing
proc = subprocess.run(["hive" , "-e" '"{}"'.format(query)],
                    stdin=subprocess.PIPE,
                    stdout=data_buff,
                    stderr=log_buff,
                    shell=True)

log_buff.close()
data_buff.close()
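Two details of the call above can be checked in isolation. Python joins adjacent string literals before `.format()` is applied, so `"-e" '"{}"'.format(query)` produces a single list element, not two. And on POSIX, `shell=True` with a list hands only the first element to the shell as the command; the remaining items become the shell's own positional parameters rather than arguments to `hive`. A minimal sketch, POSIX only, using `echo` as an illustrative stand-in for `hive`:

```python
import subprocess

query = "select user, sum(revenue) as revenue from my_table where user = 'dave' group by user;"

# Adjacent string literals are concatenated at compile time, before .format()
# runs, so this list has two elements, not three.
args = ["hive", "-e" '"{}"'.format(query)]
print(len(args))     # 2
print(args[1][:9])   # -e"select

# POSIX only: with shell=True and a list, Python runs the equivalent of
# ["/bin/sh", "-c", args[0], args[1], ...]; args[1] becomes the shell's $0.
# "echo" stands in for "hive" here, and "hello" never reaches echo.
demo = subprocess.run(["echo", "hello"], shell=True,
                      stdout=subprocess.PIPE, universal_newlines=True)
print(repr(demo.stdout))  # '\n'
```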

I also looked at this SO question regarding subprocess.run() vs subprocess.Popen, and I believe I want .run() because I want to block until the process finishes.
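For reference, the blocking behaviour of the two APIs differs as follows: `run()` waits for the child before returning, while `Popen()` returns as soon as the child starts and it is `communicate()` (or `wait()`) that blocks. A small sketch using the current Python interpreter as a stand-in child process:

```python
import subprocess
import sys

# A trivial child process; sys.executable is the running Python interpreter.
cmd = [sys.executable, "-c", "print('done')"]

# run() waits for the child to exit and returns a CompletedProcess.
completed = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
print(completed.returncode, completed.stdout, end="")  # 0 done

# Popen() returns once the child has started; communicate() is the call
# that blocks until it finishes and collects its output.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
out, _ = proc.communicate()
print(proc.returncode, out, end="")  # 0 done
```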

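The end result should be a file data.txt containing the tab-separated results of the query, and a file log.txt containing all the logging generated by the Hive job. Any help would be wonderful.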
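Once data.txt has been written, its rows can be read back with the standard csv module. A minimal sketch, assuming the Hive CLI writes one row per line with tab-separated columns and no header line; `read_hive_tsv` is a hypothetical helper name and the sample row is made up for illustration:

```python
import csv
import os
import tempfile

def read_hive_tsv(path):
    """Read a tab-separated results file into a list of rows (lists of str)."""
    with open(path, newline="") as f:
        # QUOTE_NONE: treat any quote characters in values as plain text
        return list(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))

# Usage with a made-up sample row written to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("dave\t1250.0\n")
rows = read_hive_tsv(tmp.name)
os.remove(tmp.name)
print(rows)  # [['dave', '1250.0']]
```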

Update:

With the above, I currently get the following output:

log.txt:

[ralston@tpsci-gw01-vm tmp]$ cat log.txt
Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/y/share/hadoop-2.8.3.0.1802131730/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/y/libexec/tez/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Logging initialized using configuration in file:/home/y/libexec/hive/conf/hive-log4j.properties

data.txt:

[ralston@tpsci-gw01-vm tmp]$ cat data.txt
hive> [ralston@tpsci-gw01-vm tmp]$

I can verify that the java/hive processes are running:

[ralston@tpsci-gw01-vm tmp]$ ps -u ralston
  PID TTY          TIME CMD
14096 pts/0    00:00:00 hive
14141 pts/0    00:00:07 java
14259 pts/0    00:00:00 ps
16275 ?        00:00:00 sshd
16276 pts/0    00:00:00 bash

But it doesn't seem to finish, and it doesn't log everything I want either.
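A minimal sketch of a call shaped to avoid both issues: the query is its own list element, no shell is involved (`shell=False`, the default), stdin is `DEVNULL` so the child reads end-of-file instead of an open pipe, and a timeout bounds how long `run()` blocks. It assumes, as the question describes, that `hive -e <query>` prints results to stdout and logging to stderr. `build_hive_command`, `run_to_files` and the one-hour timeout are illustrative choices, not part of Hive or of the original code:

```python
import subprocess

def build_hive_command(query, hive_bin="hive"):
    # One list element per argument; no extra quoting is needed because
    # the command is executed directly, without a shell.
    return [hive_bin, "-e", query]

def run_to_files(cmd, outfile="data.txt", logfile="log.txt", timeout=3600):
    """Run cmd, sending stdout to outfile and stderr to logfile.

    Blocks until the process exits and returns its exit code. If it runs
    longer than `timeout` seconds, run() kills the direct child process
    and raises subprocess.TimeoutExpired.
    """
    with open(outfile, "w") as data_buff, open(logfile, "a") as log_buff:
        proc = subprocess.run(cmd,
                              stdin=subprocess.DEVNULL,  # no interactive input
                              stdout=data_buff,
                              stderr=log_buff,
                              timeout=timeout)
    return proc.returncode
```

Usage would be `run_to_files(build_hive_command(query))`, then checking that the returned exit code is 0 before reading data.txt. Only the direct child is killed on timeout; whether that also stops the java process the `hive` launcher starts is not something this sketch controls.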