Error retrieving data from Hive using Python
【Posted】: 2016-06-02 07:27:09
【Question】: I use Python to connect to Hive and load the query result into pandas, but it raises an error:
pyhive.exc.OperationalError: TExecuteStatementResp
My code (the hive_engines module in the DB package, going by the import in test.py):
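# -*- coding: utf-8 -*-
from string import Template

from impala.util import as_pandas
from pyhive import hive

# Connection settings, passed straight to hive.connect()
config = {
    'host': '127.0.0.1',
    'database': 'default'
}


def get_conn(conf):
    # Open a HiveServer2 connection from a dict of keyword arguments
    conn = hive.connect(**conf)
    return conn


def execute_hql(hql, params=None):
    # Fill in the $placeholders, run the query, return a pandas DataFrame
    conn = get_conn(config)
    cursor = conn.cursor()
    hql = Template(hql).substitute(params or {})
    cursor.execute(hql)
    df = as_pandas(cursor)
    return df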
test.py
# -*- coding: utf-8 -*-
from pyhive import hive
from impala.util import as_pandas
import DB.hive_engines

# $start_date and $end_date are filled in by string.Template in execute_hql()
hql = """
SELECT
    keywords,
    count(keywords)
FROM
    table
WHERE
    eventname = 'xxx' AND
    cdate >= '$start_date' AND
    cdate <= '$end_date'
GROUP BY
    keywords
"""

if __name__ == '__main__':
    params = {'start_date': '2016-04-01', 'end_date': '2016-04-03'}
    df = DB.hive_engines.execute_hql(hql, params)
    print(df)
Exception message:
pyhive.exc.OperationalError: TExecuteStatementResp(status=TStatus(errorCode=1, errorMessage='Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask', sqlState='08S01', infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask:28:27', 'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:326', 'org.apache.hive.service.cli.operation.SQLOperation:runQuery:SQLOperation.java:146', 'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:173', 'org.apache.hive.service.cli.operation.Operation:run:Operation.java:268', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:410', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:391', 'sun.reflect.GeneratedMethodAccessor31:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:606', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:415', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1671', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', 'com.sun.proxy.$Proxy27:executeStatement::-1', 'org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:245', 'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:509', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615', 'java.lang.Thread:run:Thread.java:745'], statusCode=3), operationHandle=None)
Thanks!
【Comments】:
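【Answer 1】: Following this discussion, I solved the problem by passing a valid username when creating the connection.
For the completeness of this answer, I am copying the suggested code from that forum. Note the valid username there.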
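from pyhive import hive

conn = hive.Connection(host='<myhost>',
                       port='<myport>',  # placeholder; pass the HiveServer2 port number
                       database='spin1',
                       username='<a valid user>')  # IMPORTANT: must be a valid user
cursor = conn.cursor()
# Placeholder query: a statement has to be executed before fetchall() can return rows
cursor.execute('SELECT 1')
print(cursor.fetchall())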
Without a valid username, I got the same exception as the one in the question.
【Comments】:
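Applied to the helper module from the question, the fix might look roughly like the sketch below. It is only an illustration under assumptions: `build_config` and `render_hql` are hypothetical helper names that do not appear in the original post, `hive_user` is a made-up username, and port 10000 is assumed because it is the usual HiveServer2 default. The pyhive and impyla imports are done inside `execute_hql` so the two pure helpers can be tried without a Hive server.

```python
from string import Template


def build_config(host, database, username, port=10000):
    """Build keyword arguments for pyhive's hive.connect().

    Hypothetical helper: it refuses an empty username, since the answer
    reports that a missing valid user caused the MapRedTask failure.
    Port 10000 is an assumed default, adjust it for your cluster.
    """
    if not username:
        raise ValueError('a valid Hive username is required')
    return {
        'host': host,
        'port': port,
        'database': database,
        'username': username,
    }


def render_hql(hql, params=None):
    """Fill $placeholders in the query text, as the question's code does."""
    return Template(hql).substitute(params or {})


def execute_hql(hql, params, conf):
    """Run the query and return a pandas DataFrame (needs a live Hive server)."""
    # Imported here so the helpers above work without pyhive or impyla installed
    from pyhive import hive
    from impala.util import as_pandas

    conn = hive.connect(**conf)
    try:
        cursor = conn.cursor()
        cursor.execute(render_hql(hql, params))
        return as_pandas(cursor)
    finally:
        conn.close()
```

A call would then be something like `execute_hql(hql, params, build_config('127.0.0.1', 'default', 'hive_user'))`, with `hive_user` replaced by an account that is allowed to submit jobs on the cluster.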