How to connect to impala using impyla or to hive using pyhive?

【Posted】2019-09-16 13:49:36

【Question】:

I am trying to connect to impala using impyla with the following code:

from impala.dbapi import connect
conn = connect(host='host_name.com', port=21050, user='usr', password='pass', use_ssl=True, auth_mechanism='LDAP')
cursor = conn.cursor()
cursor.execute('SHOW DATABASES')
cursor.fetchall()

According to the documentation, this library requires thrift_sasl version 0.2.1, but I cannot install it because pip fails with this error:

Collecting thrift_sasl==0.2.1
  Using cached https://files.pythonhosted.org/packages/80/36/16dfe92d32df63cc2b7b7be8d0e4a736617b7e52daaa7f83ae386a89d179/thrift_sasl-0.2.1.tar.gz
Collecting sasl>=0.2.1 (from thrift_sasl==0.2.1)
  Using cached https://files.pythonhosted.org/packages/8e/2c/45dae93d666aea8492678499e0999269b4e55f1829b1e4de5b8204706ad9/sasl-0.2.1.tar.gz
Collecting thriftpy (from thrift_sasl==0.2.1)
  Using cached https://files.pythonhosted.org/packages/f4/19/cca118cf7d2087310dbc8bd70dc7df0c1320f2652873a93d06d7ba356d4a/thriftpy-0.3.9.tar.gz
Requirement already satisfied: six in c:\users\psowa\appdata\local\programs\python\python36\lib\site-packages (from sasl>=0.2.1->thrift_sasl==0.2.1) (1.12.0)
Requirement already satisfied: ply<4.0,>=3.4 in c:\users\psowa\appdata\local\programs\python\python36\lib\site-packages (from thriftpy->thrift_sasl==0.2.1) (3.11)
Installing collected packages: sasl, thriftpy, thrift-sasl
  Running setup.py install for sasl ... error
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\psowa\appdata\local\programs\python\python36\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\psowa\\AppData\\Local\\Temp\\pip-install-y42wej4x\\sasl\\setup.py'"'"'; __file__='"'"'C:\\Users\\psowa\\AppData\\Local\\Temp\\pip-install-y42wej4x\\sasl\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\psowa\AppData\Local\Temp\pip-record-rjwkcc76\install-record.txt' --single-version-externally-managed --compile
         cwd: C:\Users\psowa\AppData\Local\Temp\pip-install-y42wej4x\sasl\
    Complete output (27 lines):
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.6
    creating build\lib.win-amd64-3.6\sasl
    copying sasl\__init__.py -> build\lib.win-amd64-3.6\sasl
    running egg_info
    writing sasl.egg-info\PKG-INFO
    writing dependency_links to sasl.egg-info\dependency_links.txt
    writing requirements to sasl.egg-info\requires.txt
    writing top-level names to sasl.egg-info\top_level.txt
    reading manifest file 'sasl.egg-info\SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'sasl.egg-info\SOURCES.txt'
    copying sasl\saslwrapper.cpp -> build\lib.win-amd64-3.6\sasl
    copying sasl\saslwrapper.h -> build\lib.win-amd64-3.6\sasl
    copying sasl\saslwrapper.pyx -> build\lib.win-amd64-3.6\sasl
    running build_ext
    building 'sasl.saslwrapper' extension
    creating build\temp.win-amd64-3.6
    creating build\temp.win-amd64-3.6\Release
    creating build\temp.win-amd64-3.6\Release\sasl
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Isasl -Ic:\users\psowa\appdata\local\programs\python\python36\include -Ic:\users\psowa\appdata\local\programs\python\python36\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tpsasl/saslwrapper.cpp /Fobuild\temp.win-amd64-3.6\Release\sasl/saslwrapper.obj
    saslwrapper.cpp
    c:\users\psowa\appdata\local\temp\pip-install-y42wej4x\sasl\sasl\saslwrapper.h(22): fatal error C1083: Cannot open include file: 'sasl/sasl.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\psowa\appdata\local\programs\python\python36\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\psowa\\AppData\\Local\\Temp\\pip-install-y42wej4x\\sasl\\setup.py'"'"'; __file__='"'"'C:\\Users\\psowa\\AppData\\Local\\Temp\\pip-install-y42wej4x\\sasl\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\psowa\AppData\Local\Temp\pip-record-rjwkcc76\install-record.txt' --single-version-externally-managed --compile Check the logs for full command output.

When I install the latest version of thrift_sasl instead, the same code fails in Jupyter with this error:

AttributeError: 'TSSLSocket' object has no attribute 'isOpen'
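
Errors like the `isOpen` one above often depend on which versions of the Thrift-related packages ended up installed together, so it can help to record them before changing anything. The following is a minimal sketch, not part of the original question: the helper names `package_version` and `installed_versions` are hypothetical, and the package list is simply the set of names that appear in this post.

```python
def package_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python < 3.8 (such as the 3.6 interpreter in the logs): use setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def installed_versions(names):
    """Map each package name to its installed version (None when missing)."""
    return {name: package_version(name) for name in names}


if __name__ == "__main__":
    import sys

    print("python", sys.version.split()[0])
    # Package names taken from this post; adjust the list as needed.
    packages = ["impyla", "thrift_sasl", "thriftpy", "sasl", "pyhive"]
    for name, ver in installed_versions(packages).items():
        print(name, ver or "not installed")
```

Running this in the same Jupyter kernel that raises the error shows the combination that is actually in use there, which may differ from what the command-line `pip` reports.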

I also tried connecting through pyhive with the following code:

from pyhive import hive

host_name = "host_name.com"
port = 10000
user = "usr"
password = "pass"

def hiveconnection(host_name, port, user,password):
    conn = hive.Connection(host=host_name, port=port, username=user, password=password, auth='LDAP')
    cur = conn.cursor()
    cur.execute('SHOW DATABASES')
    result = cur.fetchall()

    return result

output = hiveconnection(host_name, port, user,password)
print(output)
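
The function above never closes the cursor or the connection. A variant that does, sketched under assumptions: the name `show_databases` is hypothetical, and the connection factory is passed in as an argument (for example `hive.Connection`) so the helper itself has no dependency on pyhive or sasl. It relies only on the DB-API `cursor()`, `execute()`, `fetchall()` and `close()` methods.

```python
from contextlib import closing


def show_databases(connect, **connect_kwargs):
    """Open a connection via `connect`, run SHOW DATABASES, and clean up.

    `connect` is any callable returning a DB-API style connection,
    e.g. pyhive's `hive.Connection`.
    """
    with closing(connect(**connect_kwargs)) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute("SHOW DATABASES")
            return cur.fetchall()


# Possible usage with pyhive (requires a working sasl installation):
#   from pyhive import hive
#   rows = show_databases(hive.Connection, host="host_name.com", port=10000,
#                         username="usr", password="pass", auth="LDAP")
```

This does not avoid the sasl requirement; it only makes sure resources are released once a connection can be established.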

pyhive asks me to install sasl, but when I try to do so, pip shows:

Collecting sasl
  Using cached https://files.pythonhosted.org/packages/8e/2c/45dae93d666aea8492678499e0999269b4e55f1829b1e4de5b8204706ad9/sasl-0.2.1.tar.gz
Requirement already satisfied: six in c:\users\psowa\appdata\local\programs\python\python36\lib\site-packages (from sasl) (1.12.0)
Installing collected packages: sasl
  Running setup.py install for sasl ... error
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\psowa\appdata\local\programs\python\python36\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\psowa\\AppData\\Local\\Temp\\pip-install-9rn_a9g0\\sasl\\setup.py'"'"'; __file__='"'"'C:\\Users\\psowa\\AppData\\Local\\Temp\\pip-install-9rn_a9g0\\sasl\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\psowa\AppData\Local\Temp\pip-record-sedxucba\install-record.txt' --single-version-externally-managed --compile
         cwd: C:\Users\psowa\AppData\Local\Temp\pip-install-9rn_a9g0\sasl\
    Complete output (27 lines):
    running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.6
    creating build\lib.win-amd64-3.6\sasl
    copying sasl\__init__.py -> build\lib.win-amd64-3.6\sasl
    running egg_info
    writing sasl.egg-info\PKG-INFO
    writing dependency_links to sasl.egg-info\dependency_links.txt
    writing requirements to sasl.egg-info\requires.txt
    writing top-level names to sasl.egg-info\top_level.txt
    reading manifest file 'sasl.egg-info\SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'sasl.egg-info\SOURCES.txt'
    copying sasl\saslwrapper.cpp -> build\lib.win-amd64-3.6\sasl
    copying sasl\saslwrapper.h -> build\lib.win-amd64-3.6\sasl
    copying sasl\saslwrapper.pyx -> build\lib.win-amd64-3.6\sasl
    running build_ext
    building 'sasl.saslwrapper' extension
    creating build\temp.win-amd64-3.6
    creating build\temp.win-amd64-3.6\Release
    creating build\temp.win-amd64-3.6\Release\sasl
    C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Isasl -Ic:\users\psowa\appdata\local\programs\python\python36\include -Ic:\users\psowa\appdata\local\programs\python\python36\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /EHsc /Tpsasl/saslwrapper.cpp /Fobuild\temp.win-amd64-3.6\Release\sasl/saslwrapper.obj
    saslwrapper.cpp
    c:\users\psowa\appdata\local\temp\pip-install-9rn_a9g0\sasl\sasl\saslwrapper.h(22): fatal error C1083: Cannot open include file: 'sasl/sasl.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 2
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\psowa\appdata\local\programs\python\python36\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\psowa\\AppData\\Local\\Temp\\pip-install-9rn_a9g0\\sasl\\setup.py'"'"'; __file__='"'"'C:\\Users\\psowa\\AppData\\Local\\Temp\\pip-install-9rn_a9g0\\sasl\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\psowa\AppData\Local\Temp\pip-record-sedxucba\install-record.txt' --single-version-externally-managed --compile Check the logs for full command output.
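
Both pip logs stop at the same compiler error: `saslwrapper.h` includes `sasl/sasl.h`, and the MSVC compiler cannot find that header. The sketch below pulls such lines out of a long build log. The helper name `missing_headers` and the regular expression are illustrative assumptions, written only against the C1083 message format that appears in the logs above.

```python
import re

# Matches MSVC lines of the form:
#   fatal error C1083: Cannot open include file: '<header>': ...
_C1083 = re.compile(r"fatal error C1083: Cannot open include file: '([^']+)'")


def missing_headers(build_log):
    """Return header names reported missing by C1083, in order, without repeats."""
    seen = []
    for match in _C1083.finditer(build_log):
        header = match.group(1)
        if header not in seen:
            seen.append(header)
    return seen


if __name__ == "__main__":
    sample = (
        r"c:\users\psowa\appdata\local\temp\pip-install-9rn_a9g0\sasl\sasl\saslwrapper.h(22): "
        "fatal error C1083: Cannot open include file: 'sasl/sasl.h': No such file or directory"
    )
    print(missing_headers(sample))  # ['sasl/sasl.h']
```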

Any ideas?

【Answer 1】:

Using Python 2.7 fixed this problem. I think it was a compatibility issue.
