ERROR - Import from SQL Server to GCS using Apache Sqoop & Dataproc
Posted: 2021-09-22 12:29:10

I am trying to import data from SQL Server into Google Cloud Storage, which I will later load into BigQuery. I am doing all of this through Google's Cloud Shell.
I have completed the initial steps: I downloaded Sqoop and the SQL Server JDBC driver and uploaded them to a specific Google Cloud Storage bucket. I have also created a Google Dataproc cluster to submit the Sqoop job to, but when I try to submit it with the code below, it throws errors.
I am following this walkthrough (https://medium.com/datamindedbe/import-sql-server-data-in-bigquery-d640441d5d56), and in my case I am trying to extract a single table first. The code I use to submit the job through Dataproc is shown further down.
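For completeness, a rough sketch of the setup steps described above; the jar filenames, the password handling and the cluster options here are assumptions, only the bucket name matches the one used later:

# Upload Sqoop, the JDBC driver and the Avro jars to the bucket (filenames are assumptions)
gsutil cp sqoop-1.4.7-hadoop260.jar mssql-jdbc-8.2.1.jre8.jar avro-1.8.2.jar \
  gs://sqoop-bucket-20092021/jars/

# Store the SQL Server password in the bucket so that --password-file can read it
echo -n 'your-password' | gsutil cp - gs://sqoop-bucket-20092021/creds/sqoop.password

# Create a small Dataproc cluster to submit the Sqoop job to (machine types are assumptions)
gcloud dataproc clusters create sqoop-cluster \
  --region europe-west2 \
  --master-machine-type n1-standard-2 \
  --worker-machine-type n1-standard-2 \
  --num-workers 2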
What I have tried
I do have the SQL Server JDBC .jar (mssql-jdbc-8.2.1.jre8.jar) in Cloud Storage, along with the other dependency files.
I have also checked the TCP/IP connection on my SQL Server 2014 instance, and it is configured as the error message recommends.
The code I use to submit the Sqoop job to the Dataproc cluster:
CLUSTERNAME="sqoop-cluster"
BUCKET="gs://sqoop-bucket-20092021"
libs=`gsutil ls $BUCKET/jars | paste -sd, --`
JDBC_STR="jdbc:sqlserver://RUKSQLRS01:1433;databaseName=RUKDataWarehouse"
SQL_USER="RUKSQLDataWarehouse_Reporting"
SQL_PASS="gs://sqoop-bucket-20092021/creds/sqoop.password"
TABLE="LBD_Task"
SCHEMA="dbo"
gcloud dataproc jobs submit hadoop \
--region europe-west2 \
--cluster="$CLUSTERNAME"\
--jars=$libs \
--class=org.apache.sqoop.Sqoop \
-- \
import \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
-Dmapreduce.job.user.classpath.first=true \
--connect "$JDBC_STR" \
--username "$SQL_USER" \
--password-file "$SQL_PASS" \
--table "$SCHEMA.$TABLE" \
--warehouse-dir "$BUCKET/output/$TABLE" \
--num-mappers 1 \
--as-avrodatafile
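For reference, once the import succeeds and the Avro files land under the warehouse directory, the follow-up load into BigQuery could look like this; the dataset name sqoop_demo and the exact output sub-path are assumptions:

# Load the exported Avro files from GCS into a BigQuery table
bq load --source_format=AVRO \
  sqoop_demo.LBD_Task \
  "gs://sqoop-bucket-20092021/output/LBD_Task/dbo.LBD_Task/part-m-*.avro"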
The error I am getting:
21/09/22 11:30:46 WARN tool.SqoopTool: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
21/09/22 11:30:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
21/09/22 11:30:48 WARN sqoop.ConnFactory: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
21/09/22 11:30:48 INFO manager.SqlManager: Using default fetchSize of 1000
21/09/22 11:30:48 INFO tool.CodeGenTool: Beginning code generation
21/09/22 11:31:02 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host RUKSQLRS01, port 1433 has failed. Error: "RUKSQLRS01. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host RUKSQLRS01, port 1433 has failed. Error: "RUKSQLRS01. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall.".
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:227)
at com.microsoft.sqlserver.jdbc.SQLServerException.ConvertConnectExceptionToSQLServerException(SQLServerException.java:284)
at com.microsoft.sqlserver.jdbc.SocketFinder.findSocket(IOBuffer.java:2435)
at com.microsoft.sqlserver.jdbc.TDSChannel.open(IOBuffer.java:635)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2010)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1687)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1528)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:866)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:569)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:904)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1872)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1671)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
21/09/22 11:31:02 ERROR tool.ImportTool: Import failed: java.io.IOException: No columns to generate for ClassWriter
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1677)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
Comments:
Where is your SQL Server, and is it reachable from the Dataproc cluster? The error TCP/IP connection to the host RUKSQLRS01, port 1433 has failed indicates the hostname is RUKSQLRS01. Is that a GCE VM on the same VPC network? Can you run nslookup RUKSQLRS01 from the master node?
The SQL Server is hosted on AWS.
Is it accessible from GCE? If so, how do you make sure that RUKSQLRS01 resolves to an IP address?
Answer 1:
This looks like a networking issue. Your SQL Server is outside GCP, and you are trying to reach it by hostname. You need to either use its external IP address and set up firewall rules on the SQL Server side to allow access from GCP, or set up a VPN between your GCP VPC network and the SQL Server's network and reach SQL Server through its internal IP.
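A minimal sketch of how to verify this from the Dataproc master node, and of the external-IP variant of the connection string; the zone, the master node name and the IP address 203.0.113.10 are placeholders/assumptions:

# SSH into the Dataproc master node (zone is an assumption)
gcloud compute ssh sqoop-cluster-m --zone=europe-west2-a

# On the master node: check whether the hostname resolves and whether port 1433 is reachable
nslookup RUKSQLRS01
nc -vz 203.0.113.10 1433    # replace 203.0.113.10 with the SQL Server's external IP

# If the port is reachable, point the JDBC string at the IP instead of the hostname
JDBC_STR="jdbc:sqlserver://203.0.113.10:1433;databaseName=RUKDataWarehouse"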