仅在“-Dorg.apache.sqoop.splitter.allow_text_splitter=true”属性作为参数传递的情况下才允许为文本索引列生成拆分

Posted

技术标签:

【中文标题】仅在“-Dorg.apache.sqoop.splitter.allow_text_splitter=true”属性作为参数传递的情况下才允许为文本索引列生成拆分【英文标题】:Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a param 【发布时间】:2019-02-22 02:55:34 【问题描述】:

我已经命令将 sql 从 sqlserver 导入到 hive,如下所示

sqoop import --connect 'jdbc:sqlserver://10.0.2.11:1433;database=SP2010' --username pbddms -P --table daily_language --hive-import --hive-database test_hive --hive-table daily_language --hive-overwrite --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'

但是结果

19/02/22 09:10:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.5.0-292
19/02/22 09:10:24 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
19/02/22 09:10:24 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
19/02/22 09:10:24 INFO manager.SqlManager: Using default fetchSize of 1000
19/02/22 09:10:24 INFO tool.CodeGenTool: Beginning code generation
19/02/22 09:10:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [daily_language] AS t WHERE 1=0
19/02/22 09:10:25 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.5.0-292/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/ddab816638bd5e65108647177ab703b0/daily_language.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/02/22 09:10:27 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ddab816638bd5e65108647177ab703b0/daily_language.jar
19/02/22 09:10:27 INFO mapreduce.ImportJobBase: Beginning import of daily_language
19/02/22 09:10:29 INFO client.RMProxy: Connecting to ResourceManager at mghdop01.dcdms/10.0.37.157:8050
19/02/22 09:10:29 INFO client.AHSProxy: Connecting to Application History server at mghdop01.dcdms/10.0.37.157:10200
19/02/22 09:10:31 INFO db.DBInputFormat: Using read commited transaction isolation
19/02/22 09:10:31 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([kdbahasa]), MAX([kdbahasa]) FROM [daily_language]
19/02/22 09:10:31 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/root/.staging/job_1547085556146_0680
19/02/22 09:10:31 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter
        at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:204)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
        at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:163)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
Caused by: Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter
        at org.apache.sqoop.mapreduce.db.TextSplitter.split(TextSplitter.java:67)
        at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:201)
        ... 23 more

为什么会有

错误 tool.ImportTool:遇到 IOException 运行导入作业: java.io.IOException:为文本索引列生成拆分 仅在以下情况下允许 “-Dorg.apache.sqoop.splitter.allow_text_splitter=true”属性通过 作为参数

虽然我在上面的 sqoop 导入中没有给出 split-by。 首先,我该如何解决上面的情况?

然后我尝试在上面的 sqoop 导入中添加“-Dorg.apache.sqoop.splitter.allow_text_splitter=true”,但它在下面给了我另一个错误;

19/02/22 09:20:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.5.0-292
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: Dorg.apache.sqoop.splitter.allow_text_splitter=true
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --username
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: pbddms
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: -P
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --table
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: daily_language
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-import
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-database
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: test_hive
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-table
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: daily_language
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-overwrite
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-drop-import-delims
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --null-string
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: \\N
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: --null-non-string
19/02/22 09:20:43 ERROR tool.BaseSqoopTool: Unrecognized argument: \\N\

第二种情况,上面的情况如何解决?

【问题讨论】:

【参考方案1】:

在我的例子中,我必须把这部分放在双引号中:

sqoop 导入“-Dorg.apache.sqoop.splitter.allow_text_splitter=true”

【讨论】:

【参考方案2】:

我将kdbahasa 列作为拆分列。添加-m 1参数指定映射器的数量。 1 - 意味着它将在单个映射器上运行而无需拆分:

sqoop import --connect 'jdbc:sqlserver://10.0.2.11:1433;database=SP2010' --username pbddms -P --table daily_language --hive-import -m 1 --hive-database test_hive --hive-table daily_language --hive-overwrite --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N'

如果您想拆分,还请阅读有关 split_column 的信息:https://***.com/a/37389134/2700344

【讨论】:

使用-m 1时,是否应该跟分列?增加 num map 任务(即 2 或更多)怎么样,因为从 sqoop.apache.org/docs/1.4.7/… 开始,它只是说将 m 定义为 1 对不起,我之前删除了评论(你收到专栏了吗?) 不,使用 -m 1 您不需要指定拆分列。不会拆分,将在单个映射器上查询所有数据集 但如果我不使用-m 1,我必须使用split-by并定义--columns? 是的。如果要并行获取,则指定 -m 和 --split-by 没有 -m 和没有指定的--split-by sqoop 尝试自动选择拆分列并确定数量分裂【参考方案3】:

遇到类似的问题,发现这是我们需要传递该 sqoop 属性的方式:

sqoop import -D org.apache.sqoop.splitter.allow_text_splitter=true --connect 'jdbc:sqlserver://10.0.2.11:1433;database=SP2010' --username pbddms -P --table daily_language -- hive-import -m 1 --hive-database test_hive --hive-table daily_language --hive-overwrite --hive-drop-import-delims --null-string '\N' --null-non-string '\没有'

【讨论】:

以上是关于仅在“-Dorg.apache.sqoop.splitter.allow_text_splitter=true”属性作为参数传递的情况下才允许为文本索引列生成拆分的主要内容,如果未能解决你的问题,请参考以下文章

CORS 仅在 Firefox 和 Safari 上且仅在 onClick() 上阻止内容

MPMoviePlayerController 仅在调用两次时播放。仅在 iOS 4 中出现

如何仅在 Angular 的某些页面上显示标题组件(现在它仅在重新加载时显示)

keycloak-connect nodejs / meteor - 仅在首次登录时拒绝访问且仅在 prod

StoreKit 返回无效的产品标识符 - 仅在真正的 App Store 上,仅在 iOS7 上

如何使用 jqueryui 仅在其右侧调整左侧 div 的大小,并且它是相邻的右侧 div,仅在其左侧?