Databricks checkpoint java.io.FileNotFoundException: No such file or directory:
Posted: 2021-09-24 16:36:42

[Question]:

I'm trying to run this writeStream:
def _write_stream(data_frame, checkpoint_path, write_stream_path):
    # Append the stream to a Delta table, checkpointing progress to
    # checkpoint_path; .table() starts the streaming query.
    data_frame.writeStream.format("delta") \
        .option("checkpointLocation", checkpoint_path) \
        .trigger(processingTime="1 second") \
        .option("mergeSchema", "true") \
        .outputMode("append") \
        .table(write_stream_path)
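For context, the stack trace below shows the source is Databricks Auto Loader (the cloudFiles format). Here is a minimal sketch of how such a function would typically be wired up; the input format, paths, and table name are illustrative assumptions, not taken from the original post:

# Sketch only: source format, paths, and table name are assumed for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # predefined as `spark` in Databricks notebooks

# Auto Loader (cloudFiles) source, as implied by the stack trace below
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")                                # assumed input format
      .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/events")  # assumed path
      .load("s3://bucket/landing/events"))                                # assumed source path

# Keep the checkpoint outside the data/table path (see the comments below)
_write_stream(df,
              checkpoint_path="s3://bucket/_checkpoints/events",  # assumed path
              write_stream_path="mydb.events")                    # assumed table name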
But I get this error:
  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:428)
  at org.apache.spark.util.ThreadUtils$.parallelMap(ThreadUtils.scala:399)
  at com.databricks.sql.streaming.state.RocksDBFileManager.loadImmutableFilesFromDbfs(RocksDBFileManager.scala:433)
  at com.databricks.sql.streaming.state.RocksDBFileManager.loadCheckpointFromDbfs(RocksDBFileManager.scala:202)
  at com.databricks.sql.rocksdb.CloudRocksDB.$anonfun$open$5(CloudRocksDB.scala:437)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:627)
  at com.databricks.sql.rocksdb.CloudRocksDB.timeTakenMs(CloudRocksDB.scala:523)
  at com.databricks.sql.rocksdb.CloudRocksDB.$anonfun$open$2(CloudRocksDB.scala:435)
  at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:369)
  at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:457)
  at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:477)
  at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:240)
  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
  at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:235)
  at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:232)
  at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:20)
  at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:279)
  at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:271)
  at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:20)
  at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:452)
  at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:378)
  at com.databricks.spark.util.PublicDBLogging.recordOperationWithResultTags(DatabricksSparkUsageLogger.scala:20)
  at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:369)
  at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:341)
  at com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:20)
  at com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:57)
  at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:125)
  at com.databricks.spark.util.UsageLogger.recordOperation(UsageLogger.scala:70)
  at com.databricks.spark.util.UsageLogger.recordOperation$(UsageLogger.scala:57)
  at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:86)
  at com.databricks.spark.util.UsageLogging.recordOperation(UsageLogger.scala:402)
  at com.databricks.spark.util.UsageLogging.recordOperation$(UsageLogger.scala:381)
  at com.databricks.sql.rocksdb.CloudRocksDB.recordOperation(CloudRocksDB.scala:52)
  at com.databricks.sql.rocksdb.CloudRocksDB.recordRocksDBOperation(CloudRocksDB.scala:542)
  at com.databricks.sql.rocksdb.CloudRocksDB.$anonfun$open$1(CloudRocksDB.scala:427)
  at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:377)
  at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:363)
  at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
  at com.databricks.sql.rocksdb.CloudRocksDB.open(CloudRocksDB.scala:427)
  at com.databricks.sql.rocksdb.CloudRocksDB.<init>(CloudRocksDB.scala:80)
  at com.databricks.sql.rocksdb.CloudRocksDB$.open(CloudRocksDB.scala:595)
  at com.databricks.sql.fileNotification.autoIngest.CloudFilesSource.<init>(CloudFilesSource.scala:82)
  at com.databricks.sql.fileNotification.autoIngest.CloudFilesNotificationSource.<init>(CloudFilesNotificationSource.scala:44)
  at com.databricks.sql.fileNotification.autoIngest.CloudFilesSourceProvider.createSource(CloudFilesSourceProvider.scala:172)
  at org.apache.spark.sql.execution.datasources.DataSource.createSource(DataSource.scala:326)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$1.$anonfun$applyOrElse$1(MicroBatchExecution.scala:100)
  at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$1.applyOrElse(MicroBatchExecution.scala:97)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$1.applyOrElse(MicroBatchExecution.scala:95)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:484)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:86)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:484)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:262)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:258)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:460)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:428)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution.planQuery(MicroBatchExecution.scala:95)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution.logicalPlan$lzycompute(MicroBatchExecution.scala:165)
  at org.apache.spark.sql.execution.streaming.MicroBatchExecution.logicalPlan(MicroBatchExecution.scala:165)
  at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:349)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:852)
  at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:341)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:268)
Caused by: java.io.FileNotFoundException: No such file or directory: s3:///**/*/checkpoint/sources/0/rocksdb/SSTs/.sst
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3254)
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3137)
  at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3076)
  at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
  at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
  at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2034)
  at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2003)
  at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1979)
  at com.databricks.sql.streaming.state.RocksDBFileManager.$anonfun$loadImmutableFilesFromDbfs$6(RocksDBFileManager.scala:442)
  at com.databricks.sql.streaming.state.RocksDBFileManager.$anonfun$loadImmutableFilesFromDbfs$6$adapted(RocksDBFileManager.scala:433)
  at org.apache.spark.util.ThreadUtils$.$anonfun$parallelMap$2(ThreadUtils.scala:397)
  at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
  at scala.util.Success.$anonfun$map$1(Try.scala:255)
  at scala.util.Success.map(Try.scala:213)
  at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
  at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
  at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
  at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.$anonfun$run$1(SparkThreadLocalForwardingThreadPoolExecutor.scala:104)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:68)
  at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured$(SparkThreadLocalForwardingThreadPoolExecutor.scala:54)
  at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:101)
  at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.run(SparkThreadLocalForwardingThreadPoolExecutor.scala:104)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
[Comments]:

If the checkpoint location is inside the data path and you run a cleanup on that path, the cleanup will corrupt the checkpoint. It is good practice to keep the checkpoint location outside the data path.

[Answer 1]:

Please check whether the checkpoint_path location exists. The error log clearly shows that the path does not exist:
Caused by: java.io.FileNotFoundException: No such file or directory: s3:///**/*/checkpoint/sources/0/rocksdb/SSTs/.sst
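A quick way to verify is to list the path. The following is a sketch assuming a Databricks notebook, where dbutils is available; the checkpoint path is hypothetical:

# Sketch, assuming a Databricks notebook (dbutils is available); path is hypothetical.
checkpoint_path = "s3://bucket/_checkpoints/events"

try:
    dbutils.fs.ls(checkpoint_path)  # raises an exception if the path does not exist
    print("checkpoint path exists")
except Exception as e:
    print(f"checkpoint path missing or unreadable: {e}")

# If the checkpoint itself is corrupted (e.g. the missing RocksDB SST file above),
# removing it lets the stream start from scratch. Note: this discards streaming
# progress and the source will be reprocessed from the beginning.
# dbutils.fs.rm(checkpoint_path, True)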
[Comments]:

I checked it.