Exceptions When Reading Data from HDFS

Posted by 豪放婉约派程序员

When one or more DataNodes are shut down while data is still being written to HDFS, some blocks can be left with an unreleased HDFS lease. Until that lease expires, other applications (MapReduce, Spark, Hive, and so on) cannot read those blocks and fail with an exception like the following:

17/07/28 14:13:40 WARN scheduler.TaskSetManager: Lost task 28.0 in stage 40.0 (TID 2777, dcnode5): java.io.IOException: Cannot obtain block length for LocatedBlock{BP-1594711030-10.29.180.177-1497607441986:blk_1073842908_103050; getBlockSize()=24352; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.241.104.148:50010,DS-2a2e3731-7889-4572-ac03-1645cb9681f5,DISK], DatanodeInfoWithStorage[10.28.142.158:50010,DS-40c6a66e-4f6f-4061-8a54-ac1a8874e3e1,DISK], DatanodeInfoWithStorage[10.28.142.143:50010,DS-41399e02-856b-4761-af41-c916986bd400,DISK], DatanodeInfoWithStorage[10.28.142.37:50010,DS-4bb951a2-6963-4f24-ac80-4df64e0b5d99,DISK]]}
    at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:427)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:335)
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:271)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:263)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1565)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:305)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:305)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:778)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

For a detailed explanation of HDFS leases, see: http://www.cnblogs.com/cssdongl/p/6699919.html

At this point the blocks are not actually corrupt; the lease simply has not been released, and that is what prevents other programs from reading or writing them. We can either recover the lease or, more bluntly, delete the affected files outright.
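If the file contents must be preserved, lease recovery can be attempted per file rather than deleting anything. On Hadoop 2.7 and later this is exposed through the hdfs debug subcommand; a minimal sketch (the path below is a placeholder, substitute a file reported by the fsck command further down):

# Ask the NameNode to recover the lease on a single file;
# -retries controls how many attempts are made before giving up.
hdfs debug recoverLease -path /path/to/stuck-file -retries 5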

Running lease recovery file by file is the more cumbersome route, so here I will only show how to find the affected blocks and delete them:

First, find which files under a given HDFS directory contain blocks that cannot be read or written because of lease problems (note: the example uses the HDFS root directory "/"; replace it with your actual path):

hadoop fsck / -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" | grep -o "/[^ ]*" | sed -e "s/:$//"
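For readability, here is the same pipeline split one stage per line, with a comment explaining what each stage does (bash syntax; the behavior is identical):

hadoop fsck / -openforwrite |     # report files that are still open for write
  egrep -v '^\.+$' |              # drop fsck's progress lines (rows of dots)
  egrep "MISSING|OPENFORWRITE" |  # keep only the entries flagged as problematic
  grep -o "/[^ ]*" |              # extract the leading HDFS path from each entry
  sed -e "s/:$//"                 # strip the trailing colon fsck appends to paths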

Then delete the affected files:

hadoop fsck / -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" | grep -o "/[^ ]*" | sed -e "s/:$//" | xargs -i hadoop fs -rmr {}
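Because the deletion is immediate (subject only to your HDFS trash settings), a more cautious two-step sketch saves the list to a local file first, so it can be reviewed before anything is removed (/tmp/openforwrite_files.txt is just an example path):

# 1. Capture the list of affected files for manual review.
hadoop fsck / -openforwrite | egrep -v '^\.+$' | egrep "MISSING|OPENFORWRITE" \
    | grep -o "/[^ ]*" | sed -e "s/:$//" > /tmp/openforwrite_files.txt

# 2. After inspecting the list, delete each file.
#    (On newer Hadoop, `hadoop fs -rm -r` replaces the deprecated -rmr.)
xargs -i hadoop fs -rmr {} < /tmp/openforwrite_files.txt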

 
