Issue while writing to Bigtable


Posted: 2021-04-07 15:08:15

We have a Spark job that reads data from GCS and loads it into Bigtable. The job runs fine in our lower environments, but we are hitting a strange failure in production. We are using the HBase bulk put API to write the data into Bigtable. Any pointers would be appreciated.

Logs:

2021-04-07T13:18:14Z    ERROR   "class":"org.apache.spark.network.server.TransportRequestHandler","message":"Error while invoking RpcHandler#receive() for one-way message."
2021-04-07T13:18:14Z    ERROR   "class":"org.apache.spark.network.server.TransportRequestHandler","message":"Error while invoking RpcHandler#receive() for one-way message."
2021-04-07T13:18:14.118721886Z  DEFAULT Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 175 in stage 8.0 failed 4 times, most recent failure: Lost task 175.3 in stage 8.0 (TID 1445, gcp-prd-gcs-bigtable-loader-a32x7upkbwpxe-w-27, executor 36): org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 125 actions: StatusRuntimeException: 125 times, servers with issues: bigtable.googleapis.com
    at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.getExceptions(BigtableBufferedMutator.java:187)
    at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.handleExceptions(BigtableBufferedMutator.java:141)
    at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.mutate(BigtableBufferedMutator.java:132)
    at org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkPut$1$$anonfun$apply$1$$anonfun$apply$2.apply(HBaseContext.scala:230)
    at org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkPut$1$$anonfun$apply$1$$anonfun$apply$2.apply(HBaseContext.scala:230)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkPut$1$$anonfun$apply$1.apply(HBaseContext.scala:230)
    at org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkPut$1$$anonfun$apply$1.apply(HBaseContext.scala:228)
    at org.apache.hadoop.hbase.spark.HBaseContext.org$apache$hadoop$hbase$spark$HBaseContext$$hbaseForeachPartition(HBaseContext.scala:489)
    at org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkPut$1.apply(HBaseContext.scala:225)
    at org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkPut$1.apply(HBaseContext.scala:225)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2021-04-07T13:18:14.119097387Z  DEFAULT 
2021-04-07T13:18:14.119418925Z  DEFAULT Driver stacktrace:
2021-04-07T13:18:14.119736812Z  DEFAULT     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1926)
2021-04-07T13:18:14.120068638Z  DEFAULT     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1914)
2021-04-07T13:18:14.130625639Z  DEFAULT     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1913)
2021-04-07T13:18:14.131131757Z  DEFAULT     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
2021-04-07T13:18:14.131569273Z  DEFAULT     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
2021-04-07T13:18:14.131950286Z  DEFAULT     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1913)
2021-04-07T13:18:14.132402917Z  DEFAULT     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:948)
2021-04-07T13:18:14.132735888Z  DEFAULT     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:948)
2021-04-07T13:18:14.133129775Z  DEFAULT     at scala.Option.foreach(Option.scala:257)
2021-04-07T13:18:14.13354306Z   DEFAULT     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:948)
2021-04-07T13:18:14.133872192Z  DEFAULT     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2147)
2021-04-07T13:18:14.134181766Z  DEFAULT     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2096)
2021-04-07T13:18:14.134503241Z  DEFAULT     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2085)
2021-04-07T13:18:14.134901523Z  DEFAULT     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
2021-04-07T13:18:14.135328164Z  DEFAULT     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:759)
2021-04-07T13:18:14.135739829Z  DEFAULT     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
2021-04-07T13:18:14.136105777Z  DEFAULT     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
2021-04-07T13:18:14.136498026Z  DEFAULT     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
2021-04-07T13:18:14.136840822Z  DEFAULT     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
2021-04-07T13:18:14.144167827Z  DEFAULT     at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:980)
2021-04-07T13:18:14.144680824Z  DEFAULT     at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:978)
2021-04-07T13:18:14.155168463Z  DEFAULT     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
2021-04-07T13:18:14.155558017Z  DEFAULT     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
2021-04-07T13:18:14.155867426Z  DEFAULT     at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
2021-04-07T13:18:14.156168331Z  DEFAULT     at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:978)
2021-04-07T13:18:14.156605051Z  DEFAULT     at org.apache.hadoop.hbase.spark.HBaseContext.bulkPut(HBaseContext.scala:224)
2021-04-07T13:18:14.160636853Z  DEFAULT     at org.apache.hadoop.hbase.spark.JavaHBaseContext.bulkPut(JavaHBaseContext.scala:166)
2021-04-07T13:18:14.161087983Z  DEFAULT     at com.gcp.gcstobigtable.ReadWTxnP2FromGcsAndWriteToBigTable.doBulkPut(ReadWTxnP2FromGcsAndWriteToBigTable.java:154)
2021-04-07T13:18:14.161516459Z  DEFAULT     at com.gcp.gcstobigtable.ReadWTxnP2FromGcsAndWriteToBigTable.main(ReadWTxnP2FromGcsAndWriteToBigTable.java:62)
2021-04-07T13:18:14.161971898Z  DEFAULT     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2021-04-07T13:18:14.162391968Z  DEFAULT     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2021-04-07T13:18:14.162795326Z  DEFAULT     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2021-04-07T13:18:14.16313107Z   DEFAULT     at java.lang.reflect.Method.invoke(Method.java:498)
2021-04-07T13:18:14.191936436Z  DEFAULT     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
2021-04-07T13:18:14.192445084Z  DEFAULT     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
2021-04-07T13:18:14.192877426Z  DEFAULT     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
2021-04-07T13:18:14.193322881Z  DEFAULT     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
2021-04-07T13:18:14.193720635Z  DEFAULT     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
2021-04-07T13:18:14.194132375Z  DEFAULT     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
2021-04-07T13:18:14.194645147Z  DEFAULT     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
2021-04-07T13:18:14.195099816Z  DEFAULT     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2021-04-07T13:18:14.432439159Z  DEFAULT Could not find CoarseGrainedScheduler.
    at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:160)
    at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:140)
    at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:655)
    at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:274)
    at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
(The "Could not find CoarseGrainedScheduler" trace above repeated several more times with identical frames.)

Code:

Configuration conf = BigtableConfiguration.configure(projectId, bigTableInstanceId);
JavaHBaseContext hbaseContext = new JavaHBaseContext(sc, conf);
// bqDataRdd is the JavaRDD read from GCS; bulkPutObject maps each record to an HBase Put.
hbaseContext.bulkPut(bqDataRdd, bigTableName, bulkPutObject);

Comments:

Hi, is this still an issue for you? If not, how did you solve it?

The problem was with the load: one of our loads was around 4-5M, which is what made the job fail. To avoid it, instead of loading all 40-50 Parquet files (roughly 4-5M of data in total) in one go, we now iterate over the Parquet files in the given folder and load them one at a time.

Answer 1:

The problem was with the load: one of our loads was around 4-5M, and that is why the job was failing. To avoid the issue, instead of loading all 40-50 Parquet files (about 4-5M of data in total) at once, we iterate over the Parquet files in the given folder and load them one by one.
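The workaround above, splitting one big load into per-file loads, can be sketched roughly as follows. This is an illustrative sketch, not the poster's actual code: the `PerFileLoader` class and `listParquetFiles` helper are assumed names, local `java.nio` listing stands in for listing the GCS folder, and the commented-out lines stand in for the question's `spark.read` / `bulkPut` logic reusing the configuration shown earlier.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PerFileLoader {

    // Collect the individual Parquet files under a folder so each one can be
    // loaded into Bigtable as its own, smaller bulkPut job.
    static List<String> listParquetFiles(Path folder) throws IOException {
        try (Stream<Path> entries = Files.list(folder)) {
            return entries
                    .map(Path::toString)
                    .filter(p -> p.endsWith(".parquet"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args[0]); // the folder holding the 40-50 Parquet files
        for (String file : listParquetFiles(folder)) {
            // One bulkPut per file instead of one bulkPut over all files, e.g.:
            //   JavaRDD<Row> rows = spark.read().parquet(file).javaRDD();
            //   ... map rows to Puts ...
            //   hbaseContext.bulkPut(rowRdd, bigTableName, bulkPutObject);
            System.out.println("loading " + file);
        }
    }
}
```

In the real job the folder lives in GCS, so the listing would typically go through the Hadoop FileSystem API with a `gs://` path rather than `java.nio`; the loop structure stays the same.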

