Spark RDD

Posted 2020-09-09 ordi

RDD: Resilient Distributed Dataset

1. Spark RDD is immutable

Because an RDD is immutable, Spark can split a large one into smaller pieces, distribute
them to worker nodes for processing, and combine the partial results into the final result,
all without worrying that the underlying data will change along the way.
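
The split, process, and combine flow described above can be sketched in Scala roughly as follows. This is a minimal illustration, not a production setup: the object name `ImmutableRddSketch`, the `local[*]` master, and the choice of 4 partitions are assumptions made for the demo.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ImmutableRddSketch {
  // Returns (original elements, transformed elements, sum of transformed elements).
  def run(sc: SparkContext): (Seq[Int], Seq[Int], Int) = {
    // Split a small local collection into 4 partitions (the count is arbitrary).
    val numbers = sc.parallelize(1 to 8, numSlices = 4)

    // map does not modify `numbers`; it describes a new RDD derived from it.
    val doubled = numbers.map(_ * 2)

    // reduce combines the per-partition results into a single value on the driver.
    val total = doubled.reduce(_ + _)

    (numbers.collect().toSeq, doubled.collect().toSeq, total)
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; a real cluster would use a different master.
    val conf = new SparkConf().setAppName("immutable-rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (original, doubled, total) = run(sc)
      println(s"original = $original")
      println(s"doubled  = $doubled")
      println(s"total    = $total")
    } finally {
      sc.stop()
    }
  }
}
```

After `map` runs, `numbers` still holds its original elements, which is what makes it safe to hand the partitions to different workers.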

2. Spark RDD is distributable

3. Spark RDD lives in memory

Spark keeps RDD data in memory as much as it can. Only when Spark runs short of memory,
or when the data grows beyond the available capacity, is data written to disk. Most of the
processing on an RDD happens in memory, and that is a major reason why Spark can process
data so quickly.
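
One way to request the "memory first, disk only when needed" behavior explicitly is to persist an RDD with the `MEMORY_AND_DISK` storage level. The sketch below assumes that level is the one you want; the object name `CachedRddSketch` and the sample data are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CachedRddSketch {
  // Returns (number of elements, number of even elements, storage level in use).
  def run(sc: SparkContext): (Long, Long, StorageLevel) = {
    val squares = sc.parallelize(1 to 1000).map(n => n.toLong * n)

    // Keep partitions in memory; partitions that do not fit are spilled to disk.
    squares.persist(StorageLevel.MEMORY_AND_DISK)

    // The first action computes the RDD and stores its partitions.
    val count = squares.count()

    // A later action on the same RDD can reuse the stored partitions.
    val evens = squares.filter(_ % 2 == 0).count()

    // Read the level before unpersisting, since unpersist resets it.
    val level = squares.getStorageLevel

    // Release the stored partitions once they are no longer needed.
    squares.unpersist()

    (count, evens, level)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running the demo on a single machine.
    val conf = new SparkConf().setAppName("cached-rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (count, evens, level) = run(sc)
      println(s"count = $count, evens = $evens")
      println(s"useMemory = ${level.useMemory}, useDisk = ${level.useDisk}")
    } finally {
      sc.stop()
    }
  }
}
```

Other storage levels exist (for example `MEMORY_ONLY`), so which one fits depends on the workload and the memory available.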

4. Spark RDD is strongly typed

An RDD can be created from any supported data type. These can be built-in Scala/Java
types or custom types such as your own classes. The biggest advantage of this design
decision is protection against type-related runtime errors. If code is going to break
because of a data type mismatch, it breaks at compile time.
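
As a rough illustration of the compile-time checking, the Scala sketch below builds an RDD of a custom class. The `Reading` case class, its fields, and the object name `TypedRddSketch` are hypothetical and exist only for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical record type used only for this example.
case class Reading(sensor: String, celsius: Double)

object TypedRddSketch {
  // Returns the average temperature across the sample readings.
  def run(sc: SparkContext): Double = {
    // The element type is part of the RDD's type: RDD[Reading].
    val readings: RDD[Reading] = sc.parallelize(Seq(
      Reading("a", 20.0),
      Reading("b", 22.0),
      Reading("c", 24.0)
    ))

    // The compiler infers RDD[Double] because `celsius` is a Double.
    val temps: RDD[Double] = readings.map(_.celsius)

    // The next line would be rejected by the compiler, because Reading
    // has no member named `fahrenheit`:
    // val bad = readings.map(_.fahrenheit)

    temps.sum() / temps.count()
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running the demo on a single machine.
    val conf = new SparkConf().setAppName("typed-rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(s"average celsius = ${run(sc)}")
    } finally {
      sc.stop()
    }
  }
}
```

Uncommenting the `bad` line makes the build fail before any job is submitted, which is the kind of early failure the paragraph above describes.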
