abstract class RDD[T: ClassTag]( @transient private var _sc: SparkContext, @transient private var deps: Seq[Dependency[_]] ) extends Serializable with Logging
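1.Abstract class: RDD cannot be instantiated on its own, so it must have concrete subclasses; in practice you work with one of those subclasses directly.
2.Serializable: RDDs are sent over the network, which requires serialization, and the quality of the serialization directly affects the framework's performance.
3.Logging: provides logging.
4.T: a generic type parameter, so an RDD can hold elements of any data type.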
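5.SparkContext: the entry point to Spark on the driver; `_sc` is the context that created this RDD.
6.@transient: marks a field to be skipped during serialization, so `_sc` and `deps` are not written out when the RDD is serialized.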
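The interaction between points 2 and 6 can be seen without Spark at all. The following is a minimal sketch using plain Java serialization; `DriverHandle`, `MiniDataset`, and `roundTrip` are hypothetical names invented for this illustration and are not part of Spark. `DriverHandle` stands in for a driver-only object such as `SparkContext`, and it is deliberately not `Serializable`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a driver-only object such as SparkContext.
// It does NOT extend Serializable.
class DriverHandle(val name: String)

// Hypothetical RDD-like class: the handle is @transient, the id is ordinary state.
class MiniDataset(@transient private var handle: DriverHandle, val id: Int)
    extends Serializable {
  def hasHandle: Boolean = handle != null
}

object TransientDemo {
  // Serialize a value to bytes and read it back, as a network transfer would.
  def roundTrip[A <: AnyRef](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A]
    finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val original = new MiniDataset(new DriverHandle("driver"), 7)
    val copy = roundTrip(original)
    println(original.hasHandle) // true: the handle exists where the object was created
    println(copy.hasHandle)     // false: the @transient field was skipped and comes back as null
    println(copy.id)            // 7: ordinary fields survive the round trip
  }
}
```

If `@transient` were removed from `handle`, `writeObject` would throw a `NotSerializableException`, because `DriverHandle` is not serializable. That is the same reason an RDD keeps its `SparkContext` out of the serialized form.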
2.The five main properties of an RDD
Internally, each RDD is characterized by five main properties:
1.A list of partitions
2.A function for computing each split
3.A list of dependencies on other RDDs
4.Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
5.Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
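Each of the five properties corresponds to a member of `RDD` that a subclass can override. The following is a minimal sketch of a custom RDD, assuming Spark is on the classpath and run in local mode; `RangeSlice`, `SimpleRangeRDD`, and `FivePropertiesDemo` are hypothetical names made up for this example, not classes from Spark.

```scala
import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type: one slice [start, end) of the range 0 until n.
class RangeSlice(val index: Int, val start: Int, val end: Int) extends Partition

// Hypothetical RDD producing the integers 0 until n, split into numSlices partitions.
// Property 3 (dependencies): Nil is passed as deps, so this RDD has no parents.
class SimpleRangeRDD(sc: SparkContext, n: Int, numSlices: Int)
    extends RDD[Int](sc, Nil) {

  // Property 1: a list of partitions.
  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numSlices) { i =>
      new RangeSlice(i, i * n / numSlices, (i + 1) * n / numSlices)
    }

  // Property 2: a function for computing each split.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
    val slice = split.asInstanceOf[RangeSlice]
    (slice.start until slice.end).iterator
  }

  // Property 4: the partitioner is left at its default (None),
  // since this is not a key-value RDD.

  // Property 5: preferred locations; none here because the data is generated, not stored.
  override protected def getPreferredLocations(split: Partition): Seq[String] = Nil
}

object FivePropertiesDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("five-properties")
    val sc = new SparkContext(conf)
    try {
      val rdd = new SimpleRangeRDD(sc, 10, 3)
      println(rdd.partitions.length)        // number of partitions
      println(rdd.dependencies.isEmpty)     // no parent RDDs
      println(rdd.partitioner)              // None
      println(rdd.collect().mkString(","))  // all elements, in partition order
    } finally {
      sc.stop()
    }
  }
}
```

Only the partitions and the compute function have to be supplied by a subclass; the dependencies come from the constructor argument, and the last two properties are optional, matching the "Optionally" wording in the list above.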