RDD.checkpoint()
Mark this RDD for checkpointing. The RDD will be saved to a file inside the checkpoint directory set with SparkContext.setCheckpointDir(), and all references to its parent RDDs will be removed. This function must be called before any job has been executed on this RDD. Persisting the RDD in memory first is strongly recommended; otherwise, saving it to a file requires recomputing it.
New in version 0.7.0.
See also
RDD.isCheckpointed()
RDD.getCheckpointFile()
RDD.localCheckpoint()
SparkContext.getCheckpointDir()
Examples
>>> rdd = sc.range(5)
>>> rdd.is_checkpointed
False
>>> rdd.getCheckpointFile() == None
True

>>> rdd.checkpoint()
>>> rdd.is_checkpointed
True
>>> rdd.getCheckpointFile() == None
True

>>> rdd.count()
5
>>> rdd.is_checkpointed
True
>>> rdd.getCheckpointFile() == None
False
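
The description recommends persisting the RDD before checkpointing it, but the examples above do not show that step. The following is a minimal sketch of one way to order the calls, not an official recipe. The helper name `checkpoint_range` and its parameters `checkpoint_dir` and `n` are hypothetical and chosen only for illustration; `sc` is assumed to be an existing SparkContext.

```python
from typing import TYPE_CHECKING, Optional, Tuple

if TYPE_CHECKING:  # imported for type hints only; not needed at runtime
    from pyspark import SparkContext


def checkpoint_range(
    sc: "SparkContext", checkpoint_dir: str, n: int = 5
) -> Tuple[int, bool, Optional[str]]:
    """Hypothetical helper: checkpoint sc.range(n) and report the outcome."""
    # Set the checkpoint directory before marking any RDD for checkpointing.
    sc.setCheckpointDir(checkpoint_dir)

    rdd = sc.range(n)
    # Persist first so that writing the checkpoint can reuse the cached
    # partitions instead of recomputing the RDD.
    rdd.persist()
    # Mark the RDD for checkpointing; this must happen before any job
    # has run on it.
    rdd.checkpoint()
    # Run the first job on the RDD. As in the examples above,
    # getCheckpointFile() returns None until a job like this has run.
    total = rdd.count()
    return total, rdd.isCheckpointed(), rdd.getCheckpointFile()
```

With a live SparkContext, a call such as `checkpoint_range(sc, "/tmp/checkpoints")` (the path is an arbitrary example) returns the element count, the checkpoint status, and the checkpoint file path.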