pyspark.RDD.saveAsSequenceFile

RDD.saveAsSequenceFile(path, compressionCodecClass=None)

Output a Python RDD of key-value pairs (of form RDD[(K, V)]) to any Hadoop file system, using the “org.apache.hadoop.io.Writable” types that are converted from the RDD’s key and value types. The mechanism is as follows:

1. Pyrolite is used to convert the pickled Python RDD into an RDD of Java objects.

2. Keys and values of this Java RDD are converted to Writables and written out.

Parameters

path (str): path to the sequence file
compressionCodecClass (str, optional): fully qualified class name of the compression codec, e.g. “org.apache.hadoop.io.compress.GzipCodec” (None by default)
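
Examples

The following is a minimal sketch of how the two parameters might be used, not an excerpt from the Spark sources. The helper `save_pairs`, the function `demo`, the application name, and the temporary output directory are hypothetical names introduced for illustration. `demo` assumes a local Spark installation and reads the data back with `SparkContext.sequenceFile`.

```python
import os
import tempfile

# Codec class name taken from the parameter description above.
GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec"


def save_pairs(rdd, path, compress=False):
    """Hypothetical helper: write an RDD of (key, value) pairs as a SequenceFile.

    When `compress` is true, the Gzip codec class name is passed as
    compressionCodecClass; otherwise None is passed (the default).
    """
    codec = GZIP_CODEC if compress else None
    rdd.saveAsSequenceFile(path, compressionCodecClass=codec)


def demo():
    """Hypothetical round trip; assumes pyspark and a Java runtime are available."""
    # Imported here so that save_pairs can be used without importing Spark eagerly.
    from pyspark import SparkContext

    sc = SparkContext("local[1]", "sequence-file-sketch")
    try:
        # The output path must not already exist, so use a fresh subdirectory.
        out = os.path.join(tempfile.mkdtemp(), "pairs")
        rdd = sc.parallelize([(1, "a"), (2, "b"), (3, "c")])
        save_pairs(rdd, out, compress=True)
        # Read the pairs back to check that they survived the round trip.
        print(sorted(sc.sequenceFile(out).collect()))
    finally:
        sc.stop()
```

Calling `demo()` in an environment with Spark writes three pairs to a Gzip-compressed sequence file and prints what is read back. Keeping the codec name in a single constant is only a convenience of this sketch; any fully qualified codec class available on the cluster's classpath could be passed instead.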
