SparkContext.hadoopFile
Read an ‘old’ Hadoop InputFormat with arbitrary key and value class from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI. The mechanism is the same as for SparkContext.sequenceFile().
A Hadoop configuration can be passed in as a Python dict. This will be converted into a Configuration in Java.
path: path to Hadoop file
inputFormatClass: fully qualified classname of Hadoop InputFormat from the old API (e.g. “org.apache.hadoop.mapred.TextInputFormat”)
keyClass: fully qualified classname of key Writable class (e.g. “org.apache.hadoop.io.Text”)
valueClass: fully qualified classname of value Writable class (e.g. “org.apache.hadoop.io.LongWritable”)
keyConverter: fully qualified name of a function returning key WritableConverter (None by default)
valueConverter: fully qualified name of a function returning value WritableConverter (None by default)
conf: Hadoop configuration, passed in as a dict (None by default)
batchSize: the number of Python objects represented as a single Java object (default 0, which chooses batchSize automatically)
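The parameters above can be put together as in the following minimal sketch. The helper name read_lines_with_offsets and the path in the usage comment are hypothetical, chosen only for illustration. The sketch uses org.apache.hadoop.mapred.TextInputFormat, the old-API text format, which produces LongWritable keys (byte offsets) and Text values (line contents). It assumes `sc` is an already running SparkContext.

```python
# Class names for the old Hadoop API (org.apache.hadoop.mapred package).
INPUT_FORMAT = "org.apache.hadoop.mapred.TextInputFormat"
KEY_CLASS = "org.apache.hadoop.io.LongWritable"  # byte offset of each line
VALUE_CLASS = "org.apache.hadoop.io.Text"        # contents of each line


def read_lines_with_offsets(sc, path, hadoop_conf=None):
    """Hypothetical helper: read `path` as an RDD of (offset, line) pairs.

    `sc` is assumed to be a live SparkContext. `hadoop_conf` is an optional
    plain dict that is converted into a Java Configuration.
    """
    return sc.hadoopFile(
        path,
        INPUT_FORMAT,
        KEY_CLASS,
        VALUE_CLASS,
        conf=hadoop_conf,
    )


# Example usage (assumes a running SparkContext `sc`; the path is made up):
# rdd = read_lines_with_offsets(sc, "hdfs:///data/example.txt")
# rdd.take(2)
```

The converters and batchSize are left at their defaults here; custom converters are only needed when the key or value Writable types have no built-in Python mapping.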