pyspark.pandas.read_orc

pyspark.pandas.read_orc(path: str, columns: Optional[List[str]] = None, index_col: Union[str, List[str], None] = None, **options: Any) → pyspark.pandas.frame.DataFrame

Load an ORC object from the file path, returning a DataFrame.

Parameters
path : str

Path to the ORC file to read.

columns : list, default None

If not None, only these columns will be read from the file.

index_col : str or list of str, optional, default: None

Name or names of the columns in the Spark table to use as the index. If None, a default index is attached.

options : dict

All other options passed directly into Spark’s data source.
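
The parameters above can be combined in a single call. The following is a minimal sketch rather than part of the library: `as_name_list` and `load_orc` are hypothetical helper names, the file path in the closing comment is a placeholder, and `mergeSchema` is used only as an example of a Spark ORC data source option forwarded through `**options` (check that your Spark version supports it). The sketch also assumes that index columns should be requested through `index_col` rather than `columns`. The import of `pyspark.pandas` is deferred so the helpers can be defined without a Spark installation.

```python
from typing import Any, List, Optional, Union


def as_name_list(names: Union[str, List[str], None]) -> List[str]:
    """Normalize a str / list of str / None argument to a list of names."""
    if names is None:
        return []
    if isinstance(names, str):
        return [names]
    return list(names)


def load_orc(
    path: str,
    columns: Optional[List[str]] = None,
    index_col: Union[str, List[str], None] = None,
    **options: Any,
):
    """Hypothetical wrapper around ps.read_orc (not a pyspark.pandas API)."""
    import pyspark.pandas as ps  # deferred: requires a PySpark installation

    index_names = as_name_list(index_col)
    if columns is not None:
        # Assumption: index columns are requested via index_col, so any
        # index names are dropped from the data-column selection.
        columns = [c for c in columns if c not in index_names]
    # Extra keyword arguments are forwarded to Spark's ORC data source.
    return ps.read_orc(path, columns=columns, index_col=index_col, **options)


# Example call (needs Spark and an existing ORC file at this placeholder path):
# psdf = load_orc("/tmp/data.orc", columns=["id"], index_col="index",
#                 mergeSchema="true")
```

Keeping the normalization in a separate pure-Python function makes that part easy to check without starting a Spark session.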

Returns
DataFrame

Examples

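>>> import tempfile
>>> import pyspark.pandas as ps
>>> path = tempfile.mkdtemp()  # scratch directory for the example files
>>> ps.range(1).to_orc('%s/read_spark_io/data.orc' % path)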
>>> ps.read_orc('%s/read_spark_io/data.orc' % path, columns=['id'])
   id
0   0

You can preserve the index in the roundtrip as below.

>>> ps.range(1).to_orc('%s/read_spark_io/data.orc' % path, index_col="index")
>>> ps.read_orc('%s/read_spark_io/data.orc' % path, columns=['id'], index_col="index")
       id
index
0       0