pyspark.pandas.read_orc

pyspark.pandas.read_orc(path: str, columns: Optional[List[str]] = None, index_col: Union[str, List[str], None] = None, **options: Any) → pyspark.pandas.frame.DataFrame

Load an ORC object from the file path, returning a DataFrame.

Parameters

path : str
    Path to the ORC file to read.
columns : list, default None
    If not None, only these columns will be read from the file.
index_col : str or list of str, optional, default None
    Name(s) of the Spark table column(s) to use as the index.
options : dict
    All other options, passed directly to Spark's data source.

Returns

DataFrame

Examples

>>> ps.range(1).to_orc('%s/read_spark_io/data.orc' % path)
>>> ps.read_orc('%s/read_spark_io/data.orc' % path, columns=['id'])
   id
0   0

You can preserve the index in the roundtrip as below.

>>> ps.range(1).to_orc('%s/read_spark_io/data.orc' % path, index_col="index")
>>> ps.read_orc('%s/read_spark_io/data.orc' % path, columns=['id'], index_col="index")
       id
index
0       0
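
The examples above rely on a `path` variable and a `ps` alias supplied by the doctest environment. As a minimal, self-contained sketch of the same round trip, the following assumes PySpark is installed and a Spark session can be started locally. The function name `orc_roundtrip_demo` and the temporary directory are hypothetical, introduced only for illustration.

```python
import tempfile


def orc_roundtrip_demo():
    """Write a small pandas-on-Spark DataFrame to ORC and read it back."""
    # Imported lazily so this module still loads where PySpark is absent.
    import pyspark.pandas as ps

    # Hypothetical scratch directory standing in for the doctest `path`.
    with tempfile.TemporaryDirectory() as path:
        target = f"{path}/read_spark_io/data.orc"

        # Store the default index as a regular column named "index".
        ps.range(3).to_orc(target, mode="overwrite", index_col="index")

        # Read back only `id`, restoring "index" as the DataFrame index.
        psdf = ps.read_orc(target, columns=["id"], index_col="index")

        # Materialize as pandas before the temporary directory is removed.
        return psdf.sort_index().to_pandas()
```

Calling `orc_roundtrip_demo()` should return a pandas DataFrame with a single `id` column indexed by `index`. The explicit `mode="overwrite"` is there so that repeated writes to the same location do not fail.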
