pyspark.pandas.read_orc
Load an ORC object from the file path, returning a DataFrame.
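Parameters
path : str
    The path string storing the ORC file to be read.
columns : list, default None
    If not None, only these columns will be read from the file.
index_col : str or list of str, optional, default None
    Column(s) of the table in Spark to use as the index.
**options : dict
    All other keyword arguments are passed directly into Spark’s data source.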
Examples
>>> ps.range(1).to_orc('%s/read_spark_io/data.orc' % path)
>>> ps.read_orc('%s/read_spark_io/data.orc' % path, columns=['id'])
   id
0   0
You can preserve the index in the roundtrip as below.
>>> ps.range(1).to_orc('%s/read_spark_io/data.orc' % path, index_col="index")
>>> ps.read_orc('%s/read_spark_io/data.orc' % path, columns=['id'], index_col="index")
...
       id
index
0       0