DataFrame.
to_orc
Write a DataFrame to the ORC format.
Path to write to.
Python write mode, default ‘w’.
Note
mode can accept the strings for Spark writing mode. Such as ‘append’, ‘overwrite’, ‘ignore’, ‘error’, ‘errorifexists’.
‘append’ (equivalent to ‘a’): Append the new data to existing data.
‘overwrite’ (equivalent to ‘w’): Overwrite existing data.
‘ignore’: Silently ignore this operation if data already exists.
‘error’ or ‘errorifexists’: Throw an exception if data already exists.
Names of partitioning columns
Column names to be used in Spark to represent pandas-on-Spark’s index. The index name in pandas-on-Spark is ignored. By default the index is always lost.
All other options passed directly into Spark’s data source.
See also
read_orc
DataFrame.to_delta
DataFrame.to_parquet
DataFrame.to_table
DataFrame.to_spark_io
Notes
pandas API on Spark writes ORC files into the directory, path, and writes multiple part files in the directory unlike pandas. pandas API on Spark respects HDFS’s property such as ‘fs.default.name’.
Examples
>>> df = ps.DataFrame(dict( ... date=list(pd.date_range('2012-1-1 12:00:00', periods=3, freq='M')), ... country=['KR', 'US', 'JP'], ... code=[1, 2 ,3]), columns=['date', 'country', 'code']) >>> df date country code 0 2012-01-31 12:00:00 KR 1 1 2012-02-29 12:00:00 US 2 2 2012-03-31 12:00:00 JP 3
>>> df.to_orc('%s/to_orc/foo.orc' % path, partition_cols='date')
>>> df.to_orc( ... '%s/to_orc/foo.orc' % path, ... mode = 'overwrite', ... partition_cols=['date', 'country'])