pyspark.sql.DataFrame.pandas_api

DataFrame.pandas_api(index_col: Union[str, List[str], None] = None) → PandasOnSparkDataFrame

Converts the existing DataFrame into a pandas-on-Spark DataFrame.

If a pandas-on-Spark DataFrame is converted to a Spark DataFrame and then back to pandas-on-Spark, the index information is lost and the original index becomes a normal column. Pass index_col to use that column as the index again.

This is only available if pandas is installed.

Parameters
index_col: str or list of str, optional, default: None

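Name of the Spark column, or list of names, to use as the index of the resulting pandas-on-Spark DataFrame. If None, a default index is attached.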

See also

pyspark.pandas.frame.DataFrame.to_spark

Examples

>>> df.show()  
+----+----+
|Col1|Col2|
+----+----+
|   a|   1|
|   b|   2|
|   c|   3|
+----+----+
>>> df.pandas_api()  
  Col1  Col2
0    a     1
1    b     2
2    c     3

We can specify the index columns.

>>> df.pandas_api(index_col="Col1")
      Col2
Col1
a        1
b        2
c        3