pyspark.sql.DataFrame.cache
DataFrame.cache() → pyspark.sql.dataframe.DataFrame
Persists the DataFrame with the default storage level (MEMORY_AND_DISK_DESER).
New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
Returns
DataFrame
Cached DataFrame.
Notes
The default storage level has changed to MEMORY_AND_DISK_DESER to match Scala in 3.0.
Examples
>>> df = spark.range(1)
>>> df.cache()
DataFrame[id: bigint]

>>> df.explain()
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- InMemoryTableScan ...