pyspark.sql.functions.corr(col1, col2)
Returns a new Column for the Pearson Correlation Coefficient for col1 and col2.
New in version 1.6.0.
Changed in version 3.4.0: Supports Spark Connect.
Parameters
col1 : Column or str
    first column to calculate correlation.
col2 : Column or str
    second column to calculate correlation.
Returns
Column
    Pearson Correlation Coefficient of these two column values.
Examples
>>> from pyspark.sql.functions import corr
>>> a = range(20)
>>> b = [2 * x for x in range(20)]
>>> df = spark.createDataFrame(zip(a, b), ["a", "b"])
>>> df.agg(corr("a", "b").alias('c')).collect()
[Row(c=1.0)]
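To make the returned value concrete, the following is a minimal pure-Python sketch of the textbook Pearson formula, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²), applied to the same data as the example above. The helper name `pearson_corr` is hypothetical and not part of PySpark. The sketch does not reproduce Spark's internal implementation or its handling of nulls and constant columns; returning NaN for a constant input is a choice made here only for illustration.

```python
import math


def pearson_corr(xs, ys):
    """Pearson correlation of two equal-length sequences (hypothetical helper, not a PySpark API)."""
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # Assumption of this sketch: a constant input has no defined correlation.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Same data as the example above: b is an exact linear function of a.
a = range(20)
b = [2 * x for x in a]
print(pearson_corr(a, b))
```

Because b is exactly 2 * a, the two columns are perfectly linearly related, so the printed value should equal 1.0 up to floating-point rounding, consistent with the Row(c=1.0) result above. Reversing the relationship (for example b = [-x for x in a]) should give a value near -1.0.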