DataFrame.corr
Compute pairwise correlation of columns, excluding NA/null values.
Parameters
method : {'pearson', 'spearman'}
    * pearson : standard correlation coefficient
    * spearman : Spearman rank correlation
See also
Series.corr
Notes
There are behavior differences between pandas-on-Spark and pandas:
    * The method argument only accepts 'pearson' and 'spearman'.
    * The data should not contain NaNs; pandas-on-Spark returns an error if they are present.
    * The min_periods argument is not supported.
Examples
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
...                   columns=['dogs', 'cats'])
>>> df.corr('pearson')
          dogs      cats
dogs  1.000000 -0.851064
cats -0.851064  1.000000

>>> df.corr('spearman')
          dogs      cats
dogs  1.000000 -0.948683
cats -0.948683  1.000000
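
To make the two method values concrete, the following is a minimal pure-Python sketch that recomputes the off-diagonal entries of the example by hand. It is not the pandas-on-Spark implementation. The helper names `pearson`, `average_ranks`, and `spearman` are hypothetical and exist only for this illustration, and the sketch assumes that ties receive the average of their rank positions, which is consistent with the Spearman value in the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Same data as the example above (illustrative plain lists, not a DataFrame).
dogs = [0.2, 0.0, 0.6, 0.2]
cats = [0.3, 0.6, 0.0, 0.1]

print(f"{pearson(dogs, cats):.6f}")   # -0.851064
print(f"{spearman(dogs, cats):.6f}")  # -0.948683
```

The two 0.2 values in dogs tie for ranks 2 and 3, so each gets rank 2.5. The Spearman result differs from the Pearson one because it depends only on the ordering of the values, not on their magnitudes.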