corr {SparkR} | R Documentation
Computes the Pearson Correlation Coefficient for two Columns.
Calculates the correlation of two columns of a SparkDataFrame. Currently only supports the Pearson Correlation Coefficient. For Spearman Correlation, consider using RDD methods found in MLlib's Statistics.
corr(x, ...)

## S4 method for signature 'Column'
corr(x, col2)

## S4 method for signature 'SparkDataFrame'
corr(x, colName1, colName2, method = "pearson")
x: a Column or a SparkDataFrame.
...: additional argument(s). If x is a Column, a second Column (col2) should be provided. If x is a SparkDataFrame, two column names (colName1, colName2) should be provided.
col2: a (second) Column.
colName1: the name of the first column.
colName2: the name of the second column.
method: Optional. A character string specifying the method for calculating the correlation. Only "pearson" is currently supported.
The Pearson Correlation Coefficient as a Double.
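As a rough illustration of what the returned value represents, the sketch below computes Pearson's r by hand in base R and compares it with stats::cor(). It runs locally on the built-in mtcars data and does not call SparkR. It is not a description of how Spark computes the coefficient internally, and the variable names (x, y, r_manual, r_builtin) are chosen here purely for illustration.

```r
# Base R only: a local illustration, not SparkR's implementation.
x <- mtcars$mpg
y <- mtcars$hp

# Pearson's r: sample covariance divided by the product of the
# sample standard deviations.
r_manual <- cov(x, y) / (sd(x) * sd(y))

# stats::cor() defaults to method = "pearson", so the two should agree
# up to floating-point tolerance.
r_builtin <- cor(x, y)
stopifnot(isTRUE(all.equal(r_manual, r_builtin)))

# The result is a single double, and its magnitude never exceeds 1.
stopifnot(is.double(r_manual), length(r_manual) == 1, abs(r_manual) <= 1)
```

On the same data, corr(df, "mpg", "hp") on a SparkDataFrame would be expected to return a comparable value, though small floating-point differences are possible.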
corr since 1.6.0
Other aggregate functions: avg, column_aggregate_functions, count, cov, first, last
Other stat functions: approxQuantile, cov, crosstab, freqItems, sampleBy
## Not run: 
# Column method: correlation as an aggregate expression inside select()
df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
head(select(df, corr(df$mpg, df$hp)))

# SparkDataFrame method: correlation of two columns given by name
corr(df, "mpg", "hp")
corr(df, "mpg", "hp", method = "pearson")
## End(Not run)
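Since corr only supports "pearson", one possible workaround for data small enough to fit in local memory is to compute Spearman's rho locally. The sketch below uses base R only and relies on the fact that Spearman's rho is Pearson's r applied to the ranks of the data. It assumes the two columns have already been brought into local numeric vectors (for example via collect() on a small SparkDataFrame); it is not a substitute for the MLlib approach on large data.

```r
# Base R only: assumes the data is already local and small.
x <- mtcars$mpg
y <- mtcars$hp

# Spearman's rho is Pearson's r computed on average ranks.
rho_ranks <- cor(rank(x), rank(y), method = "pearson")

# stats::cor() provides the same statistic directly.
rho_builtin <- cor(x, y, method = "spearman")

stopifnot(isTRUE(all.equal(rho_ranks, rho_builtin)))
```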