corr {SparkR} | R Documentation
Computes the Pearson Correlation Coefficient for two Columns.
Calculates the correlation of two columns of a SparkDataFrame. Currently only supports the Pearson Correlation Coefficient. For Spearman Correlation, consider using RDD methods found in MLlib's Statistics.
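For reference, the sample Pearson correlation coefficient is conventionally defined as below. This is the standard textbook formula, given as background rather than quoted from the SparkR sources; the symbols x_i and y_i are assumed to denote the paired values of the two columns, and n the number of rows.

```latex
\[
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]
```

The value always lies between -1 and 1.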
corr(x, ...)

## S4 method for signature 'Column'
corr(x, col2)

## S4 method for signature 'SparkDataFrame'
corr(x, colName1, colName2, method = "pearson")
x | a Column or a SparkDataFrame.
... | additional argument(s). If x is a Column, a second Column should be provided. If x is a SparkDataFrame, two column names should be provided.
col2 | a (second) Column.
colName1 | the name of the first column.
colName2 | the name of the second column.
method | Optional. A character string specifying the method for calculating the correlation. Only "pearson" is currently supported.
The Pearson Correlation Coefficient as a Double.
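To sanity-check a returned value on a small sample, the same quantity can be computed locally. The following is a minimal sketch in base R, not SparkR: the vectors `x` and `y` and the helper `pearson` are hypothetical names introduced for illustration, and the data is made up.

```r
# Hypothetical sample data standing in for two numeric columns.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Pearson coefficient: sum of cross-deviations scaled by the
# square root of the product of the summed squared deviations.
pearson <- function(a, b) {
  da <- a - mean(a)
  db <- b - mean(b)
  sum(da * db) / sqrt(sum(da^2) * sum(db^2))
}

r_manual <- pearson(x, y)
# Reference value from base R's stats::cor for comparison.
r_builtin <- cor(x, y, method = "pearson")

print(r_manual)
print(isTRUE(all.equal(r_manual, r_builtin)))
```

For this data the hand-computed value and `cor` should agree up to floating-point tolerance.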
corr since 1.6.0
Other math_funcs: acos, asin, atan2, atan, bin, bround, cbrt, ceil, conv, cosh, cos, covar_pop, cov, expm1, exp, factorial, floor, hex, hypot, log10, log1p, log2, log, pmod, rint, round, shiftLeft, shiftRightUnsigned, shiftRight, signum, sinh, sin, sqrt, tanh, tan, toDegrees, toRadians, unhex
Other stat functions: approxQuantile, cov, crosstab, freqItems, sampleBy
## Not run:
# Column method
corr(df$c, df$d)
## End(Not run)
## Not run:
df <- read.json("/path/to/file.json")
# SparkDataFrame method
corr <- corr(df, "title", "gender")
corr <- corr(df, "title", "gender", method = "pearson")
## End(Not run)