
corr {SparkR}

R Documentation

corr

Description

Computes the Pearson Correlation Coefficient for two Columns.

Calculates the correlation of two columns of a SparkDataFrame. Currently only the Pearson Correlation Coefficient is supported. For the Spearman Correlation, consider using the RDD-based methods in MLlib's Statistics.
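To make the quantity concrete, here is a minimal base R sketch (no Spark session needed) of the Pearson coefficient: the sum of products of deviations from the means, divided by the square root of the product of the summed squared deviations. The helper `pearson_r()` and the sample vectors are hypothetical, for illustration only; they are not part of SparkR. The last lines show that Spearman's coefficient is Pearson's coefficient applied to ranks, which is why it needs a separate code path.

```r
# Hypothetical helper, not part of SparkR: Pearson correlation in base R.
pearson_r <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 1)
  dx <- x - mean(x)   # deviations of x from its mean
  dy <- y - mean(y)   # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Small illustrative data set (assumed, not from the original page).
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

r_manual <- pearson_r(x, y)                 # 6 / sqrt(60), about 0.7746
r_base   <- cor(x, y, method = "pearson")   # stats::cor, for comparison

# Spearman = Pearson on ranks (ties get averaged ranks via rank()).
r_spearman_manual <- pearson_r(rank(x), rank(y))
r_spearman_base   <- cor(x, y, method = "spearman")
```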

Usage

corr(x, ...)

## S4 method for signature 'Column'
corr(x, col2)

## S4 method for signature 'SparkDataFrame'
corr(x, colName1, colName2, method = "pearson")

Arguments

`x`	a Column or a SparkDataFrame.
`...`	additional argument(s). If `x` is a Column, a second Column should be provided. If `x` is a SparkDataFrame, two column names should be provided.
`col2`	a (second) Column.
`colName1`	the name of the first column.
`colName2`	the name of the second column.
`method`	Optional. A character string specifying the method for calculating the correlation. Currently only "pearson" is supported.

Value

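For the SparkDataFrame method, the Pearson Correlation Coefficient as a numeric (double) value. For the Column method, a Column that evaluates to the coefficient when used in an aggregation such as `select` or `agg`.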

Note

corr since 1.6.0

Examples

## Not run: 
##D sparkR.session()
##D df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
##D head(select(df, corr(df$mpg, df$hp)))
## End(Not run)

## Not run: 
##D corr(df, "mpg", "hp")
##D corr(df, "mpg", "hp", method = "pearson")
## End(Not run)
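As a local sanity check that needs no Spark session, the same coefficient can be computed with base R's `cor()` on the in-memory `mtcars` data. The assumption here is that the SparkDataFrame method agrees with the local result up to floating-point rounding; that is expected for the Pearson formula but is not a documented guarantee.

```r
# Local (non-Spark) reference value for the examples above, using base R only.
r_local <- cor(mtcars$mpg, mtcars$hp, method = "pearson")
print(round(r_local, 4))   # prints [1] -0.7762
```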

[Package SparkR version 2.3.1 Index]