public class ChiSquareTest
extends Object
Chi-square hypothesis testing for categorical data.
See Wikipedia for more information on the Chi-squared test.
Constructor and Description |
---|
ChiSquareTest() |
Modifier and Type | Method and Description |
---|---|
static Dataset<Row> | test(Dataset<Row> dataset, String featuresCol, String labelCol) Conduct Pearson's independence test for every feature against the label. |
public static Dataset<Row> test(Dataset<Row> dataset, String featuresCol, String labelCol)
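Conduct Pearson's independence test for every feature against the label. For each feature, the (feature, label) pairs are converted into a contingency matrix for which the Chi-squared statistic is computed. All label and feature values must be categorical.
The null hypothesis is that the occurrence of the outcomes is statistically independent.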
Parameters:
dataset
- DataFrame of categorical labels and categorical features. Real-valued features will be treated as categorical for each distinct value.
featuresCol
- Name of features column in dataset, of type Vector (VectorUDT)
labelCol
- Name of label column in dataset, of any numerical type
Returns:
DataFrame containing the test result for every feature against the label. This DataFrame will contain a single Row with the following fields:
- pValues: Vector
- degreesOfFreedom: Array[Int]
- statistics: Vector
Each of these fields has one value per feature.
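To make the per-feature computation concrete, the following is a minimal plain-Java sketch of Pearson's statistic and its degrees of freedom for a single feature column. It is an illustration under stated assumptions, not Spark's implementation: the class `ChiSquareSketch` and its methods are hypothetical names, each distinct feature or label value is treated as its own category (as described for `dataset` above), and the p-value is omitted because it requires the chi-squared distribution function, which the Java standard library does not provide.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Spark.
final class ChiSquareSketch {

    // Builds a contingency table: one row per distinct feature value,
    // one column per distinct label value.
    static long[][] contingency(double[] feature, double[] label) {
        if (feature.length != label.length) {
            throw new IllegalArgumentException("feature and label must have the same length");
        }
        double[] fVals = Arrays.stream(feature).distinct().sorted().toArray();
        double[] lVals = Arrays.stream(label).distinct().sorted().toArray();
        long[][] counts = new long[fVals.length][lVals.length];
        for (int i = 0; i < feature.length; i++) {
            int r = Arrays.binarySearch(fVals, feature[i]);
            int c = Arrays.binarySearch(lVals, label[i]);
            counts[r][c]++;
        }
        return counts;
    }

    // Pearson's statistic: sum over cells of (observed - expected)^2 / expected,
    // where expected = rowSum * colSum / total under the independence hypothesis.
    // Assumes every row and column has at least one observation, which holds
    // for tables produced by contingency().
    static double statistic(long[][] counts) {
        int rows = counts.length;
        int cols = counts[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0.0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                rowSum[r] += counts[r][c];
                colSum[c] += counts[r][c];
                total += counts[r][c];
            }
        }
        double chi2 = 0.0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double expected = rowSum[r] * colSum[c] / total;
                double diff = counts[r][c] - expected;
                chi2 += diff * diff / expected;
            }
        }
        return chi2;
    }

    // Degrees of freedom of an r x c contingency table: (r - 1) * (c - 1).
    static int degreesOfFreedom(long[][] counts) {
        return (counts.length - 1) * (counts[0].length - 1);
    }

    public static void main(String[] args) {
        // Made-up sample data: one binary feature and a binary label.
        double[] feature = {0, 0, 0, 0, 1, 1, 1, 1};
        double[] label   = {0, 0, 0, 1, 0, 1, 1, 1};
        long[][] table = contingency(feature, label); // {{3, 1}, {1, 3}}
        System.out.println("statistic = " + statistic(table)
            + ", degreesOfFreedom = " + degreesOfFreedom(table));
        // prints "statistic = 2.0, degreesOfFreedom = 1"
    }
}
```

In the result of `test`, the values computed this way for each feature would correspond to one entry of `statistics` and one entry of `degreesOfFreedom`; the matching `pValues` entry is the upper-tail probability of the chi-squared distribution at that statistic.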