spark.fpGrowth {SparkR}    R Documentation

FP-growth

Description

A parallel FP-growth algorithm to mine frequent itemsets. spark.fpGrowth fits an FP-growth model on a SparkDataFrame. Users can call spark.freqItemsets to get frequent itemsets, spark.associationRules to get association rules, predict to make predictions on new data based on the generated association rules, and write.ml/read.ml to save/load fitted models. For more details, see FP-growth.

Usage

spark.fpGrowth(data, ...)

spark.freqItemsets(object)

spark.associationRules(object)

## S4 method for signature 'SparkDataFrame'
spark.fpGrowth(
  data,
  minSupport = 0.3,
  minConfidence = 0.8,
  itemsCol = "items",
  numPartitions = NULL
)

## S4 method for signature 'FPGrowthModel'
spark.freqItemsets(object)

## S4 method for signature 'FPGrowthModel'
spark.associationRules(object)

## S4 method for signature 'FPGrowthModel'
predict(object, newData)

## S4 method for signature 'FPGrowthModel,character'
write.ml(object, path, overwrite = FALSE)

Arguments

data

A SparkDataFrame for training.

...

additional argument(s) passed to the method.

object

a fitted FPGrowth model.

minSupport

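Minimal support level. An itemset is treated as frequent only if it appears in at least this fraction of the rows in data.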

minConfidence

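Minimal confidence level. Association rules with a lower confidence are not generated; this does not affect the mining of frequent itemsets.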

itemsCol

Name of the column that contains the items. Each row holds an array of items.

numPartitions

Number of partitions used for fitting.

newData

a SparkDataFrame for testing.

path

the directory where the model is saved.

overwrite

logical value indicating whether to overwrite if the output path already exists. Default is FALSE, which means an exception is thrown if the output path exists.
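
To make the minSupport and minConfidence thresholds concrete, the following base R sketch computes support and confidence by hand on a small made-up set of transactions. It does not call SparkR. The helper names support and confidence and the toy data are assumptions for illustration only, and the sketch uses the usual definitions: support is the fraction of transactions that contain an itemset, and the confidence of X => Y is support(X and Y together) divided by support(X).

```r
# Toy transactions (hypothetical data, not taken from the SparkR examples).
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b", "d")
)

# Fraction of transactions that contain every item in `itemset`.
support <- function(itemset, transactions) {
  hits <- vapply(transactions, function(t) all(itemset %in% t), logical(1))
  mean(hits)
}

# Confidence of the rule antecedent => consequent.
confidence <- function(antecedent, consequent, transactions) {
  support(union(antecedent, consequent), transactions) /
    support(antecedent, transactions)
}

supp_ab  <- support(c("a", "b"), transactions)     # 3 of 5 transactions
conf_a_b <- confidence("a", "b", transactions)     # (3/5) / (4/5)

# Compare against the default thresholds of spark.fpGrowth.
supp_ab >= 0.3    # itemset {a, b} passes minSupport = 0.3
conf_a_b >= 0.8   # rule a => b does not pass minConfidence = 0.8
```

In this toy data the itemset {a, b} would count as frequent under the default minSupport, while the rule a => b would only be kept if minConfidence were lowered to 0.75 or below.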

Value

spark.fpGrowth returns a fitted FPGrowth model.

spark.freqItemsets returns a SparkDataFrame with frequent itemsets. The SparkDataFrame contains two columns: items (an array of the same type as the input column) and freq (frequency of the itemset).

spark.associationRules returns a SparkDataFrame with association rules. The SparkDataFrame contains four columns: antecedent (an array of the same type as the input column), consequent (an array of the same type as the input column), confidence (confidence for the rule) and lift (lift for the rule).
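
For reference, the confidence and lift columns can be read with the standard association-rule definitions, stated here as an assumption about how the values are to be interpreted. The symbols D (the set of training transactions), X (the antecedent) and Y (the consequent) are introduced only for this sketch.

```latex
\mathrm{supp}(X) = \frac{\lvert \{\, t \in D : X \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)},
\qquad
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}
```

Under these definitions, a lift above 1 means the consequent appears more often alongside the antecedent than its overall support alone would suggest.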

predict returns a SparkDataFrame containing predicted values.
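
The following base R sketch illustrates one way to think about what predict produces. It assumes that, for each row, the prediction is made up of the consequents of every rule whose antecedent is fully contained in the row's items, leaving out items the row already has. The function predict_items and the rules below are hypothetical and do not come from a fitted model; the sketch does not call SparkR.

```r
# Hypothetical rules as antecedent/consequent pairs (not output of a real model).
rules <- list(
  list(antecedent = c("t"),      consequent = c("s")),
  list(antecedent = c("t", "s"), consequent = c("y")),
  list(antecedent = c("x"),      consequent = c("z"))
)

# Assumed behavior: gather the consequents of all applicable rules,
# then drop duplicates and items that are already in the input.
predict_items <- function(items, rules) {
  applicable <- Filter(function(r) all(r$antecedent %in% items), rules)
  consequents <- as.character(unlist(lapply(applicable, function(r) r$consequent)))
  setdiff(unique(consequents), items)
}

pred_t  <- predict_items(c("t"), rules)       # only the rule t => s applies
pred_ts <- predict_items(c("t", "s"), rules)  # two rules apply; "s" is already present
```

A row that matches no antecedent gets an empty prediction in this sketch.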

Note

spark.fpGrowth since 2.2.0

spark.freqItemsets(FPGrowthModel) since 2.2.0

spark.associationRules(FPGrowthModel) since 2.2.0

predict(FPGrowthModel) since 2.2.0

write.ml(FPGrowthModel, character) since 2.2.0

See Also

read.ml

Examples

## Not run: 
##D raw_data <- read.df(
##D   "data/mllib/sample_fpgrowth.txt",
##D   source = "csv",
##D   schema = structType(structField("raw_items", "string")))
##D 
##D data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
##D model <- spark.fpGrowth(data)
##D 
##D # Show frequent itemsets
##D frequent_itemsets <- spark.freqItemsets(model)
##D showDF(frequent_itemsets)
##D 
##D # Show association rules
##D association_rules <- spark.associationRules(model)
##D showDF(association_rules)
##D 
##D # Predict on new data
##D new_itemsets <- data.frame(items = c("t", "t,s"))
##D new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
##D predict(model, new_data)
##D 
##D # Save and load model
##D path <- "/path/to/model"
##D write.ml(model, path)
##D read.ml(path)
##D 
##D # Optional arguments
##D baskets_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as baskets")
##D another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
##D                                 itemsCol = "baskets", numPartitions = 10)
## End(Not run)

[Package SparkR version 2.4.5 Index]