spark.glm {SparkR} | R Documentation
Fits a generalized linear model against a Spark DataFrame. Users can call summary to print a summary of the fitted model, predict to make predictions on new data, and write.ml/read.ml to save/load fitted models.
spark.glm(data, formula, ...)

## S4 method for signature 'SparkDataFrame,formula'
spark.glm(data, formula, family = gaussian, tol = 1e-06, maxIter = 25)

## S4 method for signature 'GeneralizedLinearRegressionModel'
summary(object, ...)

## S3 method for class 'summary.GeneralizedLinearRegressionModel'
print(x, ...)

## S4 method for signature 'GeneralizedLinearRegressionModel'
predict(object, newData)

## S4 method for signature 'GeneralizedLinearRegressionModel,character'
write.ml(object, path, overwrite = FALSE)
data | SparkDataFrame for training.
formula | A symbolic description of the model to be fitted. Currently only a few formula operators are supported, including '~', '.', ':', '+', and '-'.
family | A description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. Refer to R family at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html.
tol | Positive convergence tolerance of iterations.
maxIter | Integer giving the maximal number of IRLS iterations.
object | A fitted generalized linear model.
x | Summary object of a fitted generalized linear model, as returned by summary.
newData | SparkDataFrame for testing.
path | The directory where the model is saved.
overwrite | Whether to overwrite the output path if it already exists. Default is FALSE, which means an exception is thrown if the output path exists.
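The three accepted forms of the family argument (a character string, a family function, or the result of a call to a family function) can be illustrated with base R alone. The sketch below assumes spark.glm normalizes family the same way stats::glm does, which this page does not spell out; resolve_family is a hypothetical helper written for illustration and is not part of SparkR.

```r
# Base R only (stats package); no Spark session is needed for this sketch.
# resolve_family is a hypothetical helper, not a SparkR function. It mirrors
# the normalization that stats::glm applies to its family argument.
resolve_family <- function(family) {
  if (is.character(family)) {
    # A string such as "gaussian" is looked up as a function name.
    family <- get(family, mode = "function", envir = parent.frame())
  }
  if (is.function(family)) {
    # A family function is called with its default link.
    family <- family()
  }
  family
}

from_string   <- resolve_family("gaussian")
from_function <- resolve_family(gaussian)
from_call     <- resolve_family(gaussian())

# All three forms yield the same family name and link.
c(from_string$family, from_function$family, from_call$family)
c(from_string$link, from_function$link, from_call$link)

# A non-default link has to be given through the call form.
resolve_family(binomial(link = "probit"))$link
```

Under that assumption, the string and function forms are convenient shorthands for the default link, while the call form is the one to use when a specific link function is wanted.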
spark.glm returns a fitted generalized linear model.
summary returns a summary object of the fitted model, a list of components including at least the coefficients, null/residual deviance, null/residual degrees of freedom, AIC and number of iterations IRLS takes.
predict returns a SparkDataFrame containing predicted labels in a column named "prediction".
spark.glm since 2.0.0
summary(GeneralizedLinearRegressionModel) since 2.0.0
print.summary.GeneralizedLinearRegressionModel since 2.0.0
predict(GeneralizedLinearRegressionModel) since 1.5.0
write.ml(GeneralizedLinearRegressionModel, character) since 2.0.0
## Not run:
##D sparkR.session()
##D data(iris)
##D df <- createDataFrame(iris)
##D model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")
##D summary(model)
##D
##D # fitted values on training data
##D fitted <- predict(model, df)
##D head(select(fitted, "Sepal_Length", "prediction"))
##D
##D # save fitted model to input path
##D path <- "path/to/model"
##D write.ml(model, path)
##D
##D # can also read back the saved model and print
##D savedModel <- read.ml(path)
##D summary(savedModel)
## End(Not run)
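The example above leaves tol, maxIter and overwrite at their defaults. The following is a minimal sketch of setting them explicitly, using only the arguments listed in the usage section. It assumes a working local Spark installation, and the model path under tempdir() as well as the particular tol and maxIter values are arbitrary choices for illustration.

```r
library(SparkR)

# Assumes Spark is installed and reachable from this R session.
sparkR.session()
df <- createDataFrame(iris)

# Tighter tolerance and a higher iteration cap than the defaults
# (tol = 1e-06, maxIter = 25); the values here are illustrative only.
model <- spark.glm(df, Sepal_Length ~ Sepal_Width + Petal_Length,
                   family = gaussian, tol = 1e-08, maxIter = 50)
summary(model)

# Hypothetical location for the saved model.
path <- file.path(tempdir(), "glm-model")
write.ml(model, path)

# Writing to the same path again requires overwrite = TRUE; with the
# default overwrite = FALSE an exception is thrown if the path exists.
write.ml(model, path, overwrite = TRUE)

# Read the model back and inspect it.
savedModel <- read.ml(path)
summary(savedModel)
```

Passing overwrite = TRUE is mainly useful in scripts that are rerun against the same output directory; leaving it at FALSE guards against replacing an existing model by accident.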