RidgeRegressionModel

class pyspark.mllib.regression.RidgeRegressionModel(weights: pyspark.mllib.linalg.Vector, intercept: float)[source]

A linear regression model derived from a least-squares fit with an l_2 penalty term.

New in version 0.9.0.

Examples

>>> from pyspark.mllib.linalg import SparseVector
>>> from pyspark.mllib.regression import LabeledPoint
>>> data = [
...     LabeledPoint(0.0, [0.0]),
...     LabeledPoint(1.0, [1.0]),
...     LabeledPoint(3.0, [2.0]),
...     LabeledPoint(2.0, [3.0])
... ]
>>> lrm = RidgeRegressionWithSGD.train(sc.parallelize(data), iterations=10,
...     initialWeights=np.array([1.0]))
>>> abs(lrm.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(lrm.predict(np.array([1.0])) - 1) < 0.5
True
>>> abs(lrm.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True
>>> abs(lrm.predict(sc.parallelize([[1.0]])).collect()[0] - 1) < 0.5
True
>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> lrm.save(sc, path)
>>> sameModel = RidgeRegressionModel.load(sc, path)
>>> abs(sameModel.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(sameModel.predict(np.array([1.0])) - 1) < 0.5
True
>>> abs(sameModel.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True
>>> from shutil import rmtree
>>> try:
...    rmtree(path)
... except BaseException:
...    pass
>>> data = [
...     LabeledPoint(0.0, SparseVector(1, {0: 0.0})),
...     LabeledPoint(1.0, SparseVector(1, {0: 1.0})),
...     LabeledPoint(3.0, SparseVector(1, {0: 2.0})),
...     LabeledPoint(2.0, SparseVector(1, {0: 3.0}))
... ]
>>> lrm = LinearRegressionWithSGD.train(sc.parallelize(data), iterations=10,
...     initialWeights=np.array([1.0]))
>>> abs(lrm.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(lrm.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True
>>> lrm = RidgeRegressionWithSGD.train(sc.parallelize(data), iterations=10, step=1.0,
...     regParam=0.01, miniBatchFraction=1.0, initialWeights=np.array([1.0]), intercept=True,
...     validateData=True)
>>> abs(lrm.predict(np.array([0.0])) - 0) < 0.5
True
>>> abs(lrm.predict(SparseVector(1, {0: 1.0})) - 1) < 0.5
True

Methods

load(sc, path)

Load a RidgeRegressionMode.

predict(x)

Predict the value of the dependent variable given a vector or an RDD of vectors containing values for the independent variables.

save(sc, path)

Save a RidgeRegressionMode.

Attributes

intercept

Intercept computed for this model.

weights

Weights computed for every feature.

Methods Documentation

classmethod load(sc: pyspark.context.SparkContext, path: str)pyspark.mllib.regression.RidgeRegressionModel[source]

Load a RidgeRegressionMode.

New in version 1.4.0.

predict(x: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[float, pyspark.rdd.RDD[float]]

Predict the value of the dependent variable given a vector or an RDD of vectors containing values for the independent variables.

New in version 0.9.0.

save(sc: pyspark.context.SparkContext, path: str) → None[source]

Save a RidgeRegressionMode.

New in version 1.4.0.

Attributes Documentation

intercept

Intercept computed for this model.

New in version 1.0.0.

weights

Weights computed for every feature.

New in version 1.0.0.