pyspark.ml.Pipeline
A simple pipeline, which acts as an estimator. A Pipeline consists of a sequence of stages, each of which is either an Estimator or a Transformer. When Pipeline.fit() is called, the stages are executed in order. If a stage is an Estimator, its Estimator.fit() method is called on the current dataset (the original input for the first stage) to fit a model. That model, which is a Transformer, is then used to transform the dataset, and the result becomes the input to the next stage. If a stage is a Transformer, its Transformer.transform() method is called to produce the dataset for the next stage. The fitted model from a Pipeline is a PipelineModel, which consists of the fitted models and transformers corresponding to the pipeline stages. If stages is an empty list, the pipeline acts as an identity transformer.
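To make the stage-by-stage behavior concrete, here is a minimal pure-Python sketch of the fit logic described above. It is not the pyspark.ml implementation: ToyTransformer, ToyEstimator, ToyPipeline and ToyPipelineModel are hypothetical stand-ins, and a "dataset" is assumed to be a plain list of numbers rather than a pyspark.sql.DataFrame.

```python
# Pure-Python sketch of the documented Pipeline.fit() behavior.
# All classes here are hypothetical stand-ins, not pyspark.ml classes;
# a "dataset" is just a list of numbers.


class ToyTransformer:
    """Maps a dataset to a new dataset, element by element."""

    def __init__(self, fn):
        self.fn = fn

    def transform(self, dataset):
        return [self.fn(x) for x in dataset]


class ToyEstimator:
    """Learns from a dataset and returns a fitted model (a transformer)."""

    def fit(self, dataset):
        # "Learn" the mean; the model subtracts it from every value.
        mean = sum(dataset) / len(dataset)
        return ToyTransformer(lambda x: x - mean)


class ToyPipelineModel:
    """Holds fitted models and transformers, one per pipeline stage."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, dataset):
        for stage in self.stages:
            dataset = stage.transform(dataset)
        return dataset


class ToyPipeline:
    def __init__(self, stages):
        self.stages = stages

    def fit(self, dataset):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, ToyEstimator):
                # Estimator stage: fit on the current dataset; the model replaces it.
                stage = stage.fit(dataset)
            # The stage is now a transformer whose output feeds the next stage.
            dataset = stage.transform(dataset)
            fitted.append(stage)
        return ToyPipelineModel(fitted)


double = ToyTransformer(lambda x: 2 * x)
model = ToyPipeline([double, ToyEstimator()]).fit([1, 2, 3])
print(model.transform([1, 2, 3]))  # [-2.0, 0.0, 2.0]
print(ToyPipeline([]).fit([5, 6]).transform([7]))  # [7]: identity
```

For simplicity the sketch transforms the dataset after every stage, including the last one. With an empty stage list the fitted model returns its input unchanged, which mirrors the identity behavior described above.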
New in version 1.3.0.
Methods
clear(param)
Clears a param from the param map if it has been explicitly set.
copy([extra])
Creates a copy of this instance.
explainParam(param)
Explains a single param and returns its name, doc, and optional default value and user-supplied value in a string.
explainParams()
Returns the documentation of all params with their optional default values and user-supplied values.
extractParamMap([extra])
Extracts the embedded default param values and user-supplied values, and then merges them with extra values from input into a flat param map, where the latter value is used if there are conflicts, i.e., with ordering: default param values < user-supplied values < extra.
fit(dataset[, params])
Fits a model to the input dataset with optional parameters.
fitMultiple(dataset, paramMaps)
Fits a model to the input dataset for each param map in paramMaps.
getOrDefault(param)
Gets the value of a param in the user-supplied param map or its default value.
getParam(paramName)
Gets a param by its name.
getStages()
Gets the pipeline stages.
hasDefault(param)
Checks whether a param has a default value.
hasParam(paramName)
Tests whether this instance contains a param with a given (string) name.
isDefined(param)
Checks whether a param is explicitly set by user or has a default value.
isSet(param)
Checks whether a param is explicitly set by user.
load(path)
Reads an ML instance from the input path, a shortcut of read().load(path).
read()
Returns an MLReader instance for this class.
save(path)
Saves this ML instance to the given path, a shortcut of write().save(path).
set(param, value)
Sets a parameter in the embedded param map.
setParams(self, *[, stages])
Sets params for Pipeline.
setStages(value)
Sets the pipeline stages.
write()
Returns an MLWriter instance for this ML instance.
Attributes
params
Returns all params ordered by name.
stages
Methods Documentation
copy([extra])
New in version 1.4.0.
Parameters:
extra (optional): extra parameters.
Returns: new instance.
extractParamMap([extra])
Parameters:
extra (optional): extra param values.
Returns: merged param map.
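The precedence rule (default param values < user-supplied values < extra) can be sketched as successive dict updates in which later sources overwrite earlier ones. This is an illustration only: merge_param_maps is a hypothetical helper and the param names are plain strings, whereas a real param map is keyed by Param objects.

```python
# Sketch of the documented merge order: default < user-supplied < extra.
# merge_param_maps and the string keys are hypothetical; real param maps
# are keyed by Param objects.


def merge_param_maps(defaults, user_supplied, extra=None):
    """Return a new flat param map; later sources win on conflicts."""
    merged = dict(defaults)        # lowest precedence
    merged.update(user_supplied)   # overrides defaults
    merged.update(extra or {})     # overrides both
    return merged


defaults = {"maxIter": 100, "regParam": 0.0, "tol": 1e-6}
user_supplied = {"maxIter": 10}
extra = {"maxIter": 5, "regParam": 0.1}

print(merge_param_maps(defaults, user_supplied, extra))
# {'maxIter': 5, 'regParam': 0.1, 'tol': 1e-06}
print(merge_param_maps(defaults, user_supplied))
# {'maxIter': 10, 'regParam': 0.0, 'tol': 1e-06}
```

Because the helper copies defaults before updating, none of the input maps is modified.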
fit(dataset[, params])
Parameters:
dataset (pyspark.sql.DataFrame): input dataset.
params (optional): a param map that overrides embedded params. If a list/tuple of param maps is given, this calls fit on each param map and returns a list of models.
Returns: fitted model(s).
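The way fit treats its params argument can be modeled as a type dispatch: a single param map yields one model, and a list or tuple yields one model per param map, in order. In this sketch a non-empty map is applied to a copy of the estimator, so the original's embedded params stay unchanged. ToyScaler, its alpha param and its numeric "model" are hypothetical; this is not the pyspark.ml.Estimator source.

```python
# Sketch of how fit(dataset, params) can dispatch on the type of params.
# ToyScaler, its "alpha" param, and its numeric "model" are hypothetical.


class ToyScaler:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # the embedded param value

    def copy(self, extra):
        # New instance whose embedded params are overridden by `extra`.
        return ToyScaler(**{"alpha": self.alpha, **extra})

    def _fit(self, dataset):
        # Toy "model": the data sum scaled by alpha.
        return self.alpha * sum(dataset)

    def fit(self, dataset, params=None):
        if params is None:
            params = {}
        if isinstance(params, (list, tuple)):
            # One model per param map, returned in the same order.
            return [self.fit(dataset, p) for p in params]
        if isinstance(params, dict):
            # A non-empty map is applied to a copy, leaving self unchanged.
            return self.copy(params)._fit(dataset) if params else self._fit(dataset)
        raise TypeError(f"params must be a dict, list, or tuple, got {type(params)}")


est = ToyScaler(alpha=2.0)
data = [1, 2, 3]
print(est.fit(data))                                    # 12.0
print(est.fit(data, {"alpha": 3.0}))                    # 18.0
print(est.fit(data, [{"alpha": 0.5}, {"alpha": 1.0}]))  # [3.0, 6.0]
print(est.alpha)                                        # 2.0
```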
fitMultiple(dataset, paramMaps)
New in version 2.3.0.
Parameters:
dataset (pyspark.sql.DataFrame): input dataset.
paramMaps (collections.abc.Sequence): a sequence of param maps.
Returns: _FitMultipleIterator: a thread-safe iterable which contains one model for each param map. Each call to next(modelIterator) returns (index, model), where model was fit using paramMaps[index]. The index values may not be sequential.
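A minimal sketch of a thread-safe iterator with this contract, assuming a hypothetical fit_single(index) callable that fits one model for paramMaps[index]. A lock guards only the shared counter, so several threads can fit models concurrently; when the iterator is shared, any one thread may receive non-consecutive indices. ToyFitMultipleIterator is illustrative and not the _FitMultipleIterator source.

```python
# Sketch of a thread-safe (index, model) iterator in the spirit of fitMultiple.
# ToyFitMultipleIterator and fit_single are hypothetical.
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyFitMultipleIterator:
    def __init__(self, fit_single, num_models):
        self.fit_single = fit_single  # callable: index -> fitted model
        self.num_models = num_models
        self.counter = 0
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only claiming an index is locked; fitting happens outside the lock,
        # so several threads can fit different param maps at the same time.
        with self.lock:
            index = self.counter
            if index >= self.num_models:
                raise StopIteration
            self.counter += 1
        return index, self.fit_single(index)


param_maps = [{"alpha": a} for a in (0.5, 1.0, 2.0, 4.0)]
data = [1, 2, 3]


def fit_single(index):
    # Toy "model": the data sum scaled by this param map's alpha.
    return param_maps[index]["alpha"] * sum(data)


it = ToyFitMultipleIterator(fit_single, len(param_maps))
models = [None] * len(param_maps)


def drain():
    # Each worker pulls (index, model) pairs until the iterator is exhausted;
    # the indices a given worker sees need not be consecutive.
    for index, model in it:
        models[index] = model


with ThreadPoolExecutor(max_workers=2) as pool:
    for _ in range(2):
        pool.submit(drain)

print(models)  # [3.0, 6.0, 12.0, 24.0]
```

Storing each model at models[index] keeps the results aligned with param_maps regardless of which thread finishes first.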
getOrDefault(param)
Gets the value of a param in the user-supplied param map or its default value. Raises an error if neither is set.
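The relationship between getOrDefault and the related checks (isSet, hasDefault, isDefined, clear) can be modeled with two dictionaries, one for user-supplied values and one for defaults. ToyParams is a hypothetical illustration with string keys. Raising KeyError when a param has neither value is this sketch's choice for "an error".

```python
# Sketch of user-supplied vs. default param values, with string keys.
# ToyParams is hypothetical; it mirrors only the documented semantics.


class ToyParams:
    def __init__(self, defaults):
        self.defaults = dict(defaults)  # default values
        self.user_values = {}           # explicitly set values

    def set(self, name, value):
        self.user_values[name] = value

    def clear(self, name):
        # Drops only the explicitly set value; any default remains.
        self.user_values.pop(name, None)

    def isSet(self, name):
        return name in self.user_values

    def hasDefault(self, name):
        return name in self.defaults

    def isDefined(self, name):
        return self.isSet(name) or self.hasDefault(name)

    def getOrDefault(self, name):
        # User-supplied value first, then the default; error if neither exists.
        if name in self.user_values:
            return self.user_values[name]
        if name in self.defaults:
            return self.defaults[name]
        raise KeyError(f"param {name!r} has neither a user-supplied nor a default value")


p = ToyParams({"maxIter": 100})
p.set("maxIter", 10)
print(p.getOrDefault("maxIter"))                   # 10
p.clear("maxIter")
print(p.getOrDefault("maxIter"))                   # 100
print(p.isSet("maxIter"), p.isDefined("maxIter"))  # False True
print(p.isDefined("tol"))                          # False
```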
read() and write()
New in version 2.0.0.
setStages(value)
Parameters:
value (list of pyspark.ml.Transformer or pyspark.ml.Estimator)
Returns: the pipeline instance.
Attributes Documentation
params
Returns all params ordered by name. The default implementation uses dir() to get all attributes of type Param.
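A minimal sketch of dir()-based param discovery, assuming hypothetical ToyParam and ToyParamsMixin classes in place of pyspark.ml.param.Param and pyspark.ml.param.Params. Properties are skipped during the scan so that reading params does not recursively evaluate params itself, and attributes that are not params are filtered out.

```python
# Sketch of dir()-based param discovery, ordered by name.
# ToyParam and ToyParamsMixin are hypothetical stand-ins.


class ToyParam:
    def __init__(self, name, doc):
        self.name = name
        self.doc = doc


class ToyParamsMixin:
    @property
    def params(self):
        # Skip properties (including `params` itself) to avoid recursion,
        # then keep only attributes whose value is a ToyParam.
        found = [
            getattr(self, attr)
            for attr in dir(self)
            if not isinstance(getattr(type(self), attr, None), property)
        ]
        return sorted((v for v in found if isinstance(v, ToyParam)), key=lambda p: p.name)


class ToyPipelineParams(ToyParamsMixin):
    stages = ToyParam("stages", "hypothetical doc: the pipeline stages")
    alpha = ToyParam("alpha", "hypothetical extra param")
    label = "not a param, so it is ignored"


print([p.name for p in ToyPipelineParams().params])  # ['alpha', 'stages']
```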