public class OneHotEncoder extends Estimator<OneHotEncoderModel> implements OneHotEncoderBase, DefaultParamsWritable
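A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index. For example, with 5 categories, an input value of 2.0 maps to an output vector of [0.0, 0.0, 1.0, 0.0]. The last category is not included by default (configurable via dropLast), because including it makes the vector entries sum to one, and hence linearly dependent. So an input value of 4.0 maps to [0.0, 0.0, 0.0, 0.0].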
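The effect of dropLast can be illustrated with a minimal plain-Java sketch. It is not Spark's implementation: the class name OneHotSketch and the method encode are hypothetical, and it returns a dense double[] purely for readability.

```java
import java.util.Arrays;

/** Illustrative sketch only; OneHotSketch and encode are hypothetical names, not Spark API. */
final class OneHotSketch {

    /**
     * Encodes a category index as a dense one-hot vector.
     * When dropLast is true the vector has numCategories - 1 entries,
     * so the last category is represented by an all-zeros vector.
     */
    static double[] encode(int index, int numCategories, boolean dropLast) {
        if (index < 0 || index >= numCategories) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        int size = dropLast ? numCategories - 1 : numCategories;
        double[] vec = new double[size];
        if (index < size) {
            vec[index] = 1.0; // the dropped last category sets no entry
        }
        return vec;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(2, 5, true)));  // [0.0, 0.0, 1.0, 0.0]
        System.out.println(Arrays.toString(encode(4, 5, true)));  // [0.0, 0.0, 0.0, 0.0]
        System.out.println(Arrays.toString(encode(4, 5, false))); // [0.0, 0.0, 0.0, 0.0, 1.0]
    }
}
```

With dropLast set to false, the same input of 4.0 keeps its own slot, at the cost of entries that always sum to one.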
See also: StringIndexer, for converting categorical values into category indices; Serialized Form
When handleInvalid is configured to 'keep', an extra "category" indicating invalid values is added as the last category. So when dropLast is true, invalid values are encoded as an all-zeros vector.
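The interaction between handleInvalid = 'keep' and dropLast can be sketched in plain Java as follows. This is an assumption-laden illustration, not Spark's code: KeepInvalidSketch and encode are hypothetical names, and the sketch treats any index outside [0, numCategories) as invalid.

```java
import java.util.Arrays;

/** Illustrative sketch only; KeepInvalidSketch is a hypothetical name, not Spark API. */
final class KeepInvalidSketch {

    /**
     * One-hot encodes an index, optionally keeping invalid values.
     * Assumption: any index outside [0, numCategories) counts as invalid.
     */
    static double[] encode(int index, int numCategories, boolean keepInvalid, boolean dropLast) {
        boolean valid = index >= 0 && index < numCategories;
        if (!valid && !keepInvalid) {
            throw new IllegalArgumentException("invalid category index: " + index);
        }
        // 'keep' appends one extra category, placed last, for invalid values.
        int total = keepInvalid ? numCategories + 1 : numCategories;
        int slot = valid ? index : numCategories;
        // dropLast removes whichever category is last; with 'keep' that is the invalid one.
        int size = dropLast ? total - 1 : total;
        double[] vec = new double[size];
        if (slot < size) {
            vec[slot] = 1.0;
        }
        return vec;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(4, 5, true, true)));  // [0.0, 0.0, 0.0, 0.0, 1.0]
        System.out.println(Arrays.toString(encode(7, 5, true, true)));  // [0.0, 0.0, 0.0, 0.0, 0.0]
        System.out.println(Arrays.toString(encode(7, 5, true, false))); // [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    }
}
```

In this sketch, with 'keep' and dropLast both enabled, every valid category keeps its own slot and only the invalid category collapses to all zeros.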
Note: When encoding multiple columns by using the inputCols and outputCols params, input/output cols come in pairs, specified by the order in the arrays, and each pair is treated independently.
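The positional pairing of inputCols and outputCols can be pictured with a short plain-Java sketch. The names ColumnPairingSketch and pairColumns are hypothetical, and the equal-length check is an assumption about a sensible precondition rather than a statement of what Spark verifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; ColumnPairingSketch is a hypothetical name, not Spark API. */
final class ColumnPairingSketch {

    /** Pairs the i-th input column with the i-th output column, preserving array order. */
    static Map<String, String> pairColumns(String[] inputCols, String[] outputCols) {
        // Assumed precondition: both arrays must have the same length.
        if (inputCols.length != outputCols.length) {
            throw new IllegalArgumentException(
                "inputCols and outputCols must have the same length");
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < inputCols.length; i++) {
            pairs.put(inputCols[i], outputCols[i]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = pairColumns(
            new String[] {"colorIndex", "sizeIndex"},
            new String[] {"colorVec", "sizeVec"});
        // Each pair would be encoded independently of the others.
        pairs.forEach((in, out) -> System.out.println(in + " -> " + out));
    }
}
```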
Constructor and Description |
---|
OneHotEncoder() |
OneHotEncoder(String uid) |
Modifier and Type | Method and Description |
---|---|
OneHotEncoder | copy(ParamMap extra) Creates a copy of this instance with the same UID and some extra params. |
BooleanParam | dropLast() Whether to drop the last category in the encoded vector (default: true) |
OneHotEncoderModel | fit(Dataset<?> dataset) Fits a model to the input data. |
Param<String> | handleInvalid() Param for how to handle invalid data during transform(). |
Param<String> | inputCol() Param for input column name. |
StringArrayParam | inputCols() Param for input column names. |
static OneHotEncoder | load(String path) |
Param<String> | outputCol() Param for output column name. |
StringArrayParam | outputCols() Param for output column names. |
static MLReader<T> | read() |
OneHotEncoder | setDropLast(boolean value) |
OneHotEncoder | setHandleInvalid(String value) |
OneHotEncoder | setInputCol(String value) |
OneHotEncoder | setInputCols(String[] values) |
OneHotEncoder | setOutputCol(String value) |
OneHotEncoder | setOutputCols(String[] values) |
StructType | transformSchema(StructType schema) Check transform validity and derive the output schema from the input schema. |
String | uid() An immutable unique ID for the object and its derivatives. |
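Methods inherited from class PipelineStage: params
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface OneHotEncoderBase: getDropLast, getInOutCols, validateAndTransformSchema
Methods inherited from interface HasHandleInvalid: getHandleInvalid
Methods inherited from interface HasInputCol: getInputCol
Methods inherited from interface HasInputCols: getInputCols
Methods inherited from interface HasOutputCol: getOutputCol
Methods inherited from interface HasOutputCols: getOutputCols
Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface Identifiable: toString
Methods inherited from interface DefaultParamsWritable: write
Methods inherited from interface MLWritable: save
Methods inherited from interface Logging: $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize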
public OneHotEncoder(String uid)
public OneHotEncoder()
public static OneHotEncoder load(String path)
public static MLReader<T> read()
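public Param<String> handleInvalid()
Description copied from interface OneHotEncoderBase: Param for how to handle invalid data during transform().
Specified by: handleInvalid in interface OneHotEncoderBase
Specified by: handleInvalid in interface HasHandleInvalid
public final BooleanParam dropLast()
Description copied from interface OneHotEncoderBase: Whether to drop the last category in the encoded vector (default: true)
Specified by: dropLast in interface OneHotEncoderBase
public final StringArrayParam outputCols()
Description copied from interface HasOutputCols: Param for output column names.
Specified by: outputCols in interface HasOutputCols
public final Param<String> outputCol()
Description copied from interface HasOutputCol: Param for output column name.
Specified by: outputCol in interface HasOutputCol
public final StringArrayParam inputCols()
Description copied from interface HasInputCols: Param for input column names.
Specified by: inputCols in interface HasInputCols
public final Param<String> inputCol()
Description copied from interface HasInputCol: Param for input column name.
Specified by: inputCol in interface HasInputCol
public String uid()
Description copied from interface Identifiable: An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable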
public OneHotEncoder setInputCol(String value)
public OneHotEncoder setOutputCol(String value)
public OneHotEncoder setInputCols(String[] values)
public OneHotEncoder setOutputCols(String[] values)
public OneHotEncoder setDropLast(boolean value)
public OneHotEncoder setHandleInvalid(String value)
public StructType transformSchema(StructType schema)
Description copied from class PipelineStage: Check transform validity and derive the output schema from the input schema.
We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks which do not depend on other parameters are handled by Param.validate().
A typical implementation should first conduct verification on schema change and parameter validity, including complex parameter interaction checks.
Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)
public OneHotEncoderModel fit(Dataset<?> dataset)
Description copied from class Estimator: Fits a model to the input data.
Specified by: fit in class Estimator<OneHotEncoderModel>
Parameters: dataset - (undocumented)
public OneHotEncoder copy(ParamMap extra)
Description copied from interface Params: Creates a copy of this instance with the same UID and some extra params. See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Estimator<OneHotEncoderModel>
Parameters: extra - (undocumented)