Series.
factorize
Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values.
Value to mark “not found”. If None, will not drop the NaN from the uniques of the values.
A Series or Index that’s an indexer into uniques. uniques.take(codes) will have the same values as values.
uniques.take(codes)
The unique valid values.
Note
Even if there’s a missing value in values, uniques will not contain an entry for it.
Examples
>>> psser = ps.Series(['b', None, 'a', 'c', 'b']) >>> codes, uniques = psser.factorize() >>> codes 0 1 1 -1 2 0 3 2 4 1 dtype: int32 >>> uniques Index(['a', 'b', 'c'], dtype='object')
>>> codes, uniques = psser.factorize(na_sentinel=None) >>> codes 0 1 1 3 2 0 3 2 4 1 dtype: int32 >>> uniques Index(['a', 'b', 'c', None], dtype='object')
>>> codes, uniques = psser.factorize(na_sentinel=-2) >>> codes 0 1 1 -2 2 0 3 2 4 1 dtype: int32 >>> uniques Index(['a', 'b', 'c'], dtype='object')
For Index:
>>> psidx = ps.Index(['b', None, 'a', 'c', 'b']) >>> codes, uniques = psidx.factorize() >>> codes Int64Index([1, -1, 0, 2, 1], dtype='int64') >>> uniques Index(['a', 'b', 'c'], dtype='object')