pyspark.pandas.CategoricalIndex

class pyspark.pandas.CategoricalIndex(data=None, categories=None, ordered=None, dtype=None, copy=False, name=None)

Index based on an underlying Categorical.

CategoricalIndex can only take on a limited, and usually fixed, number of possible values (categories). It may also have an order, but numerical operations (additions, divisions, …) are not possible.
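The short sketch below shows both the restriction to a set of categories and the absence of arithmetic. It uses plain pandas `pd.CategoricalIndex`, which this class is modeled on, so no Spark session is needed. It assumes, without verifying it here, that pandas-on-Spark behaves the same way.

```python
import pandas as pd

# Plain pandas stands in for pandas-on-Spark so that no Spark session is needed.
ci = pd.CategoricalIndex(["a", "b", "c", "a"])

# The only possible values are the categories, inferred here from the data.
print(ci.categories.tolist())  # ['a', 'b', 'c']

# Arithmetic is not defined for categorical data and raises TypeError.
try:
    _ = ci + 1
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False
print("arithmetic allowed:", arithmetic_allowed)  # arithmetic allowed: False
```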

Parameters

data (array-like, 1-dimensional): The values of the categorical. If categories are given, values not in categories are replaced with NaN.
categories (index-like, optional): The categories for the categorical. Items must be unique. If categories are not given here or in dtype, they are inferred from the data.
ordered (bool, optional): Whether this categorical is treated as an ordered categorical. If not given here or in dtype, the resulting categorical is unordered.
dtype (CategoricalDtype or “category”, optional): If a CategoricalDtype is passed, categories and ordered must not also be given.
copy (bool, default False): Make a copy of the input ndarray.
name (object, optional): Name to be stored in the index.
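The sketch below illustrates the parameter rules above using plain pandas, which follows the same rules. It assumes, without checking here, that pandas-on-Spark does too. The category names `"low"` and `"high"` and the index name `"level"` are illustrative only.

```python
import pandas as pd

# Values that are not among the given categories become NaN (stored as code -1).
ci = pd.CategoricalIndex(["a", "b", "d"], categories=["a", "b", "c"])
print(ci.codes.tolist())   # [0, 1, -1]
print(ci.isna().tolist())  # [False, False, True]

# A CategoricalDtype carries both the categories and the ordering.
dtype = pd.CategoricalDtype(categories=["low", "high"], ordered=True)
ci_ordered = pd.CategoricalIndex(["high", "low"], dtype=dtype, name="level")
print(ci_ordered.ordered, ci_ordered.name)  # True level

# Passing categories= (or ordered=) together with a CategoricalDtype is rejected.
try:
    pd.CategoricalIndex(["low"], dtype=dtype, categories=["low"])
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
print("conflict rejected:", conflict_rejected)  # conflict rejected: True
```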

See also

Index: The base pandas-on-Spark Index type.

Examples

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> ps.CategoricalIndex(["a", "b", "c", "a", "b", "c"])
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')

CategoricalIndex can also be instantiated from a Categorical:

>>> c = pd.Categorical(["a", "b", "c", "a", "b", "c"])
>>> ps.CategoricalIndex(c)
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')

An ordered CategoricalIndex can have a min and max value, determined by the order of its categories rather than by lexical order.

>>> ci = ps.CategoricalIndex(
...     ["a", "b", "c", "a", "b", "c"], ordered=True, categories=["c", "b", "a"]
... )
>>> ci
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['c', 'b', 'a'], ordered=True, dtype='category')
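The example above builds an ordered index but never calls min or max. The sketch below uses the same data in plain pandas to show what the ordering means. It assumes pandas-on-Spark follows the pandas semantics it mirrors; that is not verified here against a Spark cluster.

```python
import pandas as pd

# Same data and category order as the example above, in plain pandas.
ci = pd.CategoricalIndex(
    ["a", "b", "c", "a", "b", "c"], ordered=True, categories=["c", "b", "a"]
)

# min/max follow the category order c < b < a, not alphabetical order.
print(ci.min(), ci.max())  # c a

# Comparisons with a scalar category also use the category order.
print((ci < "b").tolist())  # [False, False, True, False, False, True]

# An unordered categorical has no min/max; asking for one raises TypeError.
unordered = pd.CategoricalIndex(["b", "a", "c"])
try:
    unordered.min()
    min_allowed = True
except TypeError:
    min_allowed = False
print("min on unordered:", min_allowed)  # min on unordered: False
```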

From a Series:

>>> s = ps.Series(["a", "b", "c", "a", "b", "c"], index=[10, 20, 30, 40, 50, 60])
>>> ps.CategoricalIndex(s)
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')

From an Index:

>>> idx = ps.Index(["a", "b", "c", "a", "b", "c"])
>>> ps.CategoricalIndex(idx)
CategoricalIndex(['a', 'b', 'c', 'a', 'b', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')
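The four construction routes above all produce the same index. The plain-pandas sketch below checks this. It also shows that only the values of a Series are used, so its own labels (10, 20, …) are not carried over. It assumes pandas-on-Spark matches pandas here, as the outputs above suggest.

```python
import pandas as pd

values = ["a", "b", "c", "a", "b", "c"]

# Build the same categorical index from each kind of input shown above.
from_list = pd.CategoricalIndex(values)
from_categorical = pd.CategoricalIndex(pd.Categorical(values))
from_series = pd.CategoricalIndex(pd.Series(values, index=[10, 20, 30, 40, 50, 60]))
from_index = pd.CategoricalIndex(pd.Index(values))

# Only the Series values are used; its labels 10, 20, ... are not kept.
print(from_series.tolist())  # ['a', 'b', 'c', 'a', 'b', 'c']

# Every route yields an index equal to the one built from the plain list.
routes = [from_categorical, from_series, from_index]
print(all(idx.equals(from_list) for idx in routes))  # True
print(from_list.categories.tolist())  # ['a', 'b', 'c']
```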

Methods

`add_categories`(new_categories)	Add new categories.
`all`(*args, **kwargs)	Return whether all elements are True.
`any`([axis])	Return whether any element is True.
`append`(other)	Append a collection of Index options together.
`argmax`()	Return a maximum argument indexer.
`argmin`()	Return a minimum argument indexer.
`as_ordered`()	Set the Categorical to be ordered.
`as_unordered`()	Set the Categorical to be unordered.
`asof`(label)	Return the label from the index, or, if not present, the previous one.
`astype`(dtype)	Cast a pandas-on-Spark object to a specified dtype `dtype`.
`copy`([name, deep])	Make a copy of this object.
`delete`(loc)	Make new Index with passed location(-s) deleted.
`difference`(other[, sort])	Return a new Index with elements from the index that are not in other.
`drop`(labels)	Make new Index with passed list of labels deleted.
`drop_duplicates`([keep])	Return Index with duplicate values removed.
`droplevel`(level)	Return index with requested level(s) removed.
`dropna`([how])	Return Index or MultiIndex without NA/NaN values.
`equals`(other)	Determine if two Index objects contain the same elements.
`factorize`([sort, use_na_sentinel])	Encode the object as an enumerated type or categorical variable.
`fillna`(value)	Fill NA/NaN values with the specified value.
`get_level_values`(level)	Return Index if a valid level is given.
`holds_integer`()	Whether the type is an integer type.
`identical`(other)	Similar to equals, but check that other comparable attributes are also equal.
`insert`(loc, item)	Make new Index inserting new item at location.
`intersection`(other)	Form the intersection of two Index objects.
`is_boolean`()	Return if the current index type is a boolean type.
`is_categorical`()	Return if the current index type is a categorical type.
`is_floating`()	Return if the current index type is a floating type.
`is_integer`()	Return if the current index type is an integer type.
`is_interval`()	Return if the current index type is an interval type.
`is_numeric`()	Return if the current index type is a numeric type.
`is_object`()	Return if the current index type is an object type.
`isin`(values)	Check whether values are contained in Series or Index.
`isna`()	Detect missing values.
`isnull`()	Detect missing values.
`item`()	Return the first element of the underlying data as a python scalar.
`map`(mapper)	Map values using input correspondence (a dict, Series, or function).
`max`()	Return the maximum value of the Index.
`min`()	Return the minimum value of the Index.
`notna`()	Detect existing (non-missing) values.
`notnull`()	Detect existing (non-missing) values.
`nunique`([dropna, approx, rsd])	Return number of unique elements in the object.
`remove_categories`(removals)	Remove the specified categories.
`remove_unused_categories`()	Remove categories which are not used.
`rename`(name[, inplace])	Alter Index or MultiIndex name.
`rename_categories`(new_categories)	Rename categories.
`reorder_categories`(new_categories[, ordered])	Reorder categories as specified in new_categories.
`repeat`(repeats)	Repeat elements of an Index/MultiIndex.
`set_categories`(new_categories[, ordered, rename])	Set the categories to the specified new_categories.
`set_names`(names[, level, inplace])	Set Index or MultiIndex name.
`shift`([periods, fill_value])	Shift Series/Index by desired number of periods.
`sort`(*args, **kwargs)	Use sort_values instead.
`sort_values`([return_indexer, ascending])	Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
`symmetric_difference`(other[, result_name, sort])	Compute the symmetric difference of two Index objects.
`take`(indices)	Return the elements in the given positional indices along an axis.
`to_frame`([index, name])	Create a DataFrame with a column containing the Index.
`to_list`()	Return a list of the values.
`to_numpy`([dtype, copy])	A NumPy ndarray representing the values in this Index or MultiIndex.
`to_pandas`()	Return a pandas Index.
`to_series`([name])	Create a Series with both index and values equal to the index keys, useful with map for returning an indexer based on an index.
`tolist`()	Return a list of the values.
`transpose`()	Return the transpose. For an Index, this is the index itself.
`union`(other[, sort])	Form the union of two Index objects.
`unique`([level])	Return unique values in the index.
`value_counts`([normalize, sort, ascending, ...])	Return a Series containing counts of unique values.
`view`()	This is defined as a copy with the same identity.
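Several methods in the table above manage categories: `add_categories`, `remove_unused_categories`, `rename_categories`, `reorder_categories`, `set_categories` and `as_ordered`. The sketch below uses the pandas methods of the same names to show what each one does. It assumes the pandas-on-Spark versions return equivalent results, which is not checked here.

```python
import pandas as pd

ci = pd.CategoricalIndex(["a", "b", "a"], categories=["a", "b", "c"])

# add_categories / remove_unused_categories change the category set, not the
# values; each call returns a new index.
print(ci.add_categories(["d"]).categories.tolist())       # ['a', 'b', 'c', 'd']
print(ci.remove_unused_categories().categories.tolist())  # ['a', 'b']

# rename_categories relabels categories, and therefore the values using them.
print(ci.rename_categories({"a": "A"}).tolist())  # ['A', 'b', 'A']

# reorder_categories with ordered=True makes the category order meaningful.
reordered = ci.reorder_categories(["c", "b", "a"], ordered=True)
print(reordered.ordered, reordered.categories.tolist())  # True ['c', 'b', 'a']

# set_categories turns values whose category is dropped into NaN.
print(ci.set_categories(["a"]).isna().tolist())  # [False, True, False]

# as_ordered only flips the ordered flag.
print(ci.as_ordered().ordered)  # True
```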

Attributes

`T`	Return the transpose. For an Index, this is the index itself.
`categories`	The categories of this categorical.
`codes`	The category codes of this categorical.
`dtype`	Return the dtype object of the underlying data.
`empty`	Return True if the current object is empty.
`has_duplicates`	If index has duplicates, return True, otherwise False.
`hasnans`	Return True if it has any missing values.
`inferred_type`	Return a string of the type inferred from the values.
`is_monotonic_decreasing`	Return boolean if values in the object are monotonically decreasing.
`is_monotonic_increasing`	Return boolean if values in the object are monotonically increasing.
`is_unique`	Return if the index has unique values.
`name`	Return name of the Index.
`names`	Return names of the Index.
`ndim`	Return an int representing the number of array dimensions.
`nlevels`	Number of levels in Index & MultiIndex.
`ordered`	Whether the categories have an ordered relationship.
`shape`	Return a tuple of the shape of the underlying data.
`size`	Return an int representing the number of elements in this object.
`values`	Return an array representing the data in the Index.
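The attributes listed above can be inspected as in the plain-pandas sketch below. The index name `"grade"` is illustrative only. The sketch assumes pandas-on-Spark reports the same values for these attributes, which is not verified here.

```python
import pandas as pd

ci = pd.CategoricalIndex(
    ["b", "a", "b"], ordered=True, categories=["b", "a"], name="grade"
)

# Category metadata: codes index into categories, in category order.
print(ci.categories.tolist())  # ['b', 'a']
print(ci.codes.tolist())       # [0, 1, 0]
print(ci.ordered)              # True
print(ci.dtype)                # category

# Generic Index attributes behave as on any other Index.
print(ci.name, ci.size, ci.shape)       # grade 3 (3,)
print(ci.is_unique, ci.has_duplicates)  # False True
```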