Index.
value_counts
Return a Series containing counts of unique values. The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.
If True then the object returned will contain the relative frequencies of the unique values.
Sort by values.
Sort in ascending order.
Don’t include counts of NaN.
See also
Series.count
Number of non-NA elements in a Series.
Examples
For Series
>>> df = ps.DataFrame({'x':[0, 0, 1, 1, 1, np.nan]}) >>> df.x.value_counts() 1.0 3 0.0 2 Name: x, dtype: int64
With normalize set to True, returns the relative frequency by dividing all values by the sum of values.
>>> df.x.value_counts(normalize=True) 1.0 0.6 0.0 0.4 Name: x, dtype: float64
dropna With dropna set to False we can also see NaN index values.
>>> df.x.value_counts(dropna=False) 1.0 3 0.0 2 NaN 1 Name: x, dtype: int64
For Index
>>> idx = ps.Index([3, 1, 2, 3, 4, np.nan]) >>> idx Float64Index([3.0, 1.0, 2.0, 3.0, 4.0, nan], dtype='float64')
>>> idx.value_counts().sort_index() 1.0 1 2.0 1 3.0 2 4.0 1 dtype: int64
sort
With sort set to False, the result wouldn’t be sorted by number of count.
>>> idx.value_counts(sort=True).sort_index() 1.0 1 2.0 1 3.0 2 4.0 1 dtype: int64
normalize
>>> idx.value_counts(normalize=True).sort_index() 1.0 0.2 2.0 0.2 3.0 0.4 4.0 0.2 dtype: float64
dropna
With dropna set to False we can also see NaN index values.
>>> idx.value_counts(dropna=False).sort_index() 1.0 1 2.0 1 3.0 2 4.0 1 NaN 1 dtype: int64
For MultiIndex.
>>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'], ... ['speed', 'weight', 'length']], ... [[0, 0, 0, 1, 1, 1, 2, 2, 2], ... [1, 1, 1, 1, 1, 2, 1, 2, 2]]) >>> s = ps.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx) >>> s.index MultiIndex([( 'lama', 'weight'), ( 'lama', 'weight'), ( 'lama', 'weight'), ( 'cow', 'weight'), ( 'cow', 'weight'), ( 'cow', 'length'), ('falcon', 'weight'), ('falcon', 'length'), ('falcon', 'length')], )
>>> s.index.value_counts().sort_index() (cow, length) 1 (cow, weight) 2 (falcon, length) 2 (falcon, weight) 1 (lama, weight) 3 dtype: int64
>>> s.index.value_counts(normalize=True).sort_index() (cow, length) 0.111111 (cow, weight) 0.222222 (falcon, length) 0.222222 (falcon, weight) 0.111111 (lama, weight) 0.333333 dtype: float64
If Index has name, keep the name up.
>>> idx = ps.Index([0, 0, 0, 1, 1, 2, 3], name='pandas-on-Spark') >>> idx.value_counts().sort_index() 0 3 1 2 2 1 3 1 Name: pandas-on-Spark, dtype: int64