pyspark.RDD.countByValue
RDD.countByValue() → Dict[K, int]
Return the count of each unique value in this RDD as a dictionary of (value, count) pairs.
New in version 0.7.0.
Returns
    dict
        A dictionary of (value, count) pairs.
Examples
>>> sorted(sc.parallelize([1, 2, 1, 2, 2], 2).countByValue().items())
[(1, 2), (2, 3)]
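The counting semantics can be modelled in plain Python without a Spark cluster. The following is a minimal sketch, not Spark's implementation: the helper names (`count_partition`, `merge_counts`, `count_by_value_local`) and the way the values are split into two partitions are assumptions made for illustration. It counts values inside each partition first, then merges the partial counts into a single dictionary.

```python
from collections import Counter
from functools import reduce


def count_partition(partition):
    """Count occurrences of each value within one partition."""
    return Counter(partition)


def merge_counts(left, right):
    """Combine two partial counts into a new Counter."""
    merged = Counter(left)
    merged.update(right)
    return merged


def count_by_value_local(partitions):
    """Hypothetical local model of countByValue over a list of partitions."""
    partial = [count_partition(p) for p in partitions]
    return dict(reduce(merge_counts, partial, Counter()))


# Assumed split of [1, 2, 1, 2, 2] into two partitions (illustrative only).
partitions = [[1, 2], [1, 2, 2]]
result = count_by_value_local(partitions)
print(sorted(result.items()))  # [(1, 2), (2, 3)]
```

Because the result is an ordinary dictionary held in a single process, this pattern is most suitable when the number of distinct values is small enough to fit in memory.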