pyspark.RDD.countByKey
- RDD.countByKey()
Count the number of elements for each key, and return the result to the master as a dictionary.
New in version 0.7.0.
- Returns
- dict
a dictionary of (key, count) pairs
Examples
>>> rdd = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
>>> sorted(rdd.countByKey().items())
[('a', 2), ('b', 1)]
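The sketch below reuses the doctest's sc context and is illustrative rather than part of the original docstring: it contrasts countByKey, which collects the per-key counts to the driver as a dict-like object, with a distributed count via mapValues plus reduceByKey, a common alternative when the number of distinct keys is too large to hold on the driver.
>>> from operator import add
>>> rdd = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
>>> # countByKey() returns the per-key element counts to the driver
>>> sorted(rdd.countByKey().items())
[('a', 2), ('b', 1)]
>>> # equivalent count kept distributed across the cluster
>>> sorted(rdd.mapValues(lambda _: 1).reduceByKey(add).collect())
[('a', 2), ('b', 1)]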