pyspark.RDD.sortBy
RDD.sortBy(keyfunc: Callable[[T], S], ascending: bool = True, numPartitions: Optional[int] = None) → RDD[T]

Sorts this RDD by the given keyfunc.
New in version 1.1.0.
- Parameters
  - keyfunc : function
    a function to compute the key
  - ascending : bool, optional, default True
    sort the keys in ascending or descending order
  - numPartitions : int, optional
    the number of partitions in the new RDD
- Returns
  - RDD
    a new RDD sorted by the given keyfunc
Examples
>>> tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[0]).collect()
[('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[1]).collect()
[('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
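The optional parameters can be combined. The following is a minimal sketch of a descending sort into a specified number of partitions, assuming the same `tmp` list and an active SparkContext `sc` as above:

>>> rdd = sc.parallelize(tmp).sortBy(lambda x: x[1], ascending=False, numPartitions=2)
>>> rdd.collect()
[('2', 5), ('d', 4), ('1', 3), ('b', 2), ('a', 1)]
>>> rdd.getNumPartitions()
2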