RDD.sortBy(keyfunc, ascending=True, numPartitions=None)
Sorts this RDD by the given keyfunc.
New in version 1.1.0.
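Parameters
keyfunc : function
    a function to compute the sort key of each element
ascending : bool, optional, default True
    sort the keys in ascending or descending order
numPartitions : int, optional
    the number of partitions in the new RDD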
Returns
RDD
    a new RDD
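To make the roles of keyfunc and ascending concrete without a running Spark cluster, the sketch below models the element order that sortBy returns using Python's built-in sorted(). The function name sort_by_model is hypothetical and is not part of PySpark. Python's sort is stable, so ties keep their input order in this model; that tie behaviour is an assumption of the sketch and should not be read as a guarantee from Spark.

```python
# Hypothetical plain-Python model of the order produced by
# RDD.sortBy(keyfunc, ascending). Runs locally; no Spark required.
# Assumption: ties keep input order here (Python's stable sort); Spark
# is not claimed to preserve that.
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def sort_by_model(data: Iterable[T], keyfunc: Callable[[T], object],
                  ascending: bool = True) -> List[T]:
    """Return the elements ordered by keyfunc, mirroring sortBy's result order."""
    return sorted(data, key=keyfunc, reverse=not ascending)


tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
print(sort_by_model(tmp, lambda x: x[0]))
# [('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
print(sort_by_model(tmp, lambda x: x[1], ascending=False))
# [('2', 5), ('d', 4), ('1', 3), ('b', 2), ('a', 1)]
```

The first call reproduces the first doctest result in the Examples; the second shows the effect of ascending=False on the same data.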
See also
RDD.sortByKey()
pyspark.sql.DataFrame.sort()
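In the PySpark source, sortBy delegates to sortByKey: it is written as self.keyBy(keyfunc).sortByKey(ascending, numPartitions).values(). So numPartitions is handled by sortByKey, which range-partitions the keyed data and then sorts each partition, so that concatenating the partitions in order yields a globally sorted result. The sketch below simulates that idea in plain Python. The function name range_partition_sort is hypothetical, and the boundary choice (evenly spaced picks from all sorted keys) is a simplification; Spark computes its range boundaries from a sample of the keys instead.

```python
# Hypothetical sketch of range partitioning followed by a per-partition
# sort: the idea behind how sortByKey (and therefore sortBy) yields a
# globally ordered RDD. Simplification: boundaries are taken from all
# keys rather than from a random sample as Spark does.
from bisect import bisect_left
from typing import Callable, List, TypeVar

T = TypeVar("T")


def range_partition_sort(data: List[T], keyfunc: Callable[[T], object],
                         num_partitions: int,
                         ascending: bool = True) -> List[List[T]]:
    """Split data into num_partitions key ranges, then sort each range."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    keys = sorted(keyfunc(x) for x in data)
    # num_partitions - 1 upper bounds, evenly spaced through the sorted keys.
    bounds = [keys[max(0, len(keys) * (i + 1) // num_partitions - 1)]
              for i in range(num_partitions - 1)] if keys else []
    partitions: List[List[T]] = [[] for _ in range(num_partitions)]
    for x in data:
        # First bound >= key; keys above every bound land in the last range.
        idx = bisect_left(bounds, keyfunc(x))
        if not ascending:
            idx = num_partitions - 1 - idx  # largest keys go to partition 0
        partitions[idx].append(x)
    # Each partition is sorted on its own (Spark does this in parallel).
    return [sorted(p, key=keyfunc, reverse=not ascending) for p in partitions]


tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
parts = range_partition_sort(tmp, lambda x: x[1], num_partitions=2)
print(parts)
# [[('a', 1), ('b', 2)], [('1', 3), ('d', 4), ('2', 5)]]
print([x for p in parts for x in p])  # concatenation is globally sorted
```

Because every key in partition i is no larger than every key in partition i + 1, no cross-partition merge step is needed after the local sorts.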
Examples
>>> tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[0]).collect()
[('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[1]).collect()
[('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]