pyspark.sql.functions.min_by(col, ord)
Returns the value associated with the minimum value of ord.
New in version 3.3.0.
Parameters
col : Column or str
    target column whose value will be returned
ord : Column or str
    column to be minimized
Returns
Column
    value associated with the minimum value of ord.
Examples
>>> from pyspark.sql.functions import min_by
>>> df = spark.createDataFrame([
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
...     schema=("course", "year", "earnings"))
>>> df.groupby("course").agg(min_by("year", "earnings")).show()
+------+----------------------+
|course|min_by(year, earnings)|
+------+----------------------+
|  Java|                  2012|
|dotNET|                  2012|
+------+----------------------+
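The aggregation semantics can also be sketched in plain Python, without Spark, to make clear which value is picked per group. This is only an illustration under stated assumptions: `min_by_group` is a hypothetical helper, not part of PySpark, and the way it resolves ties (the first row encountered wins) is a property of this sketch and says nothing about how Spark resolves them.

```python
from typing import Any, Dict, Hashable, Iterable, Tuple


def min_by_group(rows: Iterable[Tuple[Hashable, Any, Any]]) -> Dict[Hashable, Any]:
    """For each key, return the value paired with the smallest ord.

    Each row is (key, value, ord). Hypothetical helper, not a PySpark API.
    """
    best: Dict[Hashable, Tuple[Any, Any]] = {}  # key -> (ord, value)
    for key, value, ord_ in rows:
        # Keep this row if it is the first for its key or has a strictly smaller ord.
        if key not in best or ord_ < best[key][0]:
            best[key] = (ord_, value)
    return {key: value for key, (_, value) in best.items()}


# Same data as the example above: (course, year, earnings).
rows = [
    ("Java", 2012, 20000), ("dotNET", 2012, 5000),
    ("dotNET", 2013, 48000), ("Java", 2013, 30000),
]
result = min_by_group(rows)
print(result)  # {'Java': 2012, 'dotNET': 2012}
```

For each course, the year returned is the one from the row with the lowest earnings, which matches the table shown above.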