pyspark.sql.functions.median

pyspark.sql.functions.median(col: ColumnOrName) → pyspark.sql.column.Column

Returns the median of the values in a group.

New in version 3.4.0.

Parameters
col : Column or str

target column to compute on.

Returns
Column

the median of the values in a group.

Notes

Supports Spark Connect.

Examples

>>> from pyspark.sql.functions import median
>>> df = spark.createDataFrame([
...     ("Java", 2012, 20000), ("dotNET", 2012, 5000),
...     ("Java", 2012, 22000), ("dotNET", 2012, 10000),
...     ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
...     schema=("course", "year", "earnings"))
>>> df.groupby("course").agg(median("earnings")).show()
+------+----------------+
|course|median(earnings)|
+------+----------------+
|  Java|         22000.0|
|dotNET|         10000.0|
+------+----------------+
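
As a cross-check of the output above, the same per-group medians can be reproduced without Spark. The sketch below is only an illustration in plain Python using the standard library's `statistics.median`; the helper name `grouped_median` is hypothetical and this is not how Spark computes the result internally. Each group here has an odd number of rows, so the median is simply the middle value.

```python
from collections import defaultdict
from statistics import median

# Same rows as the DataFrame in the example: (course, year, earnings).
rows = [
    ("Java", 2012, 20000), ("dotNET", 2012, 5000),
    ("Java", 2012, 22000), ("dotNET", 2012, 10000),
    ("dotNET", 2013, 48000), ("Java", 2013, 30000),
]


def grouped_median(rows):
    """Group earnings by course and return the median of each group as a float."""
    groups = defaultdict(list)
    for course, _year, earnings in rows:
        groups[course].append(earnings)
    return {course: float(median(values)) for course, values in groups.items()}


result = grouped_median(rows)
print(result)  # {'Java': 22000.0, 'dotNET': 10000.0}
```

The values match the `median(earnings)` column shown by `show()`.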