DataFrame.median
Return the median of the values for the requested axis.
Note
Unlike pandas, pandas-on-Spark returns an approximate median, derived from an approximate percentile computation, because computing an exact median across a large dataset is extremely expensive.
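The sketch below uses only the standard library to show one way a percentile-based median can differ from an exact one: a percentile that selects an existing element cannot return the interpolated midpoint of an even-length input. The `nearest_rank_percentile` helper is hypothetical and is not Spark's algorithm; the actual approximation may behave differently.

```python
import math
import statistics


def nearest_rank_percentile(values, q):
    """Return the element at the nearest rank for quantile q (0 < q <= 1).

    Hypothetical helper for illustration only; not Spark's algorithm.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


odd = [24.0, 21.0, 25.0, 33.0, 26.0]
even = [1, 2, 3, 4]

# Odd length: both approaches land on the middle element.
exact_odd = statistics.median(odd)
picked_odd = nearest_rank_percentile(odd, 0.5)

# Even length: the exact median averages the two middle elements,
# while the element-selecting percentile returns one of them.
exact_even = statistics.median(even)
picked_even = nearest_rank_percentile(even, 0.5)

print(exact_odd, picked_odd)
print(exact_even, picked_even)
```

When an exact, interpolated median matters, it may be worth comparing against pandas on a sample small enough to collect.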
Parameters
axis: Axis for the function to be applied on.
skipna: Exclude NA/null values when computing the result.
Changed in version 3.4.0: Supported including NA/null values.
numeric_only: Include only float, int, boolean columns. False is not supported. This parameter is mainly for pandas compatibility.
accuracy: Accuracy of the approximation. A larger value means better accuracy. The relative error is 1.0 / accuracy.
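To make the accuracy arithmetic concrete, here is a small sketch. Both helper names are hypothetical. Reading the relative error as a bound on how far the returned value's rank may sit from the true median rank is an assumption, borrowed from how Spark documents its approximate quantiles elsewhere; the source only states the 1.0 / accuracy relation.

```python
def relative_error(accuracy):
    """Relative error implied by an accuracy setting (1.0 / accuracy)."""
    return 1.0 / accuracy


def max_rank_deviation(n_rows, accuracy):
    """Assumed rank-error reading: up to n_rows / accuracy positions off."""
    return n_rows / accuracy


# Illustrative accuracy values, not recommendations.
for accuracy in (100, 10000):
    print(accuracy, relative_error(accuracy), max_rank_deviation(1_000_000, accuracy))
```

Under that reading, raising accuracy from 100 to 10000 on one million rows tightens the possible rank deviation from 10000 positions to 100.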
Examples
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({
...     'a': [24., 21., 25., 33., 26.], 'b': [1, 2, 3, 4, 5]}, columns=['a', 'b'])
>>> df
      a  b
0  24.0  1
1  21.0  2
2  25.0  3
3  33.0  4
4  26.0  5
On a DataFrame:
>>> df.median()
a    25.0
b     3.0
dtype: float64
On a Series:
>>> df['a'].median()
25.0
>>> (df['b'] + 100).median()
103.0
For multi-index columns:
>>> import pandas as pd
>>> df.columns = pd.MultiIndex.from_tuples([('x', 'a'), ('y', 'b')])
>>> df
      x  y
      a  b
0  24.0  1
1  21.0  2
2  25.0  3
3  33.0  4
4  26.0  5

>>> df.median()
x  a    25.0
y  b     3.0
dtype: float64

>>> df.median(axis=1)
0    12.5
1    11.5
2    14.0
3    18.5
4    15.5
dtype: float64

>>> df[('x', 'a')].median()
25.0
>>> (df[('y', 'b')] + 100).median()
103.0
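
As a cross-check of the axis semantics in the examples above, the following sketch recomputes the column-wise and row-wise medians with Python's `statistics` module. Plain lists stand in for the two columns; this does not call pandas-on-Spark and says nothing about how it computes the result internally.

```python
import statistics

# Same values as columns 'a' and 'b' in the examples.
a = [24.0, 21.0, 25.0, 33.0, 26.0]
b = [1, 2, 3, 4, 5]

# Default axis: one median per column.
col_medians = [statistics.median(a), statistics.median(b)]

# axis=1: one median per row, taken across the two columns.
row_medians = [statistics.median(row) for row in zip(a, b)]

print(col_medians)
print(row_medians)
```

With only two columns, each row-wise median is simply the mean of the row's two values.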