pyspark.pandas.groupby.GroupBy.sum

GroupBy.sum(numeric_only: Optional[bool] = True, min_count: int = 0) → FrameLike

Compute sum of group values.
New in version 3.3.0.
- Parameters
- numeric_only : bool, default True
Include only float, int, and boolean columns. If None, will attempt to use everything, then use only numeric data. This parameter has no effect here, since only numeric columns are supported.
New in version 3.4.0.
- min_count : int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
New in version 3.4.0.
Notes
There is a behavior difference between pandas-on-Spark and pandas:
- when there is a non-numeric aggregation column, it will be ignored even if numeric_only is False (illustrated in the last example below).
Examples
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({"A": [1, 2, 1, 2], "B": [True, False, False, True],
...                    "C": [3, 4, 3, 4], "D": ["a", "a", "b", "a"]})

>>> df.groupby("A").sum().sort_index()
   B  C
A
1  1  6
2  1  8

>>> df.groupby("D").sum().sort_index()
   A  B   C
D
a  5  2  11
b  1  0   3

>>> df.groupby("D").sum(min_count=3).sort_index()
     A    B     C
D
a  5.0  2.0  11.0
b  NaN  NaN   NaN
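A minimal sketch of the behavior difference described in the Notes, continuing with the same df: even when numeric_only=False is passed explicitly, pandas-on-Spark is expected to drop the non-numeric column "D" from the result (plain pandas would instead attempt to aggregate it, e.g. by concatenating the strings).

>>> df.groupby("A").sum(numeric_only=False).sort_index()
   B  C
A
1  1  6
2  1  8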