pyspark.sql.GroupedData
class pyspark.sql.GroupedData(jgd: py4j.java_gateway.JavaObject, df: pyspark.sql.dataframe.DataFrame)
A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy().
New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
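Conceptually, `DataFrame.groupBy()` partitions the rows by the values of the grouping columns, and each `GroupedData` method reduces every partition to output rows. The minimal sketch below models this in plain Python, with rows as dicts, instead of calling PySpark, so it runs without a Spark session. `group_rows` and `grouped_count` are hypothetical helpers, not part of the PySpark API. The PySpark call being imitated is `df.groupBy("name").count()`, which returns the grouping columns plus a `count` column.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def group_rows(rows: Sequence[Row], keys: Sequence[str]) -> Dict[Tuple[Any, ...], List[Row]]:
    """Partition rows by their values in the grouping columns, keeping first-seen order."""
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


def grouped_count(rows: Sequence[Row], keys: Sequence[str]) -> List[Row]:
    """Model of df.groupBy(*keys).count(): the key columns plus a 'count' column per group."""
    out: List[Row] = []
    for key, members in group_rows(rows, keys).items():
        result: Row = dict(zip(keys, key))
        result["count"] = len(members)
        out.append(result)
    return out


people = [
    {"name": "Alice", "age": 2},
    {"name": "Bob", "age": 5},
    {"name": "Alice", "age": 7},
]
print(grouped_count(people, ["name"]))
# [{'name': 'Alice', 'count': 2}, {'name': 'Bob', 'count': 1}]
```

Spark does not guarantee the order of the output rows. This sketch simply keeps the order in which keys are first seen.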
Methods
agg(*exprs): Computes aggregates and returns the result as a DataFrame.
apply(udf): An alias of pyspark.sql.GroupedData.applyInPandas(); however, it takes a pyspark.sql.functions.pandas_udf() whereas pyspark.sql.GroupedData.applyInPandas() takes a Python native function.
applyInPandas(func, schema): Maps each group of the current DataFrame using a Python function that takes and returns a pandas.DataFrame, and returns the result as a DataFrame.
applyInPandasWithState(func, …): Applies the given function to each group of data, while maintaining a user-defined per-group state.
avg(*cols): Computes average values for each numeric column for each group.
cogroup(other): Cogroups this group with another group so that we can run cogrouped operations.
count(): Counts the number of records for each group.
max(*cols): Computes the max value for each numeric column for each group.
mean(*cols): Computes average values for each numeric column for each group; an alias of avg().
min(*cols): Computes the min value for each numeric column for each group.
pivot(pivot_col[, values]): Pivots a column of the current DataFrame and performs the specified aggregation.
sum(*cols): Computes the sum for each numeric column for each group.
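The dict form of `agg()` looks like `df.groupBy("name").agg({"age": "max"})`. Each key names a column, each value names a built-in aggregate function, and the result column is labeled like `max(age)`. The sketch below is a plain-Python model of that dict form only; the hypothetical `grouped_agg` is not PySpark. It supports `avg`, `max`, `min`, `sum`, and `count`, and it assumes SQL null semantics:

- nulls are skipped;
- `count` over only nulls is 0;
- the other aggregates over only nulls are null.

```python
from collections import defaultdict
from statistics import fmean
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]

# Built-in aggregate names accepted by this sketch, as used in agg({"col": "name"}).
AGGREGATES: Dict[str, Callable[[List[Any]], Any]] = {
    "avg": fmean,
    "max": max,
    "min": min,
    "sum": sum,
    "count": len,
}


def grouped_agg(rows: Sequence[Row], keys: Sequence[str], exprs: Dict[str, str]) -> List[Row]:
    """Model of df.groupBy(*keys).agg(exprs) for the dict form {column: aggregate_name}."""
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    out: List[Row] = []
    for key, members in groups.items():
        result: Row = dict(zip(keys, key))
        for col, name in exprs.items():
            # Nulls (None) are skipped, as SQL aggregates ignore them.
            values = [m[col] for m in members if m[col] is not None]
            if values or name == "count":
                result[f"{name}({col})"] = AGGREGATES[name](values)
            else:
                result[f"{name}({col})"] = None  # aggregate over only nulls -> null
        out.append(result)
    return out


people = [
    {"name": "Alice", "age": 2},
    {"name": "Bob", "age": 5},
    {"name": "Alice", "age": 7},
]
print(grouped_agg(people, ["name"], {"age": "max"}))
# [{'name': 'Alice', 'max(age)': 7}, {'name': 'Bob', 'max(age)': 5}]
```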
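`applyInPandas(func, schema)` works as follows:

- Spark hands each whole group to `func` as a `pandas.DataFrame`.
- `func` returns a `pandas.DataFrame`.
- Spark concatenates the results into a DataFrame described by `schema`.

`apply()` does the same but expects a `pandas_udf`. The sketch below models only this split-apply-combine shape in plain Python. Groups are lists of dicts rather than pandas DataFrames, and no schema is checked. `apply_per_group` and `subtract_mean` are hypothetical names.

```python
from collections import defaultdict
from statistics import fmean
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
GroupFn = Callable[[List[Row]], List[Row]]


def apply_per_group(rows: Sequence[Row], keys: Sequence[str], func: GroupFn) -> List[Row]:
    """Model of df.groupBy(*keys).applyInPandas(func, schema) using lists of dicts.

    func sees one whole group at a time and may return any number of rows;
    the per-group outputs are concatenated into one result.
    """
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    out: List[Row] = []
    for members in groups.values():
        out.extend(func(members))
    return out


def subtract_mean(group: List[Row]) -> List[Row]:
    """Example group function: center column 'v' on the group's mean."""
    mu = fmean(r["v"] for r in group)
    return [{**r, "v": r["v"] - mu} for r in group]


data = [
    {"id": 1, "v": 1.0},
    {"id": 1, "v": 2.0},
    {"id": 2, "v": 3.0},
    {"id": 2, "v": 5.0},
    {"id": 2, "v": 10.0},
]
print(apply_per_group(data, ["id"], subtract_mean))
# [{'id': 1, 'v': -0.5}, {'id': 1, 'v': 0.5}, {'id': 2, 'v': -3.0}, {'id': 2, 'v': -1.0}, {'id': 2, 'v': 4.0}]
```

Each group is materialized whole, so in Spark a very large group must also fit in one worker's memory. The PySpark documentation warns about this out-of-memory risk.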
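`applyInPandasWithState()` is meant for Structured Streaming. The function is invoked per group on each micro-batch and can read and update state that Spark keeps for that group between batches. The plain-Python sketch below models only that idea.

- `ToyStatefulGrouper`, its `process_batch` method, and the `(key, rows, state) -> (rows, new_state)` function shape are all hypothetical.
- The real method works with pandas DataFrames and a state object.
- The real method also supports timeouts.

None of these real-API details are modeled here.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]
# Hypothetical function shape for this sketch:
# (group key, this batch's rows for the key, previous state or None) -> (output rows, new state)
StatefulFn = Callable[[Key, List[Row], Optional[Any]], Tuple[List[Row], Any]]


class ToyStatefulGrouper:
    """Plain-Python model of per-group state carried across micro-batches (not PySpark)."""

    def __init__(self, keys: List[str], func: StatefulFn) -> None:
        self.keys = keys
        self.func = func
        self.state: Dict[Key, Any] = {}  # one entry per key, kept between batches

    def process_batch(self, batch: Iterable[Row]) -> List[Row]:
        groups: Dict[Key, List[Row]] = defaultdict(list)
        for row in batch:
            groups[tuple(row[k] for k in self.keys)].append(row)
        out: List[Row] = []
        # Only keys with data in this batch are visited (timeouts are not modeled).
        for key, members in groups.items():
            rows, new_state = self.func(key, members, self.state.get(key))
            self.state[key] = new_state
            out.extend(rows)
        return out


def running_count(key: Key, rows: List[Row], state: Optional[int]) -> Tuple[List[Row], int]:
    """Example: emit the number of events seen so far for each user."""
    total = (state or 0) + len(rows)
    return [{"user": key[0], "events_so_far": total}], total


grouper = ToyStatefulGrouper(["user"], running_count)
print(grouper.process_batch([{"user": "u1"}, {"user": "u2"}, {"user": "u1"}]))
# [{'user': 'u1', 'events_so_far': 2}, {'user': 'u2', 'events_so_far': 1}]
print(grouper.process_batch([{"user": "u1"}]))
# [{'user': 'u1', 'events_so_far': 3}]
```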
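`avg()`, `mean()`, `max()`, `min()`, and `sum()` are shortcuts that apply one aggregate to each named numeric column, e.g. `df.groupBy("dept").mean("amount")`. As far as we know, PySpark labels the result of both `avg()` and `mean()` as `avg(amount)`. The plain-Python model below uses the hypothetical `numeric_agg`, not PySpark. It requires the columns to be listed explicitly and raises `TypeError` on non-numeric values. That error is this sketch's choice, not a claim about Spark's exact behavior.

```python
from collections import defaultdict
from statistics import fmean
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]

# Reducers for the numeric shortcuts; mean() is an alias of avg() and shares its label.
REDUCERS: Dict[str, Callable[[List[Any]], Any]] = {
    "avg": fmean, "mean": fmean, "max": max, "min": min, "sum": sum,
}


def numeric_agg(rows: Sequence[Row], keys: Sequence[str], fn: str, cols: Sequence[str]) -> List[Row]:
    """Model of df.groupBy(*keys).<fn>(*cols) for fn in avg/mean/max/min/sum."""
    label = "avg" if fn == "mean" else fn
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    out: List[Row] = []
    for key, members in groups.items():
        result: Row = dict(zip(keys, key))
        for col in cols:
            values = [m[col] for m in members if m[col] is not None]
            if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in values):
                raise TypeError(f"{col!r} is not a numeric column")
            # An all-null group yields None, as SQL aggregates yield null.
            result[f"{label}({col})"] = REDUCERS[fn](values) if values else None
        out.append(result)
    return out


sales = [
    {"dept": "a", "amount": 10, "qty": 1},
    {"dept": "b", "amount": 4, "qty": 3},
    {"dept": "a", "amount": 30, "qty": 2},
]
print(numeric_agg(sales, ["dept"], "mean", ["amount"]))
# [{'dept': 'a', 'avg(amount)': 20.0}, {'dept': 'b', 'avg(amount)': 4.0}]
```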
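`cogroup()` pairs the groups of two grouped DataFrames that share a key, so one function can see both sides at once. This is typically done via `df1.groupBy("id").cogroup(df2.groupBy("id")).applyInPandas(func, schema)`, where `func` receives one `pandas.DataFrame` per side. The plain-Python sketch below uses the hypothetical `cogroup_apply` and `totals`. It visits every key present on either side and passes an empty list for a side that lacks the key. We believe PySpark similarly passes an empty DataFrame in that case, but treat that as an assumption.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]
CogroupFn = Callable[[List[Row], List[Row]], List[Row]]


def _group(rows: Sequence[Row], keys: Sequence[str]) -> Dict[Key, List[Row]]:
    groups: Dict[Key, List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


def cogroup_apply(left: Sequence[Row], right: Sequence[Row], keys: Sequence[str], func: CogroupFn) -> List[Row]:
    """Model of left.groupBy(*keys).cogroup(right.groupBy(*keys)).applyInPandas(func, schema)."""
    lg, rg = _group(left, keys), _group(right, keys)
    out: List[Row] = []
    # Visit every key seen on either side; a side without that key contributes [].
    for key in list(lg) + [k for k in rg if k not in lg]:
        out.extend(func(lg.get(key, []), rg.get(key, [])))
    return out


def totals(left_rows: List[Row], right_rows: List[Row]) -> List[Row]:
    """Example: sum column 'v' on each side for one key."""
    key = (left_rows or right_rows)[0]["id"]
    return [{
        "id": key,
        "orders": sum(r["v"] for r in left_rows),
        "refunds": sum(r["v"] for r in right_rows),
    }]


orders = [{"id": 1, "v": 10}, {"id": 2, "v": 5}, {"id": 1, "v": 1}]
refunds = [{"id": 2, "v": 3}, {"id": 3, "v": 7}]
print(cogroup_apply(orders, refunds, ["id"], totals))
# [{'id': 1, 'orders': 11, 'refunds': 0}, {'id': 2, 'orders': 5, 'refunds': 3}, {'id': 3, 'orders': 0, 'refunds': 7}]
```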
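`pivot()` turns the distinct values of one column into output columns. For example, `df.groupBy("year").pivot("course", ["dotNET", "Java"]).sum("earnings")` yields one row per year and one column per course. The plain-Python model below uses the hypothetical `pivot_sum`, limited to a single `sum` aggregate, and shows both variants:

- With `values` given, only those columns appear.
- Without `values`, the distinct values are computed in an extra pass first. This is why the PySpark documentation describes passing `values` as more efficient.

Combinations with no matching rows come out as `None`, standing in for null.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


def pivot_sum(
    rows: Sequence[Row],
    keys: Sequence[str],
    pivot_col: str,
    value_col: str,
    values: Optional[Sequence[Any]] = None,
) -> List[Row]:
    """Model of df.groupBy(*keys).pivot(pivot_col, values).sum(value_col)."""
    if values is None:
        # No values given: an extra pass finds the distinct pivot values.
        # This sketch sorts them so the output columns are deterministic.
        values = sorted({r[pivot_col] for r in rows})
    wanted = set(values)
    sums: Dict[Tuple[Any, ...], Dict[Any, Any]] = defaultdict(dict)
    for row in rows:
        cell = sums[tuple(row[k] for k in keys)]  # creates the group even if no cell matches
        pv, v = row[pivot_col], row[value_col]
        if pv in wanted and v is not None:
            cell[pv] = cell.get(pv, 0) + v
    out: List[Row] = []
    for key, cells in sums.items():
        result: Row = dict(zip(keys, key))
        for pv in values:
            result[str(pv)] = cells.get(pv)  # no matching rows -> None (null)
        out.append(result)
    return out


earnings = [
    {"year": 2012, "course": "dotNET", "earnings": 10000},
    {"year": 2012, "course": "Java", "earnings": 20000},
    {"year": 2012, "course": "dotNET", "earnings": 5000},
    {"year": 2013, "course": "dotNET", "earnings": 48000},
    {"year": 2013, "course": "Java", "earnings": 30000},
]
print(pivot_sum(earnings, ["year"], "course", "earnings", ["dotNET", "Java"]))
# [{'year': 2012, 'dotNET': 15000, 'Java': 20000}, {'year': 2013, 'dotNET': 48000, 'Java': 30000}]
```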