pyspark.sql.GroupedData
A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy().
New in version 1.3.0.
Changed in version 3.4.0: Supports Spark Connect.
Methods
agg(*exprs)
Computes aggregates and returns the result as a DataFrame.
apply(udf)
Alias of pyspark.sql.GroupedData.applyInPandas(); however, it takes a pyspark.sql.functions.pandas_udf() whereas applyInPandas() takes a Python native function.
applyInPandas(func, schema)
Maps each group of the current DataFrame using a pandas UDF and returns the result as a DataFrame.
applyInPandasWithState(func, …)
Applies the given function to each group of data, while maintaining a user-defined per-group state.
avg(*cols)
Computes average values for each numeric column for each group.
cogroup(other)
Cogroups this group with another group so that we can run cogrouped operations.
count()
Counts the number of records for each group.
max(*cols)
Computes the max value for each numeric column for each group.
mean(*cols)
Computes average values for each numeric column for each group (alias of avg()).
min(*cols)
Computes the min value for each numeric column for each group.
pivot(pivot_col[, values])
Pivots a column of the current DataFrame and performs the specified aggregation.
sum(*cols)
Computes the sum for each numeric column for each group.