pyspark.sql.functions.array_agg
pyspark.sql.functions.array_agg(col)
Aggregate function: returns a list of objects with duplicates.
New in version 3.5.0.
- Parameters
- col : Column or str
  target column to compute on.
- Returns
Column
list of objects with duplicates.
Examples
Example 1: Using array_agg function on an int column
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([[1],[1],[2]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c'))).show()
+---------------------------------+
|sort_array(collect_list(c), true)|
+---------------------------------+
|                        [1, 1, 2]|
+---------------------------------+
Example 2: Using array_agg function on a string column
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([["apple"],["apple"],["banana"]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c'))).show(truncate=False)
+---------------------------------+
|sort_array(collect_list(c), true)|
+---------------------------------+
|[apple, apple, banana]           |
+---------------------------------+
Example 3: Using array_agg function on a column with null values (nulls are excluded from the result)
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([[1],[None],[2]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c'))).show()
+---------------------------------+
|sort_array(collect_list(c), true)|
+---------------------------------+
|                           [1, 2]|
+---------------------------------+
Example 4: Using array_agg function on a column with different data types
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([[1],["apple"],[2]], ["c"])
>>> df.agg(sf.sort_array(sf.array_agg('c'))).show()
+---------------------------------+
|sort_array(collect_list(c), true)|
+---------------------------------+
|                    [1, 2, apple]|
+---------------------------------+