pyspark.sql.functions.flatten

pyspark.sql.functions.flatten(col)

Array function: creates a single array from an array of arrays. If the arrays are nested more than two levels deep, only one level of nesting is removed. If any of the nested arrays is null, the result is null.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.
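
The behavior described above can be modeled in plain Python. This is a minimal sketch for illustration only, not Spark's implementation: the helper name `flatten_one_level` is hypothetical, Python lists stand in for Spark arrays, and `None` stands in for SQL NULL.

```python
from typing import List, Optional


def flatten_one_level(nested: Optional[List[Optional[list]]]) -> Optional[list]:
    """Hypothetical plain-Python model of flatten; not Spark's implementation."""
    # A null input array yields a null result.
    if nested is None:
        return None
    # A null nested array makes the whole result null (compare Example 2).
    if any(inner is None for inner in nested):
        return None
    # Concatenate the inner arrays, removing exactly one level of nesting.
    return [item for inner in nested for item in inner]


print(flatten_one_level([[1, 2, 3], [4, 5], [6]]))  # [1, 2, 3, 4, 5, 6]
print(flatten_one_level([None, [4, 5]]))            # None
print(flatten_one_level([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))
# [[1, 2], [3, 4], [5, 6], [7, 8]]
```

The model only concatenates the immediate inner lists, which is why deeper levels of nesting pass through unchanged.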

Parameters

col (Column or str): The name of the column or expression to be flattened.

Returns

Column: A new column that contains the flattened array.

Examples

Example 1: Flattening a simple nested array

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([[1, 2, 3], [4, 5], [6]],)], ['data'])
>>> df.select(sf.flatten(df.data)).show()
+------------------+
|     flatten(data)|
+------------------+
|[1, 2, 3, 4, 5, 6]|
+------------------+

Example 2: Flattening an array that contains a null nested array

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([None, [4, 5]],)], ['data'])
>>> df.select(sf.flatten(df.data)).show()
+-------------+
|flatten(data)|
+-------------+
|         NULL|
+-------------+

Example 3: Flattening an array with more than two levels of nesting

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([[[1, 2], [3, 4]], [[5, 6], [7, 8]]],)], ['data'])
>>> df.select(sf.flatten(df.data)).show(truncate=False)
+--------------------------------+
|flatten(data)                   |
+--------------------------------+
|[[1, 2], [3, 4], [5, 6], [7, 8]]|
+--------------------------------+

Example 4: Flattening an array with mixed types

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([['a', 'b', 'c'], [1, 2, 3]],)], ['data'])
>>> df.select(sf.flatten(df.data)).show()
+------------------+
|     flatten(data)|
+------------------+
|[a, b, c, 1, 2, 3]|
+------------------+