pyspark.sql.functions.array_compact
- pyspark.sql.functions.array_compact(col)
Array function: removes null values from the array.
New in version 3.4.0.
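The following is a minimal pure-Python model of what the function does to a single array value. It is meant only to make the semantics concrete. `array_compact_reference` is a hypothetical helper and is not part of PySpark. The sketch assumes that Python `None` stands in for SQL NULL and that a null input array produces a null result rather than an empty array.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def array_compact_reference(values: Optional[List[Optional[T]]]) -> Optional[List[T]]:
    """Hypothetical model of array_compact applied to one array value."""
    # Assumption: a null (None) input array yields a null result, not [].
    if values is None:
        return None
    # Keep every non-null element, preserving the original order and duplicates.
    return [v for v in values if v is not None]


print(array_compact_reference([1, None, 2, 3]))     # [1, 2, 3]
print(array_compact_reference([4, 5, None, 4]))     # [4, 5, 4]
print(array_compact_reference([None, None, None]))  # []
print(array_compact_reference([]))                  # []
```

Note that only nulls are dropped. Duplicates such as the repeated `4` are kept, which matches Example 2 below.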
- Parameters
  - col : Column or str
    name of column or expression
- Returns
Column
A new column that is an array excluding the null values from the input column.
Notes
Supports Spark Connect.
Examples
Example 1: Removing null values from a simple array
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, None, 2, 3],)], ['data'])
>>> df.select(sf.array_compact(df.data)).show()
+-------------------+
|array_compact(data)|
+-------------------+
|          [1, 2, 3]|
+-------------------+
Example 2: Removing null values from multiple arrays
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, None, 2, 3],), ([4, 5, None, 4],)], ['data'])
>>> df.select(sf.array_compact(df.data)).show()
+-------------------+
|array_compact(data)|
+-------------------+
|          [1, 2, 3]|
|          [4, 5, 4]|
+-------------------+
Example 3: Removing null values from an array with all null values
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import ArrayType, StringType, StructField, StructType
>>> schema = StructType([
...   StructField("data", ArrayType(StringType()), True)
... ])
>>> df = spark.createDataFrame([([None, None, None],)], schema)
>>> df.select(sf.array_compact(df.data)).show()
+-------------------+
|array_compact(data)|
+-------------------+
|                 []|
+-------------------+
Example 4: Removing null values from an array with no null values
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2, 3],)], ['data'])
>>> df.select(sf.array_compact(df.data)).show()
+-------------------+
|array_compact(data)|
+-------------------+
|          [1, 2, 3]|
+-------------------+
Example 5: Removing null values from an empty array
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import ArrayType, StringType, StructField, StructType
>>> schema = StructType([
...   StructField("data", ArrayType(StringType()), True)
... ])
>>> df = spark.createDataFrame([([],)], schema)
>>> df.select(sf.array_compact(df.data)).show()
+-------------------+
|array_compact(data)|
+-------------------+
|                 []|
+-------------------+