pyspark.sql.functions.transform
Returns an array of elements after applying a transformation to each element in the input array.
New in version 3.1.0.
Changed in version 3.4.0: Supports Spark Connect.
Parameters
col: Column or str. Name of column or expression.
f: function. A function that is applied to each element of the input array. It can take one of the following forms:
Unary (x: Column) -> Column: ...
Binary (x: Column, i: Column) -> Column..., where the second argument is a 0-based index of the element.
The function can use methods of Column, functions defined in pyspark.sql.functions, and Scala UserDefinedFunctions. Python UserDefinedFunctions are not supported (SPARK-27052).
Returns
Column: a new array of transformed elements.
Examples
>>> from pyspark.sql.functions import transform, when
>>> df = spark.createDataFrame([(1, [1, 2, 3, 4])], ("key", "values"))
>>> df.select(transform("values", lambda x: x * 2).alias("doubled")).show()
+------------+
|     doubled|
+------------+
|[2, 4, 6, 8]|
+------------+

>>> def alternate(x, i):
...     return when(i % 2 == 0, x).otherwise(-x)
>>> df.select(transform("values", alternate).alias("alternated")).show()
+--------------+
|    alternated|
+--------------+
|[1, -2, 3, -4]|
+--------------+
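The difference between the one-argument and two-argument forms of the function can be modeled in plain Python without a Spark session. The sketch below is only an illustration of the element-wise semantics, not how Spark evaluates the expression: `transform_model` is a hypothetical helper that operates on ordinary Python lists and values rather than on Column expressions, and choosing the form by counting the callable's parameters is an assumption made for this example.

```python
import inspect
from typing import Any, Callable, List


def transform_model(values: List[Any], f: Callable[..., Any]) -> List[Any]:
    """Plain-Python model of element-wise transform semantics (hypothetical helper)."""
    # Count the parameters of f to choose between the two supported forms.
    arity = len(inspect.signature(f).parameters)
    if arity == 1:
        # One-argument form: f receives each element.
        return [f(x) for x in values]
    if arity == 2:
        # Two-argument form: f receives the element and its 0-based index.
        return [f(x, i) for i, x in enumerate(values)]
    raise ValueError("f must accept one or two arguments")


# Mirrors the "doubled" example using ordinary Python integers.
doubled = transform_model([1, 2, 3, 4], lambda x: x * 2)

# Mirrors the "alternated" example: negate elements at odd indices.
alternated = transform_model([1, 2, 3, 4], lambda x, i: x if i % 2 == 0 else -x)

print(doubled)     # [2, 4, 6, 8]
print(alternated)  # [1, -2, 3, -4]
```

In Spark itself the function receives Column expressions, so a conditional has to be built with when(...).otherwise(...) as in the alternate example, not with a Python if expression.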