pyspark.sql.functions.array

pyspark.sql.functions.array(*cols: Union[ColumnOrName, List[ColumnOrName_], Tuple[ColumnOrName_, …]]) → pyspark.sql.column.Column

Creates a new array column from the input columns.

New in version 1.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
cols: Column or str

column names or Columns that have the same data type; they may be passed as separate arguments or as a single list or tuple.

Returns
Column

a column of array type.

Examples

>>> from pyspark.sql.functions import array
>>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ("name", "age"))
>>> df.select(array('age', 'age').alias("arr")).collect()
[Row(arr=[2, 2]), Row(arr=[5, 5])]
>>> df.select(array([df.age, df.age]).alias("arr")).collect()
[Row(arr=[2, 2]), Row(arr=[5, 5])]
>>> df.select(array('age', 'age').alias("col")).printSchema()
root
 |-- col: array (nullable = false)
 |    |-- element: long (containsNull = true)