pyspark.sql.functions.from_csv
Parses a column containing a CSV string into a row with the specified schema. Returns null in the case of an unparseable string.
New in version 3.0.0.
Changed in version 3.4.0: Supports Spark Connect.
Parameters
col : Column or str
    a column or column name in CSV format
schema : Column or str
    a column, or Python string literal with schema in DDL format, to use when parsing the CSV column.
options : dict, optional
    options to control parsing. Accepts the same options as the CSV datasource. See Data Source Option for the version you use.
Returns
Column
    a column of parsed CSV values
Examples
>>> data = [("1,2,3",)] >>> df = spark.createDataFrame(data, ("value",)) >>> df.select(from_csv(df.value, "a INT, b INT, c INT").alias("csv")).collect() [Row(csv=Row(a=1, b=2, c=3))] >>> value = data[0][0] >>> df.select(from_csv(df.value, schema_of_csv(value)).alias("csv")).collect() [Row(csv=Row(_c0=1, _c1=2, _c2=3))] >>> data = [(" abc",)] >>> df = spark.createDataFrame(data, ("value",)) >>> options = {'ignoreLeadingWhiteSpace': True} >>> df.select(from_csv(df.value, "s string", options).alias("csv")).collect() [Row(csv=Row(s='abc'))]