pyspark.sql.functions.from_csv
Parses a column containing a CSV string into a row with the specified schema. Returns null if the string cannot be parsed.
New in version 3.0.0.
Parameters
col : Column or str
    a column or column name in CSV format
schema : Column or str
    a column, or Python string literal with schema in DDL format, to use when parsing the CSV column.
options : dict, optional
    options to control parsing. Accepts the same options as the CSV datasource. See Data Source Option in the documentation for the Spark version you use.
Examples
>>> from pyspark.sql.functions import from_csv, schema_of_csv
>>> data = [("1,2,3",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> df.select(from_csv(df.value, "a INT, b INT, c INT").alias("csv")).collect()
[Row(csv=Row(a=1, b=2, c=3))]
>>> value = data[0][0]
>>> df.select(from_csv(df.value, schema_of_csv(value)).alias("csv")).collect()
[Row(csv=Row(_c0=1, _c1=2, _c2=3))]
>>> data = [(" abc",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> options = {'ignoreLeadingWhiteSpace': True}
>>> df.select(from_csv(df.value, "s string", options).alias("csv")).collect()
[Row(csv=Row(s='abc'))]