pyspark.sql.functions.from_csv
pyspark.sql.functions.from_csv(col, schema, options=None)
Parses a column containing a CSV string to a row with the specified schema. Returns null in the case of an unparseable string.
New in version 3.0.0.
- Parameters
- col : Column or str
  string column in CSV format.
- schema : Column or str
  a string with the schema in DDL format to use when parsing the CSV column.
- options : dict, optional
  options to control parsing. Accepts the same options as the CSV datasource. See Data Source Option in the version you use.
Examples
>>> data = [("1,2,3",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> df.select(from_csv(df.value, "a INT, b INT, c INT").alias("csv")).collect()
[Row(csv=Row(a=1, b=2, c=3))]
>>> value = data[0][0]
>>> df.select(from_csv(df.value, schema_of_csv(value)).alias("csv")).collect()
[Row(csv=Row(_c0=1, _c1=2, _c2=3))]
>>> data = [("   abc",)]
>>> df = spark.createDataFrame(data, ("value",))
>>> options = {'ignoreLeadingWhiteSpace': True}
>>> df.select(from_csv(df.value, "s string", options).alias("csv")).collect()
[Row(csv=Row(s='abc'))]
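The DDL schema string and the options dict can also be assembled programmatically before calling from_csv. The following is a minimal sketch, not part of the PySpark API: the helpers `ddl_schema` and `parse_csv_column`, the field names, and the semicolon-separated input are hypothetical and chosen only for illustration. It assumes `sep` is available as a CSV datasource option in the Spark version you use.

```python
def ddl_schema(fields):
    """Join (name, type) pairs into a DDL schema string such as 'a INT, b STRING'."""
    return ", ".join(f"{name} {dtype}" for name, dtype in fields)


def parse_csv_column(df, column, fields, options=None):
    """Select `column` of `df` parsed by from_csv into a struct column named 'csv'."""
    # Imported here so ddl_schema stays usable without a Spark installation.
    from pyspark.sql.functions import from_csv

    return df.select(
        from_csv(df[column], ddl_schema(fields), options or {}).alias("csv")
    )


# Hypothetical inputs: lines holding an integer and a string separated by ';'.
fields = [("id", "INT"), ("name", "STRING")]
options = {"sep": ";"}  # assumed CSV datasource option for the field delimiter
schema = ddl_schema(fields)
```

With an active SparkSession and a DataFrame `df` whose `value` column holds strings such as "1;Ada", `parse_csv_column(df, "value", fields, options)` would return a DataFrame with a single struct column `csv` containing the fields `id` and `name`.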