DataFrameReader.json

Loads JSON files and returns the results as a DataFrame.
JSON Lines (newline-delimited JSON) is supported by default. For JSON (one record per file), set the multiLine parameter to true.
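The difference between the two layouts can be sketched in plain Python (a minimal illustration using the standard json module rather than Spark itself; the sample records are made up): JSON Lines keeps one complete object per line, while the multi-line layout spreads one record across several lines.

```python
import json

# JSON Lines: one complete JSON object per physical line.
# This is the layout Spark expects by default.
json_lines = '{"age": 30, "name": "Andy"}\n{"age": 19, "name": "Justin"}'
records = [json.loads(line) for line in json_lines.splitlines()]
# records -> [{'age': 30, 'name': 'Andy'}, {'age': 19, 'name': 'Justin'}]

# Multi-line JSON: a single record pretty-printed across lines.
# Reading a file like this with Spark requires multiLine=True.
multi_line = """{
    "age": 30,
    "name": "Andy"
}"""
record = json.loads(multi_line)
# record -> {'age': 30, 'name': 'Andy'}
```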
If the schema parameter is not specified, this function goes through the input once to determine the input schema.
New in version 1.4.0.
Parameters

path : str, list, or RDD
    string represents path to the JSON dataset, or a list of paths, or RDD of Strings storing JSON objects.
schema : pyspark.sql.types.StructType or str, optional
    an optional pyspark.sql.types.StructType for the input schema or a DDL-formatted string (for example, col0 INT, col1 DOUBLE).
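As a rough illustration of the DDL-formatted alternative, a flat schema string is just comma-separated name/type pairs. The helper below is hypothetical and for illustration only; it is not Spark's parser, and unlike Spark it does not handle nested or parameterized types.

```python
def split_ddl_schema(ddl: str) -> list:
    # Split a flat DDL schema string such as "col0 INT, col1 DOUBLE"
    # into (column name, type name) pairs. Illustrative sketch only:
    # Spark's real DDL parser also accepts nested types like
    # STRUCT<...> and ARRAY<...>, which this does not.
    pairs = []
    for field in ddl.split(","):
        name, type_name = field.strip().split(None, 1)
        pairs.append((name, type_name))
    return pairs

print(split_ddl_schema("col0 INT, col1 DOUBLE"))
# -> [('col0', 'INT'), ('col1', 'DOUBLE')]
```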
For the extra options, refer to Data Source Option in the version you use.
Examples
>>> df1 = spark.read.json('python/test_support/sql/people.json')
>>> df1.dtypes
[('age', 'bigint'), ('name', 'string')]
>>> rdd = sc.textFile('python/test_support/sql/people.json')
>>> df2 = spark.read.json(rdd)
>>> df2.dtypes
[('age', 'bigint'), ('name', 'string')]