pyspark.sql.DataFrameWriter.json

DataFrameWriter.json(path, mode=None, compression=None, dateFormat=None, timestampFormat=None, lineSep=None, encoding=None, ignoreNullFields=None)

Saves the content of the DataFrame in JSON format (JSON Lines text format or newline-delimited JSON) at the specified path.
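The following plain-Python sketch shows the JSON Lines layout that ends up in each output file. It does not use Spark. Each line holds one complete JSON object, and there is no enclosing array. The helpers `to_json_lines` and `from_json_lines` are hypothetical. Compact separators are chosen only to resemble typical Spark output. Spark itself writes a directory of part files at `path` rather than returning a string.

```python
import json

def to_json_lines(records):
    """Serialize dicts as JSON Lines: one self-contained JSON object per line."""
    # Compact separators, no enclosing [...] array, a newline after every record.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)

def from_json_lines(text):
    """Parse JSON Lines text back into dicts, one json.loads call per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
text = to_json_lines(rows)
print(text, end="")
# {"id":1,"name":"Alice"}
# {"id":2,"name":"Bob"}
```

Each line stands alone, so a reader can split the file at line boundaries and parse the pieces independently.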

New in version 1.4.0.

Parameters
path : str

the path in any Hadoop-supported file system

mode : str, optional

specifies the behavior of the save operation when data already exists.

  • append: Append contents of this DataFrame to existing data.

  • overwrite: Overwrite existing data.

  • ignore: Silently ignore this operation if data already exists.

  • error or errorifexists (default case): Throw an exception if data already exists.
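The four modes amount to a decision that is made when the target path already holds data. The plain-Python sketch below models only that decision. `resolve_save_mode` and its returned action strings are hypothetical and do not describe how Spark actually commits files. In PySpark, you pass the mode as `df.write.json(path, mode="overwrite")` or, equivalently, as `df.write.mode("overwrite").json(path)`.

```python
def resolve_save_mode(target_exists: bool, mode: str = "errorifexists") -> str:
    """Return the action implied by a save mode (hypothetical helper)."""
    valid = {"append", "overwrite", "ignore", "error", "errorifexists"}
    if mode not in valid:
        raise ValueError(f"unknown save mode: {mode!r}")
    if not target_exists:
        return "write"          # nothing at the path, so every mode just writes
    if mode == "append":
        return "append"         # keep existing data and add the new output
    if mode == "overwrite":
        return "replace"        # existing data is replaced
    if mode == "ignore":
        return "skip"           # silently do nothing
    # "error" / "errorifexists" (the default): refuse to touch existing data
    raise FileExistsError("data already exists at the target path")

print(resolve_save_mode(True, "ignore"))   # skip
```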

compression : str, optional

compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, bzip2, gzip, lz4, snappy and deflate).
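The sketch below illustrates case-insensitive codec names using only the Python standard library. It assumes a hypothetical codec table. lz4 and snappy need third-party packages, so they are omitted. `zlib` stands in for deflate, but byte-level compatibility with Spark's or Hadoop's deflate codec is not claimed. `compress_json_lines` and `decompress_json_lines` are hypothetical names.

```python
import bz2
import gzip
import json
import zlib

# Hypothetical codec table covering only stdlib codecs; lz4 and snappy are omitted,
# and zlib merely stands in for "deflate".
CODECS = {
    "none": (lambda data: data, lambda data: data),
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "deflate": (zlib.compress, zlib.decompress),
}

def compress_json_lines(records, compression="none"):
    compress, _ = CODECS[compression.lower()]   # codec names matched case-insensitively
    payload = "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")
    return compress(payload)

def decompress_json_lines(blob, compression="none"):
    _, decompress = CODECS[compression.lower()]
    return [json.loads(line) for line in decompress(blob).decode("utf-8").splitlines()]

rows = [{"id": i, "tag": "x" * 50} for i in range(100)]
for name in ("none", "GZIP", "bzip2", "Deflate"):
    blob = compress_json_lines(rows, name)
    assert decompress_json_lines(blob, name) == rows   # round trip is lossless
    print(name, len(blob))
```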

dateFormat : str, optional

sets the string that indicates a date format. Custom date formats follow the formats described in Spark's datetime pattern reference. This applies to columns of date type. If None is set, it uses the default value, yyyy-MM-dd.
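Pattern letters such as `yyyy`, `MM`, and `dd` describe how each date is rendered as a string. The plain-Python sketch below translates a deliberately tiny, hypothetical subset of those letters into `strftime` directives to show the effect of the default pattern and of a custom one. It does not handle quoted literals or most other pattern letters, so it is not a substitute for Spark's formatter.

```python
import re
from datetime import date

# Hypothetical, deliberately tiny mapping from a few datetime-pattern letters
# to Python strftime directives.
_PATTERN_TO_STRFTIME = {
    "yyyy": "%Y",
    "MM": "%m",
    "dd": "%d",
    "HH": "%H",
    "mm": "%M",
    "ss": "%S",
}

def format_with_pattern(value, pattern="yyyy-MM-dd"):
    """Render a date using a small subset of datetime pattern letters."""
    # A single regex pass, so the "%m" produced for "MM" is never re-read as "mm".
    strftime_fmt = re.sub(
        "|".join(_PATTERN_TO_STRFTIME),
        lambda m: _PATTERN_TO_STRFTIME[m.group(0)],
        pattern,
    )
    return value.strftime(strftime_fmt)

d = date(2024, 1, 5)
print(format_with_pattern(d))                 # 2024-01-05
print(format_with_pattern(d, "dd/MM/yyyy"))   # 05/01/2024
```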

timestampFormat : str, optional

sets the string that indicates a timestamp format. Custom timestamp formats follow the formats described in Spark's datetime pattern reference. This applies to columns of timestamp type. If None is set, it uses the default value, yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX].
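The default pattern has two optional sections: `[.SSS]` for milliseconds and `[XXX]` for the zone offset. The plain-Python approximation below assumes both sections are emitted for a timezone-aware value. It also assumes that a zero offset is written as `Z`. `format_default_timestamp` is a hypothetical helper. It uses whatever `tzinfo` the datetime carries, whereas Spark renders timestamps using its session time zone.

```python
from datetime import datetime, timedelta, timezone

def format_default_timestamp(ts):
    """Approximate yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] for a timezone-aware datetime."""
    offset = ts.utcoffset()
    if offset is None:
        raise ValueError("this sketch expects a timezone-aware datetime")
    base = ts.strftime("%Y-%m-%dT%H:%M:%S")
    millis = f".{ts.microsecond // 1000:03d}"       # [.SSS]: microseconds truncated to ms
    if offset == timedelta(0):
        zone = "Z"                                   # [XXX]: zero offset written as Z
    else:
        total_minutes = int(offset.total_seconds()) // 60
        sign = "+" if total_minutes >= 0 else "-"
        hours, minutes = divmod(abs(total_minutes), 60)
        zone = f"{sign}{hours:02d}:{minutes:02d}"    # [XXX]: +HH:MM / -HH:MM
    return base + millis + zone

utc = datetime(2024, 1, 5, 13, 45, 30, 123456, tzinfo=timezone.utc)
print(format_default_timestamp(utc))   # 2024-01-05T13:45:30.123Z
```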

encoding : str, optional

specifies the encoding (charset) of the saved JSON files. If None is set, the default UTF-8 charset is used.

lineSep : str, optional

defines the line separator that should be used for writing. If None is set, it uses the default value, \n.
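Together, `encoding` and `lineSep` determine the exact bytes that are written. The plain-Python sketch below joins records with a chosen separator and encodes them with a chosen charset. `encode_json_lines` is a hypothetical helper, and Python codec names are used, not necessarily the charset names Spark accepts. `ensure_ascii=False` is a choice made in this sketch so that non-ASCII characters reach the encoder unescaped and the charset visibly changes the output.

```python
import json

def encode_json_lines(records, line_sep="\n", encoding="utf-8"):
    """Join records with a custom separator and encode them with a given charset."""
    # ensure_ascii=False keeps characters such as "ü" literal, so the charset matters.
    text = "".join(json.dumps(r, ensure_ascii=False) + line_sep for r in records)
    return text.encode(encoding)

rows = [{"city": "Zürich"}, {"city": "Kraków"}]
utf8 = encode_json_lines(rows)                                        # defaults: "\n", UTF-8
utf16 = encode_json_lines(rows, line_sep="\r\n", encoding="utf-16-le")
print(len(utf8), len(utf16))   # 40 80
```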

ignoreNullFields : str or bool, optional

Whether to ignore null fields when generating JSON objects. If None is set, it uses the default value, true.
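The plain-Python sketch below contrasts the two settings for a single record. `record_to_json` is a hypothetical helper. It drops only top-level fields whose value is `None` and does not model how Spark treats nulls inside nested structs, arrays, or maps. In PySpark, `df.write.json(path, ignoreNullFields=False)` keeps null-valued fields in the output.

```python
import json

def record_to_json(record, ignore_null_fields=True):
    """Serialize one record, optionally omitting top-level fields whose value is None."""
    if ignore_null_fields:
        record = {k: v for k, v in record.items() if v is not None}
    return json.dumps(record, separators=(",", ":"))

row = {"id": 7, "email": None}
print(record_to_json(row))                            # {"id":7}
print(record_to_json(row, ignore_null_fields=False))  # {"id":7,"email":null}
```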

Examples

>>> import os, tempfile
>>> df.write.json(os.path.join(tempfile.mkdtemp(), 'data'))
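The call writes a directory rather than a single file. To read the result outside Spark, you therefore have to collect the part files. The sketch below assumes uncompressed output whose part files match `part-*.json`, alongside marker files such as `_SUCCESS`. That naming is the common Spark layout but is an assumption here. `read_json_lines_dir` is a hypothetical helper, and the demo directory is simulated by hand. Within Spark, `spark.read.json(path)` reads the directory back directly.

```python
import glob
import json
import os
import tempfile

def read_json_lines_dir(path):
    """Collect records from every JSON Lines part file in an output directory."""
    records = []
    # Sorted for a stable order; marker files such as _SUCCESS don't match the glob.
    for part in sorted(glob.glob(os.path.join(path, "part-*.json"))):
        with open(part, encoding="utf-8") as f:
            records.extend(json.loads(line) for line in f if line.strip())
    return records

# Simulated output directory with hand-written, hypothetical file names.
out = tempfile.mkdtemp()
files = [("part-00000.json", ['{"id":1}']), ("part-00001.json", ['{"id":2}']), ("_SUCCESS", [])]
for name, lines in files:
    with open(os.path.join(out, name), "w", encoding="utf-8") as f:
        f.write("".join(line + "\n" for line in lines))

print(read_json_lines_dir(out))   # [{'id': 1}, {'id': 2}]
```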