pyspark.sql.streaming.DataStreamWriter
Interface used to write a streaming DataFrame to external storage systems (e.g. file systems, key-value stores). Use DataFrame.writeStream to access this.
New in version 2.0.0.
Notes
This API is evolving.
Examples
The example below uses the Rate source, which generates rows continuously. We then compute value modulo 3 and write the stream to the console. The streaming query is stopped after 3 seconds.
>>> import time
>>> df = spark.readStream.format("rate").load()
>>> df = df.selectExpr("value % 3 as v")
>>> q = df.writeStream.format("console").start()
>>> time.sleep(3)
>>> q.stop()
Methods
foreach(f)
Sets the output of the streaming query to be processed using the provided writer f.
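A minimal sketch of foreach, assuming an active SparkSession named spark and the streaming DataFrame df (with column v) from the example above; print_row is a hypothetical handler written only for illustration:

>>> def print_row(row):
...     # Called once per input row; row is a pyspark.sql.Row
...     print(row.v)
...
>>> q = df.writeStream.foreach(print_row).start()
>>> q.stop()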
foreachBatch(func)
Sets the output of the streaming query to be processed using the provided function.
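A minimal sketch of foreachBatch under the same assumptions; process_batch is a hypothetical function that receives each micro-batch as a regular (non-streaming) DataFrame together with its batch id:

>>> def process_batch(batch_df, batch_id):
...     # batch_df is a static DataFrame, so the full batch API is available
...     batch_df.show()
...
>>> q = df.writeStream.foreachBatch(process_batch).start()
>>> q.stop()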
format(source)
Specifies the underlying output data source.
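For example, selecting the console sink (a sketch, reusing the df from the example above):

>>> q = df.writeStream.format("console").start()
>>> q.stop()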
option(key, value)
Adds an output option for the underlying data source.
options(**options)
Adds output options for the underlying data source.
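A sketch combining both, assuming the console sink; numRows and truncate are console-sink options:

>>> q = df.writeStream.format("console") \
...     .option("numRows", 5) \
...     .options(truncate="false") \
...     .start()
>>> q.stop()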
outputMode(outputMode)
Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
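For example, an aggregation written in complete mode, so the whole updated result table is emitted on every trigger (a sketch reusing the df from the example above):

>>> counts = df.groupBy("v").count()
>>> q = counts.writeStream.outputMode("complete").format("console").start()
>>> q.stop()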
partitionBy(*cols)
Partitions the output by the given columns on the file system.
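A sketch with the parquet file sink; the output and checkpoint paths are hypothetical placeholders. File sinks require a checkpoint location:

>>> q = df.writeStream.format("parquet") \
...     .option("checkpointLocation", "/tmp/stream_ckpt") \
...     .partitionBy("v") \
...     .start("/tmp/stream_out")
>>> q.stop()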
queryName(queryName)
Specifies the name of the StreamingQuery that can be started with start().
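A sketch; my_query is an arbitrary label, and the name is also reported in the Spark UI and returned by the query's name property:

>>> q = df.writeStream.format("console").queryName("my_query").start()
>>> q.name
'my_query'
>>> q.stop()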
start([path, format, outputMode, …])
Streams the contents of the DataFrame to a data source.
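A sketch showing that sink settings can also be passed directly to start() instead of via the builder methods:

>>> q = df.writeStream.start(format="console", outputMode="append")
>>> q.isActive
True
>>> q.stop()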
toTable(tableName[, format, outputMode, …])
Starts the execution of the streaming query, which will continually output results to the given table as new data arrives.
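A sketch, assuming Spark 3.1+ where toTable is available; the table name and checkpoint path are hypothetical placeholders:

>>> q = df.writeStream \
...     .option("checkpointLocation", "/tmp/table_ckpt") \
...     .toTable("rate_table")
>>> q.stop()
>>> spark.read.table("rate_table").show()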
trigger(*[, processingTime, once, …])
Sets the trigger for the streaming query.
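A sketch of two common triggers, reusing the df from the example above: a fixed micro-batch interval, and a one-shot run that processes what is available and then stops:

>>> q = df.writeStream.format("console") \
...     .trigger(processingTime="5 seconds").start()
>>> q.stop()
>>> q2 = df.writeStream.format("console").trigger(once=True).start()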