DataStreamWriter.
toTable
Starts the execution of the streaming query, which will continually output results to the given table as new data arrives.
The returned StreamingQuery object can be used to interact with the stream.
StreamingQuery
New in version 3.1.0.
string, for the name of the table.
the format used to save.
specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
append: Only the new rows in the streaming DataFrame/Dataset will be written to the sink
complete: All the rows in the streaming DataFrame/Dataset will be written to the sink every time these are some updates
update: only the rows that were updated in the streaming DataFrame/Dataset will be written to the sink every time there are some updates. If the query doesn’t contain aggregations, it will be equivalent to append mode.
names of partitioning columns
unique name for the query
All other string options. You may want to provide a checkpointLocation.
Notes
This API is evolving.
For v1 table, partitioning columns provided by partitionBy will be respected no matter the table exists or not. A new table will be created if the table not exists.
For v2 table, partitionBy will be ignored if the table already exists. partitionBy will be respected only if the v2 table does not exist. Besides, the v2 table created by this API lacks some functionalities (e.g., customized properties, options, and serde info). If you need them, please create the v2 table manually before the execution to avoid creating a table with incomplete information.
Examples
>>> sdf.writeStream.format('parquet').queryName('query').toTable('output_table') ...
>>> sdf.writeStream.trigger(processingTime='5 seconds').toTable( ... 'output_table', ... queryName='that_query', ... outputMode="append", ... format='parquet', ... checkpointLocation='/tmp/checkpoint')