DataStreamReader(spark)
DataStreamReader
Interface used to load a streaming DataFrame from external storage systems (e.g.
DataFrame
DataStreamWriter(df)
DataStreamWriter
Interface used to write a streaming DataFrame to external storage systems (e.g.
ForeachBatchFunction(sql_ctx, func)
ForeachBatchFunction
This is the Python implementation of Java interface ‘ForeachBatchFunction’.
StreamingQuery(jsq)
StreamingQuery
A handle to a query that is executing continuously in the background as new data arrives.
StreamingQueryException(desc, stackTrace[, …])
StreamingQueryException
Exception that stopped a StreamingQuery.
StreamingQueryManager(jsqm)
StreamingQueryManager
A class to manage all the StreamingQuery StreamingQueries active.
DataStreamReader.csv(path[, schema, sep, …])
DataStreamReader.csv
Loads a CSV file stream and returns the result as a DataFrame.
DataStreamReader.format(source)
DataStreamReader.format
Specifies the input data source format.
DataStreamReader.json(path[, schema, …])
DataStreamReader.json
Loads a JSON file stream and returns the results as a DataFrame.
DataStreamReader.load([path, format, schema])
DataStreamReader.load
Loads a data stream from a data source and returns it as a DataFrame.
DataStreamReader.option(key, value)
DataStreamReader.option
Adds an input option for the underlying data source.
DataStreamReader.options(**options)
DataStreamReader.options
Adds input options for the underlying data source.
DataStreamReader.orc(path[, mergeSchema, …])
DataStreamReader.orc
Loads a ORC file stream, returning the result as a DataFrame.
DataStreamReader.parquet(path[, …])
DataStreamReader.parquet
Loads a Parquet file stream, returning the result as a DataFrame.
DataStreamReader.schema(schema)
DataStreamReader.schema
Specifies the input schema.
DataStreamReader.text(path[, wholetext, …])
DataStreamReader.text
Loads a text file stream and returns a DataFrame whose schema starts with a string column named “value”, and followed by partitioned columns if there are any.
DataStreamWriter.foreach(f)
DataStreamWriter.foreach
Sets the output of the streaming query to be processed using the provided writer f.
f
DataStreamWriter.foreachBatch(func)
DataStreamWriter.foreachBatch
Sets the output of the streaming query to be processed using the provided function.
DataStreamWriter.format(source)
DataStreamWriter.format
Specifies the underlying output data source.
DataStreamWriter.option(key, value)
DataStreamWriter.option
Adds an output option for the underlying data source.
DataStreamWriter.options(**options)
DataStreamWriter.options
Adds output options for the underlying data source.
DataStreamWriter.outputMode(outputMode)
DataStreamWriter.outputMode
Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
DataStreamWriter.partitionBy(*cols)
DataStreamWriter.partitionBy
Partitions the output by the given columns on the file system.
DataStreamWriter.queryName(queryName)
DataStreamWriter.queryName
Specifies the name of the StreamingQuery that can be started with start().
start()
DataStreamWriter.start([path, format, …])
DataStreamWriter.start
Streams the contents of the DataFrame to a data source.
DataStreamWriter.trigger(*[, …])
DataStreamWriter.trigger
Set the trigger for the stream query.
StreamingQuery.awaitTermination([timeout])
StreamingQuery.awaitTermination
Waits for the termination of this query, either by query.stop() or by an exception.
query.stop()
StreamingQuery.exception()
StreamingQuery.exception
New in version 2.1.0.
StreamingQuery.explain([extended])
StreamingQuery.explain
Prints the (logical and physical) plans to the console for debugging purpose.
StreamingQuery.id
Returns the unique id of this query that persists across restarts from checkpoint data.
StreamingQuery.isActive
Whether this streaming query is currently active or not.
StreamingQuery.lastProgress
Returns the most recent StreamingQueryProgress update of this streaming query or None if there were no progress updates
StreamingQueryProgress
StreamingQuery.name
Returns the user-specified name of the query, or null if not specified.
StreamingQuery.processAllAvailable()
StreamingQuery.processAllAvailable
Blocks until all available data in the source has been processed and committed to the sink.
StreamingQuery.recentProgress
Returns an array of the most recent [[StreamingQueryProgress]] updates for this query.
StreamingQuery.runId
Returns the unique id of this query that does not persist across restarts.
StreamingQuery.status
Returns the current status of the query.
StreamingQuery.stop()
StreamingQuery.stop
Stop this streaming query.
StreamingQueryManager.active
Returns a list of active queries associated with this SQLContext
StreamingQueryManager.awaitAnyTermination([…])
StreamingQueryManager.awaitAnyTermination
Wait until any of the queries on the associated SQLContext has terminated since the creation of the context, or since resetTerminated() was called.
resetTerminated()
StreamingQueryManager.get(id)
StreamingQueryManager.get
Returns an active query from this SQLContext or throws exception if an active query with this name doesn’t exist.
StreamingQueryManager.resetTerminated()
StreamingQueryManager.resetTerminated
Forget about past terminated queries so that awaitAnyTermination() can be used again to wait for new terminations.
awaitAnyTermination()