pyspark.sql.streaming.DataStreamReader
class pyspark.sql.streaming.DataStreamReader(spark: SparkSession)
Interface used to load a streaming DataFrame from external storage systems (e.g. file systems, key-value stores, etc.). Use SparkSession.readStream to access this.
New in version 2.0.0.
Changed in version 3.5.0: Supports Spark Connect.
Notes
This API is evolving.
Examples
>>> spark.readStream
<...streaming.readwriter.DataStreamReader object ...>
The example below uses the Rate source, which generates rows continuously. It then takes each value modulo 3 and writes the stream out to the console. The streaming query is stopped after 3 seconds.
>>> import time
>>> df = spark.readStream.format("rate").load()
>>> df = df.selectExpr("value % 3 as v")
>>> q = df.writeStream.format("console").start()
>>> time.sleep(3)
>>> q.stop()
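The same Rate-source pipeline can be wrapped so that the query is always stopped, even if an error occurs while waiting. This is a minimal sketch, assuming `spark` is an existing SparkSession. The function name `run_rate_demo` and its parameters are hypothetical and chosen only for illustration; `rowsPerSecond` is the Rate source option that controls how many rows are generated per second.

```python
def run_rate_demo(spark, seconds=3, rows_per_second=5):
    """Stream the Rate source to the console for a bounded time.

    `spark` is assumed to be an existing SparkSession. The function name
    and its parameters are hypothetical, not part of the PySpark API.
    """
    df = (
        spark.readStream.format("rate")
        .option("rowsPerSecond", rows_per_second)  # Rate source throughput
        .load()
        .selectExpr("value % 3 as v")
    )
    query = df.writeStream.format("console").start()
    try:
        # Blocks until the query terminates or `seconds` have elapsed.
        query.awaitTermination(seconds)
    finally:
        # Stop the query even if waiting raised an exception.
        query.stop()
    return query
```

Compared with `time.sleep`, waiting on the query itself returns early if the query terminates before the timeout, and the `finally` clause avoids leaving a streaming query running in the session.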
Methods
csv(path[, schema, sep, encoding, quote, …])
    Loads a CSV file stream and returns the result as a DataFrame.
format(source)
    Specifies the input data source format.
json(path[, schema, primitivesAsString, …])
    Loads a JSON file stream and returns the results as a DataFrame.
load([path, format, schema])
    Loads a data stream from a data source and returns it as a DataFrame.
option(key, value)
    Adds an input option for the underlying data source.
options(**options)
    Adds input options for the underlying data source.
orc(path[, mergeSchema, pathGlobFilter, …])
    Loads an ORC file stream, returning the result as a DataFrame.
parquet(path[, mergeSchema, pathGlobFilter, …])
    Loads a Parquet file stream, returning the result as a DataFrame.
schema(schema)
    Specifies the input schema.
table(tableName)
    Define a Streaming DataFrame on a Table.
text(path[, wholetext, lineSep, …])
    Loads a text file stream and returns a DataFrame whose schema starts with a string column named “value”, and followed by partitioned columns if there are any.
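The format-specific methods are shortcuts for the generic `format(...)`, `schema(...)`, `option(s)(...)`, `load(...)` chain. The sketch below shows both forms for a CSV directory. It assumes `spark` is an existing SparkSession; the function names, the path, and the schema string are hypothetical placeholders. The schema is passed as a DDL-formatted string, and `header` and `maxFilesPerTrigger` are used here as examples of CSV and file-source options.

```python
def read_csv_stream(spark, path, schema_ddl):
    """Shortcut form: csv() selects the format and loads the path.

    `spark` is assumed to be an existing SparkSession; this helper is
    hypothetical, not part of the PySpark API.
    """
    return (
        spark.readStream
        # Streaming file sources expect a schema up front unless
        # schema inference has been enabled in the session config.
        .schema(schema_ddl)
        .option("header", "true")
        .csv(path)
    )


def read_csv_stream_generic(spark, path, schema_ddl):
    """Generic form: the same reader built with format(), options(), load()."""
    return (
        spark.readStream
        .format("csv")
        .schema(schema_ddl)
        # options() takes several key/value pairs at once.
        .options(header="true", maxFilesPerTrigger="1")
        .load(path)
    )
```

A call such as `read_csv_stream(spark, "/data/incoming", "id INT, name STRING")` returns a streaming DataFrame; nothing is read until a query is started on it with `writeStream`.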