pyspark.sql.functions.
window_time
Computes the event time from a window column. The column window values are produced by window aggregating operators and are of type STRUCT<start: TIMESTAMP, end: TIMESTAMP> where start is inclusive and end is exclusive. The event time of records produced by window aggregating operators can be computed as window_time(window) and are window.end - lit(1).alias("microsecond") (as microsecond is the minimal supported event time precision). The window column must be one produced by a window aggregating operator.
window_time(window)
window.end - lit(1).alias("microsecond")
New in version 3.4.0.
Changed in version 3.4.0: Supports Spark Connect.
Column
The window column of a window aggregate records.
the column for computed results.
Examples
>>> import datetime >>> df = spark.createDataFrame( ... [(datetime.datetime(2016, 3, 11, 9, 0, 7), 1)], ... ).toDF("date", "val")
Group the data into 5 second time windows and aggregate as sum.
>>> w = df.groupBy(window("date", "5 seconds")).agg(sum("val").alias("sum"))
Extract the window event time using the window_time function.
>>> w.select( ... w.window.end.cast("string").alias("end"), ... window_time(w.window).cast("string").alias("window_time"), ... "sum" ... ).collect() [Row(end='2016-03-11 09:00:10', window_time='2016-03-11 09:00:09.999999', sum=1)]