pyspark.pandas.Series.cumsum

Series.cumsum(skipna: bool = True) → FrameLike

Return cumulative sum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative sum.
Note
The current implementation of cumsum uses Spark's Window without specifying a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method on very large datasets.
Parameters
skipna : bool, default True
    Exclude NA/null values. If an entire row/column is NA, the result will be NA.

Returns
DataFrame or Series
See also
DataFrame.sum
Return the sum over DataFrame axis.
DataFrame.cummax
Return cumulative maximum over DataFrame axis.
DataFrame.cummin
Return cumulative minimum over DataFrame axis.
DataFrame.cumsum
Return cumulative sum over DataFrame axis.
DataFrame.cumprod
Return cumulative product over DataFrame axis.
Series.sum
Return the sum over Series axis.
Series.cummax
Return cumulative maximum over Series axis.
Series.cummin
Return cumulative minimum over Series axis.
Series.cumsum
Return cumulative sum over Series axis.
Series.cumprod
Return cumulative product over Series axis.
Examples
>>> df = ps.DataFrame([[2.0, 1.0], [3.0, None], [1.0, 0.0]], columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
By default, iterates over rows and finds the sum in each column.
>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0
It works identically in Series.
>>> df.A.cumsum()
0    2.0
1    5.0
2    6.0
Name: A, dtype: float64
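The examples above use the default skipna=True. To make the effect of the flag concrete, here is a minimal sketch of both settings. It uses plain pandas rather than pyspark.pandas so it runs without a Spark session; pandas-on-Spark is designed to mirror pandas semantics for this method.

```python
import pandas as pd

# Plain pandas shown for a locally runnable illustration; pyspark.pandas
# (imported as ps) follows the same semantics for cumsum.
df = pd.DataFrame([[2.0, 1.0], [3.0, None], [1.0, 0.0]], columns=list('AB'))

# Default skipna=True: the NA row is skipped, and later rows
# continue accumulating from the last non-NA sum.
print(df.cumsum())
#      A    B
# 0  2.0  1.0
# 1  5.0  NaN
# 2  6.0  1.0

# skipna=False: once an NA is encountered in a column, it
# propagates to every subsequent cumulative value.
print(df.cumsum(skipna=False))
#      A    B
# 0  2.0  1.0
# 1  5.0  NaN
# 2  6.0  NaN
```

The difference only shows up after the first NA: column A, which has no nulls, is identical under both settings.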