pyspark.pandas.groupby.GroupBy.nth#
- GroupBy.nth(n)[source]#
Take the nth row from each group.
New in version 3.4.0.
- Parameters
- n : int
A single nth value selecting which row to take from each group.
- Returns
- Series or DataFrame
Notes
There is a behavior difference between pandas-on-Spark and pandas:
- when there is no aggregation column and n is not equal to 0 or -1,
the returned empty DataFrame may have an index with a different length (__len__).
Examples
>>> import numpy as np
>>> df = ps.DataFrame({'A': [1, 1, 2, 1, 2],
...                    'B': [np.nan, 2, 3, 4, 5]}, columns=['A', 'B'])
>>> g = df.groupby('A')
>>> g.nth(0)
   A    B
0  1  NaN
2  2  3.0
>>> g.nth(1)
   A    B
1  1  2.0
4  2  5.0
>>> g.nth(-1)
   A    B
3  1  4.0
4  2  5.0
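The same semantics can be sketched in plain pandas, which pandas-on-Spark mirrors. This is a minimal comparison sketch, assuming pandas 2.x, where `nth` keeps the original row index and the grouping column (older pandas versions index the result by the group key instead):

```python
import numpy as np
import pandas as pd

# Same data as the pandas-on-Spark example above
df = pd.DataFrame({'A': [1, 1, 2, 1, 2],
                   'B': [np.nan, 2, 3, 4, 5]})
g = df.groupby('A')

# First row of each group: group A=1 contributes row 0 (B is NaN),
# group A=2 contributes row 2 (B == 3.0).
first = g.nth(0)
print(first)

# Last row of each group: rows 3 (A=1, B=4.0) and 4 (A=2, B=5.0).
last = g.nth(-1)
print(last)
```

Note that unlike most GroupBy reductions, `nth` selects existing rows rather than aggregating, which is why NaN values pass through unchanged.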