pyspark.pandas.DataFrame.first_valid_index

DataFrame.first_valid_index() → Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None, Tuple[Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None], …]]

Retrieves the index of the first valid value.

Returns
scalar, tuple, or None

Examples

Support for DataFrame

>>> psdf = ps.DataFrame({'a': [None, 2, 3, 2],
...                     'b': [None, 2.0, 3.0, 1.0],
...                     'c': [None, 200, 400, 200]},
...                     index=['Q', 'W', 'E', 'R'])
>>> psdf
     a    b      c
Q  NaN  NaN    NaN
W  2.0  2.0  200.0
E  3.0  3.0  400.0
R  2.0  1.0  200.0
>>> psdf.first_valid_index()
'W'

Support for MultiIndex columns

>>> psdf.columns = pd.MultiIndex.from_tuples([('a', 'x'), ('b', 'y'), ('c', 'z')])
>>> psdf
     a    b      c
     x    y      z
Q  NaN  NaN    NaN
W  2.0  2.0  200.0
E  3.0  3.0  400.0
R  2.0  1.0  200.0
>>> psdf.first_valid_index()
'W'

Support for Series.

>>> s = ps.Series([None, None, 3, 4, 5], index=[100, 200, 300, 400, 500])
>>> s
100    NaN
200    NaN
300    3.0
400    4.0
500    5.0
dtype: float64
>>> s.first_valid_index()
300

Support for MultiIndex

>>> midx = pd.MultiIndex([['lama', 'cow', 'falcon'],
...                       ['speed', 'weight', 'length']],
...                      [[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                       [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> s = ps.Series([None, None, None, None, 250, 1.5, 320, 1, 0.3], index=midx)
>>> s
lama    speed       NaN
        weight      NaN
        length      NaN
cow     speed       NaN
        weight    250.0
        length      1.5
falcon  speed     320.0
        weight      1.0
        length      0.3
dtype: float64
>>> s.first_valid_index()
('cow', 'weight')