pyspark.pandas.DataFrame.align
DataFrame.align(other: Union[DataFrame, Series], join: str = 'outer', axis: Union[int, str, None] = None, copy: bool = True) → Tuple[DataFrame, Union[DataFrame, Series]]
Align two objects on their axes with the specified join method.
The join method is applied to the Index of each axis being aligned.
- Parameters
- other : DataFrame or Series
- join : {'outer', 'inner', 'left', 'right'}, default 'outer'
- axis : allowed axis of the other object, default None
    Align on index (0), columns (1), or both (None).
- copy : bool, default True
    Always returns new objects. If copy=False and no reindexing is required then original objects are returned.
- Returns
- (left, right) : (DataFrame, type of other)
    Aligned objects.
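As a rough mental model of how `join` and `axis` interact, the sketch below computes only the labels each aligned object ends up with. It is plain Python and not the library's implementation: `joined_labels` and `aligned_labels` are hypothetical helpers, each object is modelled as an `(index_labels, column_labels)` pair, only the integer and `None` forms of `axis` are handled, and the outer and inner results are sorted purely so the output is stable.

```python
def joined_labels(left, right, how="outer"):
    """Hypothetical helper: labels produced by joining two label lists."""
    if how == "outer":
        return sorted(set(left) | set(right))  # union of both sides
    if how == "inner":
        return sorted(set(left) & set(right))  # labels present on both sides
    if how == "left":
        return list(left)                      # keep the calling object's labels
    if how == "right":
        return list(right)                     # keep the other object's labels
    raise ValueError(f"unknown join method: {how!r}")


def aligned_labels(left, right, join="outer", axis=None):
    """Return the (index, columns) labels of both objects after aligning.

    `left` and `right` are (index_labels, column_labels) pairs.
    """
    (l_idx, l_cols), (r_idx, r_cols) = left, right
    if axis in (0, None):  # align on the index
        l_idx = r_idx = joined_labels(l_idx, r_idx, join)
    if axis in (1, None):  # align on the columns
        l_cols = r_cols = joined_labels(l_cols, r_cols, join)
    return (list(l_idx), list(l_cols)), (list(r_idx), list(r_cols))


# Labels of two small frames: index [10, 20, 30] with columns a, b,
# and index [10, 11, 12] with columns a, c.
df1_labels = ([10, 20, 30], ["a", "b"])
df2_labels = ([10, 11, 12], ["a", "c"])

print(aligned_labels(df1_labels, df2_labels))
# -> (([10, 11, 12, 20, 30], ['a', 'b', 'c']), ([10, 11, 12, 20, 30], ['a', 'b', 'c']))
print(aligned_labels(df1_labels, df2_labels, axis=1))
# -> (([10, 20, 30], ['a', 'b', 'c']), ([10, 11, 12], ['a', 'b', 'c']))
print(aligned_labels(df1_labels, df2_labels, join="inner"))
# -> (([10], ['a']), ([10], ['a']))
```

The sketch covers labels only. In the real method, any label that an object did not originally have is filled with a missing value, and row order is not guaranteed, which is why the examples call `sort_index()` before displaying results.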
Examples
>>> import pyspark.pandas as ps
>>> ps.set_option("compute.ops_on_diff_frames", True)
>>> df1 = ps.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]}, index=[10, 20, 30])
>>> df2 = ps.DataFrame({"a": [4, 5, 6], "c": ["d", "e", "f"]}, index=[10, 11, 12])
Align both axes:
>>> aligned_l, aligned_r = df1.align(df2)
>>> aligned_l.sort_index()
      a     b   c
10  1.0     a NaN
11  NaN  None NaN
12  NaN  None NaN
20  2.0     b NaN
30  3.0     c NaN
>>> aligned_r.sort_index()
      a   b     c
10  4.0 NaN     d
11  5.0 NaN     e
12  6.0 NaN     f
20  NaN NaN  None
30  NaN NaN  None
Align only axis=0 (index):
>>> aligned_l, aligned_r = df1.align(df2, axis=0)
>>> aligned_l.sort_index()
      a     b
10  1.0     a
11  NaN  None
12  NaN  None
20  2.0     b
30  3.0     c
>>> aligned_r.sort_index()
      a     c
10  4.0     d
11  5.0     e
12  6.0     f
20  NaN  None
30  NaN  None
Align only axis=1 (columns):
>>> aligned_l, aligned_r = df1.align(df2, axis=1)
>>> aligned_l.sort_index()
    a  b   c
10  1  a NaN
20  2  b NaN
30  3  c NaN
>>> aligned_r.sort_index()
    a   b  c
10  4 NaN  d
11  5 NaN  e
12  6 NaN  f
Align with the join type “inner”:
>>> aligned_l, aligned_r = df1.align(df2, join="inner")
>>> aligned_l.sort_index()
    a
10  1
>>> aligned_r.sort_index()
    a
10  4
Align with a Series:
>>> s = ps.Series([7, 8, 9], index=[10, 11, 12])
>>> aligned_l, aligned_r = df1.align(s, axis=0)
>>> aligned_l.sort_index()
      a     b
10  1.0     a
11  NaN  None
12  NaN  None
20  2.0     b
30  3.0     c
>>> aligned_r.sort_index()
10    7.0
11    8.0
12    9.0
20    NaN
30    NaN
dtype: float64
>>> ps.reset_option("compute.ops_on_diff_frames")
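The Series example above shows what happens to values once labels are joined: labels the object did not have are filled with NaN, and the integers are displayed as floats. A minimal plain-Python sketch of that fill step follows. It assumes a Series can be modelled as a dict from label to value, and `reindex` is a hypothetical helper, not a pandas-on-Spark call.

```python
import math


def reindex(values, labels, fill=math.nan):
    """Hypothetical helper: look up each label, using `fill` where it is absent."""
    # Present values are cast to float to mirror the float64 output shown above.
    return {label: float(values[label]) if label in values else fill
            for label in labels}


s = {10: 7, 11: 8, 12: 9}            # the Series from the example, as a dict
joined_index = [10, 11, 12, 20, 30]  # outer join of the two indexes

aligned = reindex(s, joined_index)
for label, value in aligned.items():
    print(label, value)
```

Labels 20 and 30 come from the other object's index only, so they are the ones that receive the fill value.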