pyspark.sql.functions.count_distinct
pyspark.sql.functions.count_distinct(col: ColumnOrName, *cols: ColumnOrName) → pyspark.sql.column.Column
Returns a new Column for distinct count of col or cols.
New in version 3.2.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
col : Column or str
first column to compute on.
cols : Column or str
other columns to compute on.
- Returns
Column
the number of distinct combinations of values across col and cols.
Examples
>>> from pyspark.sql import types
>>> from pyspark.sql.functions import count_distinct
>>> df1 = spark.createDataFrame([1, 1, 3], types.IntegerType())
>>> df2 = spark.createDataFrame([1, 2], types.IntegerType())
>>> df1.join(df2).show()
+-----+-----+
|value|value|
+-----+-----+
|    1|    1|
|    1|    2|
|    1|    1|
|    1|    2|
|    3|    1|
|    3|    2|
+-----+-----+
>>> df1.join(df2).select(count_distinct(df1.value, df2.value)).show()
+----------------------------+
|count(DISTINCT value, value)|
+----------------------------+
|                           4|
+----------------------------+
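To see why the result above is 4 rather than 6, the following plain-Python sketch reproduces the same counting without a Spark session. It is an illustration of the semantics only, not how Spark computes the result. The helper name count_distinct_rows is hypothetical. Skipping rows that contain None is an assumption here, based on Spark SQL describing count(DISTINCT ...) as counting unique, non-null combinations.

```python
from itertools import product


def count_distinct_rows(rows):
    """Count distinct value combinations, skipping rows that contain None.

    Hypothetical helper for illustration; not part of PySpark.
    """
    seen = set()
    for row in rows:
        # Assumption: a combination with any null value is not counted.
        if any(value is None for value in row):
            continue
        seen.add(tuple(row))
    return len(seen)


left = [1, 1, 3]   # values of df1 in the example
right = [1, 2]     # values of df2 in the example

# A join without a condition pairs every left row with every right row.
joined = list(product(left, right))

n_rows = len(joined)
n_distinct = count_distinct_rows(joined)
print(n_rows, n_distinct)  # prints "6 4"
```

The six joined rows collapse to the four pairs (1, 1), (1, 2), (3, 1) and (3, 2), because the duplicated 1 in df1 contributes each of its pairs twice.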