pyspark.RDD.intersection
RDD.intersection(other)
Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.
New in version 1.0.0.
See also
Notes
This method performs a shuffle internally.
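The shuffle is needed because equal elements from both RDDs have to be brought together before membership in both can be checked. The sketch below is a plain-Python model of that idea, not Spark's implementation: `intersection_model` is a hypothetical helper that groups values from two ordinary lists by key and keeps each key seen on both sides exactly once.

```python
def intersection_model(left, right):
    """Model the semantics of RDD.intersection on plain Python lists.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    # Map each distinct value to [seen_in_left, seen_in_right].
    # Grouping by value is the step a distributed engine needs a shuffle for.
    groups = {}
    for value in left:
        groups.setdefault(value, [False, False])[0] = True
    for value in right:
        groups.setdefault(value, [False, False])[1] = True
    # Keep each value once, and only if it appeared on both sides.
    return [value for value, (in_left, in_right) in groups.items()
            if in_left and in_right]


# Duplicates in either input do not produce duplicates in the result.
result = sorted(intersection_model([1, 10, 2, 3, 4, 5, 1], [1, 6, 2, 3, 7, 8, 3]))
print(result)  # [1, 2, 3]
```

Sorting the result here is deliberate: after a shuffle, the order of elements returned by `collect()` on a real RDD should not be relied on.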
Examples
>>> rdd1 = sc.parallelize([1, 10, 2, 3, 4, 5])
>>> rdd2 = sc.parallelize([1, 6, 2, 3, 7, 8])
>>> rdd1.intersection(rdd2).collect()
[1, 2, 3]