RDD.intersection(other)
Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did.
New in version 1.0.0.
Parameters
other : RDD
    another RDD
Returns
RDD
    the intersection of this RDD and another one
See also
pyspark.sql.DataFrame.intersect()
Notes
This method performs a shuffle internally.
Examples
>>> rdd1 = sc.parallelize([1, 10, 2, 3, 4, 5])
>>> rdd2 = sc.parallelize([1, 6, 2, 3, 7, 8])
>>> rdd1.intersection(rdd2).collect()
[1, 2, 3]
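The deduplication behavior and the shuffle mentioned in the Notes can be illustrated with a small plain-Python model. This is only a sketch of the semantics under the assumption of a group-by-element strategy. `intersection_model` is a hypothetical helper, not part of PySpark, and it is not presented as Spark's actual implementation.

```python
from collections import defaultdict


def intersection_model(left, right):
    """Plain-Python model of intersection semantics (hypothetical, not Spark code)."""
    # Group by element, recording which input(s) each element came from.
    # In a distributed setting, equal elements from different partitions
    # would have to be brought together for this step, which is the kind
    # of work a shuffle does.
    sides = defaultdict(set)
    for x in left:
        sides[x].add("left")
    for x in right:
        sides[x].add("right")
    # Each distinct element appears once as a key, so duplicates in the
    # inputs collapse; keep only elements contributed by both inputs.
    return [x for x, seen in sides.items() if len(seen) == 2]


# Inputs contain duplicates (3 on the left, 1 on the right).
result = intersection_model([1, 10, 2, 3, 3, 4, 5], [1, 1, 6, 2, 3, 7, 8])
print(sorted(result))  # [1, 2, 3]
```

The result is sorted before printing. That is a reasonable habit with the real method as well, since the order of elements after a shuffle should not be relied upon.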