pyspark.mllib.evaluation.RankingMetrics
Evaluator for ranking algorithms.
New in version 1.4.0.
Parameters
predictionAndLabels : pyspark.RDD
An RDD of (predicted ranking, ground truth set) pairs, or of (predicted ranking, ground truth set, relevance values of the ground truth set) triples. Since 3.4.0, NDCG evaluation with relevance values is supported.
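The relevance-valued form is only described above in prose; the following is a minimal construction sketch, assuming the relevance values are given as a list of floats aligned with the ground truth items and that sc is an active SparkContext (as in the examples below). The item IDs and scores are made up for illustration.

from pyspark.mllib.evaluation import RankingMetrics

# Sketch: (predicted ranking, ground truth set, relevance values of the
# ground truth set) triples, used for NDCG with graded relevance
# (supported since 3.4.0). Values here are illustrative only.
gradedPredictionAndLabels = sc.parallelize([
    ([1, 6, 2, 7, 8], [1, 2, 3], [3.0, 2.0, 1.0]),  # item 1 assumed most relevant
    ([4, 1, 5, 6, 2], [1, 4], [2.0, 3.0]),
])
gradedMetrics = RankingMetrics(gradedPredictionAndLabels)
gradedMetrics.ndcgAt(5)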
Examples
>>> predictionAndLabels = sc.parallelize([
...     ([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], [1, 2, 3, 4, 5]),
...     ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], [1, 2, 3]),
...     ([1, 2, 3, 4, 5], [])])
>>> metrics = RankingMetrics(predictionAndLabels)
>>> metrics.precisionAt(1)
0.33...
>>> metrics.precisionAt(5)
0.26...
>>> metrics.precisionAt(15)
0.17...
>>> metrics.meanAveragePrecision
0.35...
>>> metrics.meanAveragePrecisionAt(1)
0.3333333333333333...
>>> metrics.meanAveragePrecisionAt(2)
0.25...
>>> metrics.ndcgAt(3)
0.33...
>>> metrics.ndcgAt(10)
0.48...
>>> metrics.recallAt(1)
0.06...
>>> metrics.recallAt(5)
0.35...
>>> metrics.recallAt(15)
0.66...
Methods
call(name, *a)
Call method of java_model.
meanAveragePrecisionAt(k)
Returns the mean average precision (MAP) at the first k ranking positions of all the queries.
ndcgAt(k)
Compute the average NDCG value of all the queries, truncated at ranking position k.
precisionAt(k)
Compute the average precision of all the queries, truncated at ranking position k.
recallAt(k)
Compute the average recall of all the queries, truncated at ranking position k.
Attributes
meanAveragePrecision
Returns the mean average precision (MAP) of all the queries.
Methods Documentation
meanAveragePrecisionAt(k)
Returns the mean average precision (MAP) at the first k ranking positions of all the queries. If a query has an empty ground truth set, the average precision will be zero and a log warning is generated.
New in version 3.0.0.
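To make the definition concrete, here is a minimal pure-Python sketch of MAP@k (not Spark's implementation), assuming each query's average precision is normalized by min(#(ground truth), k); averaged over the three doctest queries above it reproduces meanAveragePrecisionAt(2) = 0.25.

def average_precision_at_k(predicted, ground_truth, k):
    # Average precision at k for one query: sum the precision at each
    # relevant position in the top k, then normalize (assumed denominator).
    if not ground_truth:
        return 0.0  # empty ground truth set counts as zero
    hits, prec_sum = 0, 0.0
    for i, item in enumerate(predicted[:k]):
        if item in ground_truth:
            hits += 1
            prec_sum += hits / (i + 1)  # precision at rank i + 1
    return prec_sum / min(len(ground_truth), k)

queries = [([1, 6, 2, 7, 8, 3, 9, 10, 4, 5], {1, 2, 3, 4, 5}),
           ([4, 1, 5, 6, 2, 7, 3, 8, 9, 10], {1, 2, 3}),
           ([1, 2, 3, 4, 5], set())]
sum(average_precision_at_k(p, t, 2) for p, t in queries) / len(queries)  # 0.25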
ndcgAt(k)
Compute the average NDCG value of all the queries, truncated at ranking position k. The discounted cumulative gain at position k is computed as DCG@k = sum_{i=1}^{k} (2^{relevance of the i-th item} - 1) / log(i + 1), and the NDCG is obtained by dividing this by the DCG of the ideal ranking of the ground truth set. In the current implementation, the relevance value is binary. If a query has an empty ground truth set, zero will be used as the NDCG together with a log warning.
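As a rough illustration of the formula (a binary-relevance sketch, not the library's code), the per-query NDCG@k can be computed as follows; averaging it over the three doctest queries above gives ndcgAt(3) ≈ 0.33.

import math

def ndcg_at_k(predicted, ground_truth, k):
    # DCG@k = sum_{i=1..k} (2^rel_i - 1) / log(i + 1) with binary relevance,
    # normalized by the DCG of an ideal ordering of the ground truth set.
    if not ground_truth:
        return 0.0  # empty ground truth set counts as zero
    def dcg(relevances):
        # enumerate() is 0-based, so log(i + 2) corresponds to log(rank + 1)
        return sum((2 ** rel - 1) / math.log(i + 2)
                   for i, rel in enumerate(relevances))
    gains = [1.0 if item in ground_truth else 0.0 for item in predicted[:k]]
    ideal = [1.0] * min(len(ground_truth), k)
    return dcg(gains) / dcg(ideal)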
precisionAt(k)
Compute the average precision of all the queries, truncated at ranking position k.
If for a query, the ranking algorithm returns n (n < k) results, the precision value will be computed as #(relevant items retrieved) / k. This formula also applies when the size of the ground truth set is less than k.
If a query has an empty ground truth set, zero will be used as precision together with a log warning.
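A minimal per-query sketch (not Spark's code) shows why the denominator is always k, regardless of how many results were returned or how many are relevant.

def precision_at_k(predicted, ground_truth, k):
    # Precision@k divides by k even if fewer than k items were returned.
    if not ground_truth:
        return 0.0  # empty ground truth set counts as zero
    hits = sum(1 for item in predicted[:k] if item in ground_truth)
    return hits / k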
recallAt(k)
Compute the average recall of all the queries, truncated at ranking position k.
If for a query, the ranking algorithm returns n results, the recall value will be computed as #(relevant items retrieved) / #(ground truth set). This formula also applies when the size of the ground truth set is less than k.
If a query has an empty ground truth set, zero will be used as recall together with a log warning.
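For comparison with precisionAt, a per-query sketch (not Spark's code) where the denominator is the size of the ground truth set rather than k.

def recall_at_k(predicted, ground_truth, k):
    # Recall@k divides by the ground truth size, regardless of k.
    if not ground_truth:
        return 0.0  # empty ground truth set counts as zero
    hits = sum(1 for item in predicted[:k] if item in ground_truth)
    return hits / len(ground_truth)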
Attributes Documentation
meanAveragePrecision
Returns the mean average precision (MAP) of all the queries. If a query has an empty ground truth set, the average precision will be zero and a log warning is generated.