Plotting the PR Curve -- sklearn.metrics.precision_recall_curve




Overview

Compute precision-recall pairs for different probability thresholds.

Note: this implementation is restricted to the binary classification task.


  1. Precision is the ratio tp / (tp + fp), where tp is the number of true positives and fp the number of false positives. Intuitively, precision is the ability of the classifier not to label a negative sample as positive.
  2. Recall is the ratio tp / (tp + fn), where tp is the number of true positives and fn the number of false negatives. Intuitively, recall is the ability of the classifier to find all the positive samples.

The last precision and recall values are 1. and 0. respectively and do not have a corresponding threshold. This ensures that the graph starts on the y axis.


How the PR curve is drawn

  • Deduplicate the probability values predicted by the model and sort them in ascending order.
  • Take each probability in turn as the threshold and compute the precision and recall at that threshold, producing one list of precision values and one list of recall values over all thresholds.
  • Plot recall on the x-axis and precision on the y-axis; the resulting curve is the PR curve.


Precision and recall can be computed from the confusion matrix: precision = TP / (TP + FP) and recall = TP / (TP + FN). So how is the P-R curve itself obtained?


When a classifier scores samples, it usually outputs a confidence, i.e. the probability that a sample is positive; for example, it may believe with 99% probability that sample A is positive and with 1% probability that sample B is positive. By choosing a suitable threshold, say 50%, the samples are split: those with probability greater than 50% are predicted positive and the rest negative.

These confidences let us rank all samples and then take each sample's score in turn as the threshold: samples ranked at or above it are predicted positive, the rest negative. Every such threshold yields a precision and a recall, and plotting these pairs produces the curve, as sketched below.
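
A minimal sketch of this sweep in NumPy (the arrays below are made-up illustrative values, not taken from any example in this note):

import numpy as np

# illustrative toy labels and predicted scores (assumed values)
y_true = np.array([0, 1, 0, 1, 1])
y_scores = np.array([0.2, 0.6, 0.4, 0.8, 0.5])

# step 1: deduplicate the predicted scores and sort them in ascending order
thresholds = np.sort(np.unique(y_scores))

# step 2: for each threshold, predict positive when score >= threshold,
# then compute precision and recall from the confusion counts
precisions, recalls = [], []
for t in thresholds:
    y_pred = (y_scores >= t).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precisions.append(tp / (tp + fp))
    recalls.append(tp / (tp + fn))

# step 3: plot recall on the x-axis and precision on the y-axis
# (e.g. with matplotlib: plt.plot(recalls, precisions))

sklearn.metrics.precision_recall_curve implements the same idea, with the extra conventions described in the interface section below (a final pair precision=1, recall=0 is appended without a threshold).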


Interface

sklearn.metrics.precision_recall_curve(y_true, probas_pred, *, pos_label=None, sample_weight=None)


Compute precision-recall pairs for different probability thresholds.

Note: this implementation is restricted to the binary classification task.

The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.

The last precision and recall values are 1. and 0. respectively and do not have a corresponding threshold. This ensures that the graph starts on the y axis.

The first precision and recall values are precision=class balance and recall=1.0 which corresponds to a classifier that always predicts the positive class.

Read more in the User Guide.


Parameters:
y_true: ndarray of shape (n_samples,)

True binary labels. If labels are not either {-1, 1} or {0, 1}, then pos_label should be explicitly given.

probas_pred: ndarray of shape (n_samples,)

Target scores, can either be probability estimates of the positive class, or non-thresholded measure of decisions (as returned by decision_function on some classifiers).

pos_label: int or str, default=None

The label of the positive class. When pos_label=None, if y_true is in {-1, 1} or {0, 1}, pos_label is set to 1, otherwise an error will be raised.

sample_weight: array-like of shape (n_samples,), default=None

Sample weights.

Returns:
precision: ndarray of shape (n_thresholds + 1,)

Precision values such that element i is the precision of predictions with score >= thresholds[i] and the last element is 1.

recall: ndarray of shape (n_thresholds + 1,)

Decreasing recall values such that element i is the recall of predictions with score >= thresholds[i] and the last element is 0.

thresholds: ndarray of shape (n_thresholds,)

Increasing thresholds on the decision function used to compute precision and recall where n_thresholds = len(np.unique(probas_pred)).



Examples


In [1]: import numpy as np

In [2]: from sklearn.metrics import precision_recall_curve

In [3]: y_true = np.array([0,0,1,1])

In [4]: y_scores = np.array([0.1, 0.4, 0.35, 0.8])

In [5]: precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

In [6]: precision
Out[6]: array([0.66666667, 0.5       , 1.        , 1.        ])

In [7]: recall
Out[7]: array([1. , 0.5, 0.5, 0. ])

In [8]: thresholds
Out[8]: array([0.35, 0.4 , 0.8 ])
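
If the labels are not in {0, 1} or {-1, 1} (for example string labels), pos_label must be given explicitly, as the parameter description above notes. A minimal continuation of the session under that assumption:

In [9]: y_true_str = np.array(["neg", "neg", "pos", "pos"])

In [10]: precision_recall_curve(y_true_str, y_scores, pos_label="pos")  # returns the same three arrays as above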


precision = tp / (tp + fp)

recall = tp / (tp + fn)

                        predictions (score >= threshold)      precision     recall
y_true                  0       0       1       1
y_score                 0.1     0.4     0.35    0.8

threshold = 0.1         1       1       1       1             2/4 = 0.5     2/2 = 1.0
threshold = 0.35        0       1       1       1             2/3 = 0.67    2/2 = 1.0
threshold = 0.4         0       1       0       1             1/2 = 0.5     1/2 = 0.5
threshold = 0.8         0       0       0       1             1/1 = 1.0     1/2 = 0.5
(appended final pair)                                               1.0           0.0

Note that the returned thresholds start at 0.35: the lowest threshold (0.1) is dropped because recall already reaches 1.0 at 0.35 with a higher precision, and the final pair precision=1.0, recall=0.0 is appended without a corresponding threshold.


Visualizations with Display Objects


In this example, we will construct display objects, ConfusionMatrixDisplay, RocCurveDisplay, and PrecisionRecallDisplay, directly from their respective metrics. This is an alternative to using their corresponding plot functions when a model’s predictions are already computed or expensive to compute. Note that this is advanced usage, and in general we recommend using their respective plot functions.


Load Data and train model


For this example, we load a blood transfusion service center data set from OpenML <https://www.openml.org/d/1464>. This is a binary classification problem where the target is whether an individual donated blood. Then the data is split into a train and test dataset and a logistic regression is fitted with the train dataset.

from sklearn.datasets import fetch_openml
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = fetch_openml(data_id=1464, return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)

clf = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
clf.fit(X_train, y_train)




Create ConfusionMatrixDisplay


With the fitted model, we compute the predictions of the model on the test dataset. These predictions are used to compute the confusion matrix, which is plotted with the ConfusionMatrixDisplay.


from sklearn.metrics import confusion_matrix
from sklearn.metrics import ConfusionMatrixDisplay

y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

cm_display = ConfusionMatrixDisplay(cm).plot()





Create RocCurveDisplay


The ROC curve requires either the probabilities or the non-thresholded decision values from the estimator. Since the logistic regression provides a decision function, we will use it to plot the ROC curve:

from sklearn.metrics import roc_curve
from sklearn.metrics import RocCurveDisplay

y_score = clf.decision_function(X_test)

fpr, tpr, _ = roc_curve(y_test, y_score, pos_label=clf.classes_[1])
roc_display = RocCurveDisplay(fpr=fpr, tpr=tpr).plot()




Create PrecisionRecallDisplay


Similarly, the precision-recall curve can be plotted using y_score from the previous sections.

from sklearn.metrics import precision_recall_curve
from sklearn.metrics import PrecisionRecallDisplay

prec, recall, _ = precision_recall_curve(y_test, y_score, pos_label=clf.classes_[1])
pr_display = PrecisionRecallDisplay(precision=prec, recall=recall).plot()



Combining the display objects into a single plot


The display objects store the computed values that were passed as arguments. This allows the visualizations to be easily combined using matplotlib’s API. In the following example, we place the displays next to each other in a row.

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 8))

roc_display.plot(ax=ax1)
pr_display.plot(ax=ax2)
plt.show()



Precision-Recall


Example of Precision-Recall metric to evaluate classifier output quality.


Precision-Recall is a useful measure of success of prediction when the classes are very imbalanced. In information retrieval, precision is a measure of result relevancy, while recall is a measure of how many truly relevant results are returned.


The precision-recall curve shows the tradeoff between precision and recall for different thresholds. A high area under the curve represents both high recall and high precision, where high precision relates to a low false positive rate, and high recall relates to a low false negative rate. High scores for both show that the classifier is returning accurate results (high precision), as well as returning a majority of all positive results (high recall).


A system with high recall but low precision returns many results, but most of its predicted labels are incorrect when compared to the training labels. A system with high precision but low recall is just the opposite, returning very few results, but most of its predicted labels are correct when compared to the training labels. An ideal system with high precision and high recall will return many results, with all results labeled correctly.


Precision (P) is defined as the number of true positives (Tp) over the number of true positives plus the number of false positives (Fp).

P = Tp / (Tp + Fp)


Recall (R) is defined as the number of true positives (Tp) over the number of true positives plus the number of false negatives (Fn).

R = Tp / (Tp + Fn)


These quantities are also related to the (F1) score, which is defined as the harmonic mean of precision and recall.

F1 = 2 * (P * R) / (P + R)
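
As a quick arithmetic check, using the toy example from the interface section above at threshold 0.35 (Tp = 2, Fp = 1, Fn = 0):

P = 2 / (2 + 1)           # precision = 0.67
R = 2 / (2 + 0)           # recall = 1.0
F1 = 2 * P * R / (P + R)  # harmonic mean = 0.8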


Note that the precision may not decrease with recall. The definition of precision (Tp / (Tp + Fp)) shows that lowering the threshold of a classifier may increase the denominator, by increasing the number of results returned. If the threshold was previously set too high, the new results may all be true positives, which will increase precision. If the previous threshold was about right or too low, further lowering the threshold will introduce false positives, decreasing precision.


Recall is defined as Tp / (Tp + Fn), where Tp + Fn does not depend on the classifier threshold. This means that lowering the classifier threshold may increase recall, by increasing the number of true positive results. It is also possible that lowering the threshold may leave recall unchanged, while the precision fluctuates.


The relationship between recall and precision can be observed in the stairstep area of the plot - at the edges of these steps a small change in the threshold considerably reduces precision, with only a minor gain in recall.


Average precision (AP) summarizes such a plot as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight:

AP = Σn (Rn − Rn−1) Pn

where Pn and Rn are the precision and recall at the nth threshold. A pair (Rk, Pk) is referred to as an operating point.

AP and the trapezoidal area under the operating points (sklearn.metrics.auc) are common ways to summarize a precision-recall curve that lead to different results. Read more in the User Guide.
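
A sketch of this weighted-mean definition, reusing the toy arrays from the interface example above and comparing against sklearn.metrics.average_precision_score:

import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, _ = precision_recall_curve(y_true, y_scores)

# recall is returned in decreasing order (ending at 0), so the weighted mean
# AP = sum_n (Rn - Rn-1) * Pn becomes:
ap_manual = -np.sum(np.diff(recall) * precision[:-1])

print(ap_manual)                                  # ~0.83
print(average_precision_score(y_true, y_scores))  # ~0.83, same value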

Precision-recall curves are typically used in binary classification to study the output of a classifier. In order to extend the precision-recall curve and average precision to multi-class or multi-label classification, it is necessary to binarize the output. One curve can be drawn per label, but one can also draw a precision-recall curve by considering each element of the label indicator matrix as a binary prediction (micro-averaging).


Note

See also sklearn.metrics.average_precision_score,
sklearn.metrics.recall_score, sklearn.metrics.precision_score, sklearn.metrics.f1_score


In binary classification settings


Dataset and model


We will use a Linear SVC classifier to differentiate two types of irises.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Add noisy features
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.concatenate([X, random_state.randn(n_samples, 200 * n_features)], axis=1)

# Limit to the two first classes, and split into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X[y < 2], y[y < 2], test_size=0.5, random_state=random_state
)

Linear SVC will expect each feature to have a similar range of values. Thus, we will first scale the data using a StandardScaler.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

classifier = make_pipeline(StandardScaler(), LinearSVC(random_state=random_state))
classifier.fit(X_train, y_train)



Plot the Precision-Recall curve


To plot the precision-recall curve, you should use PrecisionRecallDisplay. There are two methods available, depending on whether or not you have already computed the predictions of the classifier.

Let’s first plot the precision-recall curve without precomputed predictions. We use from_estimator, which computes the predictions for us before plotting the curve.


from sklearn.metrics import PrecisionRecallDisplay

display = PrecisionRecallDisplay.from_estimator(
    classifier, X_test, y_test, name="LinearSVC"
)
_ = display.ax_.set_title("2-class Precision-Recall curve")



If we already have the estimated probabilities or scores for our model, then we can use from_predictions.

y_score = classifier.decision_function(X_test)

display = PrecisionRecallDisplay.from_predictions(y_test, y_score, name="LinearSVC")
_ = display.ax_.set_title("2-class Precision-Recall curve")



In multi-label settings


The precision-recall curve does not directly support multi-label settings. However, one can decide how to handle this case. We show such an example below.


Create multi-label data, fit, and predict

We create a multi-label dataset to illustrate precision-recall in multi-label settings.

from sklearn.preprocessing import label_binarize

# Use label_binarize to make the problem multi-label-like
Y = label_binarize(y, classes=[0, 1, 2])
n_classes = Y.shape[1]

# Split into training and test
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.5, random_state=random_state
)

We use OneVsRestClassifier for multi-label prediction.

from sklearn.multiclass import OneVsRestClassifier

classifier = OneVsRestClassifier(
    make_pipeline(StandardScaler(), LinearSVC(random_state=random_state))
)
classifier.fit(X_train, Y_train)
y_score = classifier.decision_function(X_test)


The average precision score in multi-label settings


from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score

# For each class
precision = dict()
recall = dict()
average_precision = dict()
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test[:, i], y_score[:, i])
    average_precision[i] = average_precision_score(Y_test[:, i], y_score[:, i])

# A "micro-average": quantifying score on all classes jointly
precision["micro"], recall["micro"], _ = precision_recall_curve(
    Y_test.ravel(), y_score.ravel()
)
average_precision["micro"] = average_precision_score(Y_test, y_score, average="micro")


Plot the micro-averaged Precision-Recall curve


display = PrecisionRecallDisplay(
    recall=recall["micro"],
    precision=precision["micro"],
    average_precision=average_precision["micro"],
)
display.plot()
_ = display.ax_.set_title("Micro-averaged over all classes")



Plot Precision-Recall curve for each class and iso-f1 curves


import matplotlib.pyplot as plt
from itertools import cycle

# setup plot details
colors = cycle(["navy", "turquoise", "darkorange", "cornflowerblue", "teal"])

_, ax = plt.subplots(figsize=(7, 8))

f_scores = np.linspace(0.2, 0.8, num=4)
lines, labels = [], []
for f_score in f_scores:
    x = np.linspace(0.01, 1)
    # iso-f1 curve: solve F1 = 2*x*y / (x + y) = f_score for precision y at recall x
    y = f_score * x / (2 * x - f_score)
    (l,) = plt.plot(x[y >= 0], y[y >= 0], color="gray", alpha=0.2)
    plt.annotate("f1={0:0.1f}".format(f_score), xy=(0.9, y[45] + 0.02))

display = PrecisionRecallDisplay(
    recall=recall["micro"],
    precision=precision["micro"],
    average_precision=average_precision["micro"],
)
display.plot(ax=ax, name="Micro-average precision-recall", color="gold")

for i, color in zip(range(n_classes), colors):
    display = PrecisionRecallDisplay(
        recall=recall[i],
        precision=precision[i],
        average_precision=average_precision[i],
    )
    display.plot(ax=ax, name=f"Precision-recall for class {i}", color=color)

# add the legend for the iso-f1 curves
handles, labels = display.ax_.get_legend_handles_labels()
handles.extend([l])
labels.extend(["iso-f1 curves"])
# set the legend and the axes
ax.set_xlim([0.0, 1.0])
ax.set_ylim([0.0, 1.05])
ax.legend(handles=handles, labels=labels, loc="best")
ax.set_title("Extension of Precision-Recall curve to multi-class")

plt.show()





Reference


https://scikit-learn.org/stable/auto_examples/miscellaneous/plot_display_object_visualization.html#sphx-glr-auto-examples-miscellaneous-plot-display-object-visualization-py