The Receiver Operating Characteristic (ROC) curve is a way of measuring the performance of a binary classifier. The common practice is to summarize it with the area under the curve (AUC). For logistic regression, the curve is constructed by varying the threshold applied to the predicted probability, and each point on the curve shows the performance at one threshold. The x-axis is the false positive rate (FPR, or fall-out) and the y-axis is the true positive rate (TPR, or recall). The intuition is to ask how much true positive rate we gain for each unit increase in false positive rate. The rationale is that lowering the threshold to increase the true positive (TP) count also increases the false positive (FP) count.
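As a minimal sketch of the threshold sweep described above, the following computes the (FPR, TPR) points and a trapezoidal AUC with NumPy only. The function names `roc_points` and `auc_trapezoid` and the toy labels and scores are illustrative assumptions, not part of any library.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays obtained by sweeping the threshold from high to low."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)  # P: actual positives
    n_neg = np.sum(y_true == 0)  # N: actual negatives
    # Start above the highest score so that the first point is (0, 0).
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    fpr, tpr = [], []
    for t in thresholds:
        predicted_pos = scores >= t
        tp = np.sum(predicted_pos & (y_true == 1))
        fp = np.sum(predicted_pos & (y_true == 0))
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return np.array(fpr), np.array(tpr)

def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy data (hypothetical): four observations with predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
print(list(zip(fpr, tpr)))
print(auc_trapezoid(fpr, tpr))
```

Each threshold contributes one point, and lowering the threshold can only move the point up or to the right, which is why the curve runs from (0, 0) to (1, 1).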
The slope of the line from the origin to a point on the ROC curve is \[\frac{TPR}{FPR} = \frac{\frac{TP}{P}}{\frac{FP}{N}}\]
For a model that has no predictive power (no better than random), the ROC curve is a 45-degree line (slope = 1):
\[ \begin{aligned} \frac{TPR}{FPR} &= \frac{TP}{P} \cdot \frac{N}{FP} \\ &= \frac{TP}{FP} \cdot \frac{N}{P} \\ & =1 \end{aligned} \]
This holds because a random model flags actual positives and actual negatives at the same rate, so the ratio \(\frac{TP}{FP}\) equals \(\frac{P}{N}\).
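The claim can be checked numerically. The sketch below simulates a model whose scores are independent of the label; the sample size, the 20% positive rate, the 0.3 threshold, and the seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
n_samples = 100_000
y_true = rng.random(n_samples) < 0.2    # roughly 20% actual positives
scores = rng.random(n_samples)          # scores carry no information about the label
predicted_pos = scores >= 0.3           # arbitrary classification threshold

tp = np.sum(predicted_pos & y_true)
fp = np.sum(predicted_pos & ~y_true)
p = np.sum(y_true)                      # P: actual positives
n_neg = np.sum(~y_true)                 # N: actual negatives

tpr = tp / p
fpr = fp / n_neg
print("TPR:", tpr, "FPR:", fpr, "TPR/FPR:", tpr / fpr)
print("TP/FP:", tp / fp, "P/N:", p / n_neg)
```

Up to sampling noise, TPR and FPR coincide and TP/FP matches P/N, whichever threshold is chosen.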
Ratios that use the true class as the denominator are written in the following notation: \(D\) is a true default and \(\mathcal{P}(D)\) is the probability of default, \(ND\) is a true non-default, \(E\) is a predicted default, and \(NE\) is a predicted non-default.
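The equations for this group are not shown here. Presumably they are the standard rates conditioned on the true class; as a sketch under that assumption, with confusion-matrix counts \(TP\), \(FP\), \(TN\), \(FN\) and \(P = TP + FN\), \(N = FP + TN\):
\[
\begin{aligned}
TPR &= \mathcal{P}(E|D) = \frac{TP}{P}, & FNR &= \mathcal{P}(NE|D) = \frac{FN}{P}, \\
FPR &= \mathcal{P}(E|ND) = \frac{FP}{N}, & TNR &= \mathcal{P}(NE|ND) = \frac{TN}{N}.
\end{aligned}
\]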
Use predictive value as the denominator:
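Assuming the intended ratios are the standard predictive values, which condition on the predicted class rather than the true class, they can be sketched with the confusion-matrix counts \(TP\), \(FP\), \(TN\), \(FN\) as:
\[
\begin{aligned}
PPV &= \mathcal{P}(D|E) = \frac{TP}{TP+FP}, & FDR &= \mathcal{P}(ND|E) = \frac{FP}{TP+FP}, \\
NPV &= \mathcal{P}(ND|NE) = \frac{TN}{TN+FN}, & FOR &= \mathcal{P}(D|NE) = \frac{FN}{TN+FN}.
\end{aligned}
\]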
Here \(P\) is the number of actual positive cases in the data and \(N\) is the number of actual negative cases.
These ratios can also be linked through Bayes' rule:
\[
\begin{aligned}
\mathcal{P}(D|E) &= \frac{\mathcal{P}(D, E)}{\mathcal{P}(E)} \\
&= \frac{\mathcal{P}(E|D)\cdot \mathcal{P}(D)}{\mathcal{P}(E)} \\
&= \frac{\mathcal{P}(E|D)\cdot \mathcal{P}(D)}{\mathcal{P}(E,D) +\mathcal{P}(E,ND)} \\
&= \frac{\mathcal{P}(E|D)\cdot \mathcal{P}(D)}{\mathcal{P}(E|D) \cdot \mathcal{P}(D)+\mathcal{P}(E|ND)\cdot \mathcal{P}(ND)}
\end{aligned}
\]
Comparing \(\mathcal{P}(D|E)\) against the base rate \(\mathcal{P}(D)\) shows how much the model's prediction adds over knowing the prevalence alone.
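A small numeric sketch of this comparison follows. The function name `posterior_default` and the input values (TPR 0.8, FPR 0.1, prior 0.05) are hypothetical and chosen only for illustration.

```python
def posterior_default(tpr, fpr, prior):
    """P(D|E) from Bayes' rule, with tpr = P(E|D), fpr = P(E|ND), prior = P(D)."""
    numerator = tpr * prior
    evidence = tpr * prior + fpr * (1.0 - prior)  # P(E) by total probability
    return numerator / evidence

prior = 0.05                                   # assumed base rate of default
posterior = posterior_default(0.8, 0.1, prior)
print("P(D):", prior)
print("P(D|E):", posterior)
print("lift over base rate:", posterior / prior)

# A random model (tpr == fpr) leaves the prior unchanged.
print(posterior_default(0.5, 0.5, 0.3))
```

With these inputs the posterior is 0.04 / 0.135, roughly 0.30, about six times the base rate. When TPR equals FPR the posterior collapses back to the prior, which matches the no-skill diagonal of the ROC curve.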
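The Matthews correlation coefficient (MCC) summarizes all four cells of the confusion matrix, including the true negative (\(TN\)) and false negative (\(FN\)) counts, in a single number:
\[MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\]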
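The formula translates directly into code. In this sketch the function name `mcc` and the example counts are illustrative, and returning 0 when the denominator vanishes is an assumed convention rather than part of the formula itself.

```python
import math

def mcc(tp, tn, fp, fn):
    """MCC computed from the four confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: a row or column of the confusion matrix is
        # empty, so the ratio is undefined and 0.0 is returned instead.
        return 0.0
    return numerator / denominator

print(mcc(50, 50, 0, 0))     # every prediction correct
print(mcc(25, 25, 25, 25))   # predictions unrelated to the labels
print(mcc(90, 85, 15, 10))   # a mostly correct classifier
```

The value ranges from -1 (every prediction wrong) through 0 (no better than random) to 1 (every prediction correct).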