Classification Accuracy is defined as the number of cases correctly classified by a classifier model divided by the total number of cases. It is a meaningful performance measure mainly when the classifier is built on balanced data. Besides Classification Accuracy, other related popular model performance measures are sensitivity, specificity, precision, recall, and the AUC-ROC curve.

Confusion Matrix & Classification Accuracy Calculation

To calculate the classification accuracy, you have to predict the class using the machine learning model and compare it with the actual class. The predicted and actual class data is represented in a matrix structure as shown below, called the Confusion Matrix.

                      Predicted Positive   Predicted Negative
Actual Positive                a                    b
Actual Negative                c                    d

From the above confusion matrix, we observe:

- the number of observations correctly classified = a + d
- the number of cases wrongly classified = b + c
- the total number of observations = a + b + c + d
- Classification Accuracy = (a + d) / (a + b + c + d)
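As a concrete illustration of the calculation above, here is a minimal sketch, assuming scikit-learn is available and that the classes are encoded as 1 (Positive) and 0 (Negative); the actual and predicted arrays are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical actual and predicted classes (1 = Positive, 0 = Negative)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 1])

# With labels=[1, 0], rows are the actual class and columns the predicted class,
# so the layout matches the a/b/c/d table above:
#   [[a, b],   a = positives predicted positive, b = positives predicted negative
#    [c, d]]   c = negatives predicted positive, d = negatives predicted negative
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
a, b = cm[0]
c, d = cm[1]

accuracy = (a + d) / (a + b + c + d)
print(cm)                                            # [[4 1] [1 4]]
print("Classification Accuracy:", accuracy)          # 0.8
print("Cross-check:", accuracy_score(y_true, y_pred))
```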
When to use the Classification Accuracy Metric?

The classification accuracy metric should be used only for balanced datasets. In binary classification, a dataset is balanced when the number of positive samples is (approximately) equal to the number of negative samples.

Example 1

Assume we are building a machine learning model to predict fraud. The dataset has 99% of the observations as No-Fraud and only 1% as Fraud. Even without a model, if we classify all cases as No-Fraud, the classification accuracy would be 99%. In such a scenario, we should prefer model performance measures like AUC, precision, recall, specificity, or the F1 score, as the sketch below illustrates.
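To make the fraud example concrete, the following toy sketch (assuming scikit-learn, with a made-up 1% fraud rate over 10,000 records) shows how a "model" that labels everything as No-Fraud still reports roughly 99% accuracy while catching no fraud at all.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(seed=42)

# Toy dataset: roughly 1% Fraud (1) and 99% No-Fraud (0) over 10,000 records
n = 10_000
y_true = (rng.random(n) < 0.01).astype(int)

# A "model" that simply labels every case as No-Fraud
y_pred = np.zeros(n, dtype=int)

print("Accuracy:", accuracy_score(y_true, y_pred))  # ~0.99 despite having no model at all
print("Recall:  ", recall_score(y_true, y_pred))    # 0.0: not a single fraud case is caught
```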
Example 2

You are building a model to predict whether the next working day's closing price will be above today's closing price or not. You have collected the last one year of data for NIFTY stocks and classified the outcome as Up or Down.

Up: the closing price of a stock is more than its previous day's closing price.
Down: the closing price of a stock is equal to or below its previous day's closing price.

In this dataset, the proportion of the Up and Down classes is likely to be close to 50:50. For such a balanced dataset, we use Classification Accuracy as the model performance measure.

Sensitivity, Specificity, Precision, Recall

There are several other metrics derived from the Confusion Matrix, such as sensitivity (true positive rate), specificity (true negative rate), false positives, false negatives, the F1 score, precision, recall, and kappa. Let us understand a few of these model performance measures.

Positive: in classification problems, the phenomenon of our interest is called Positive. E.g., in a fraud detection model we are interested in predicting fraud; the phenomenon of our interest is Fraud, so we use the term Positive for actual fraudulent cases and Negative otherwise.

True Positive: an actual positive correctly predicted as positive.
False Positive: an actual negative wrongly predicted as positive.
True Negative and False Negative are defined analogously.

Sensitivity (True Positive Rate): the proportion of actual positives that are correctly identified, i.e. TP / (TP + FN). Recall is the other term used for sensitivity.
Specificity (True Negative Rate): the proportion of actual negatives that are correctly identified, i.e. TN / (TN + FP).
Precision: the proportion of true positives out of the total predicted positives, i.e. TP / (TP + FP).
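All of these rates can be read straight off the confusion matrix. A minimal sketch, reusing the a/b/c/d layout from the matrix above (TP = a, FN = b, FP = c, TN = d) with made-up counts:

```python
# Made-up confusion-matrix counts: TP = a, FN = b, FP = c, TN = d
a, b, c, d = 80, 20, 30, 870

sensitivity = a / (a + b)   # recall / true positive rate: share of actual positives caught
specificity = d / (d + c)   # true negative rate: share of actual negatives correctly identified
precision   = a / (a + c)   # share of predicted positives that are actually positive

print(f"Sensitivity (Recall): {sensitivity:.2f}")   # 0.80
print(f"Specificity:          {specificity:.2f}")   # 0.97
print(f"Precision:            {precision:.2f}")     # 0.73
```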
AUC-ROC Curve

AUC-ROC stands for Area Under the Curve – Receiver Operating Characteristics. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) at every classification threshold, and the AUC is the area under that curve; the closer the AUC is to 1, the better the classifier separates the two classes.
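Because the ROC curve is built from predicted probabilities rather than hard class labels, an AUC calculation needs scores. A minimal sketch, assuming scikit-learn, with made-up probabilities purely for illustration:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up actual classes and predicted probabilities of the Positive (fraud) class
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])
y_score = np.array([0.90, 0.20, 0.40, 0.80, 0.10, 0.60, 0.70, 0.30, 0.20, 0.85])

# The ROC curve traces the true positive rate against the false positive rate
# as the classification threshold moves from 1 down to 0
fpr, tpr, thresholds = roc_curve(y_true, y_score)

print("AUC-ROC:", roc_auc_score(y_true, y_score))
```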