The Kappa statistic (or value) is a metric that compares an Observed Accuracy with an Expected Accuracy (random chance). The kappa statistic is used not only to evaluate a single classifier, but also to evaluate classifiers amongst themselves. In addition, it takes into account random chance (agreement with a random classifier), which generally means it is less misleading than simply using accuracy as a metric (an Observed Accuracy of 80% is a lot less impressive with an Expected Accuracy of 75% versus an Expected Accuracy of 50%). Computation of Observed Accuracy and Expected Accuracy is integral to comprehension of the kappa statistic, and is most easily illustrated through use of a confusion matrix. Let's begin with a simple confusion matrix from a binary classification of Cats and Dogs:

                     Cats (ground truth)   Dogs (ground truth)
Classified as Cats           10                    7
Classified as Dogs            5                    8

Assume that a model was built using supervised machine learning on labeled data. This doesn't always have to be the case; the kappa statistic is often used as a measure of reliability between two human raters. Regardless, columns correspond to one "rater" while rows correspond to another "rater". In supervised machine learning, one "rater" reflects ground truth (the actual values of each instance to be classified), obtained from labeled data, and the other "rater" is the machine learning classifier used to perform the classification. Ultimately it doesn't matter which is which to compute the kappa statistic, but for clarity's sake let's say that the columns reflect ground truth and the rows reflect the machine learning classifier's classifications.

From the confusion matrix we can see there are 30 instances total (10 + 7 + 5 + 8 = 30). According to the first column, 15 were labeled as Cats (10 + 5 = 15), and according to the second column, 15 were labeled as Dogs (7 + 8 = 15). We can also see that the model classified 17 instances as Cats (10 + 7 = 17) and 13 instances as Dogs (5 + 8 = 13).

Observed Accuracy is simply the number of instances that were classified correctly throughout the entire confusion matrix, i.e. the number of instances that were labeled as Cats via ground truth and then classified as Cats by the machine learning classifier, or labeled as Dogs via ground truth and then classified as Dogs by the machine learning classifier. To calculate Observed Accuracy, we simply add the number of instances on which the machine learning classifier agreed with the ground truth label, and divide by the total number of instances. For this confusion matrix, this would be 0.6 ((10 + 8) / 30 = 0.6).
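As a rough sketch, the same arithmetic can be written in a few lines of Python, with the matrix stored as a plain nested list (rows for the classifier, columns for ground truth); the variable names here are just for illustration:

# Confusion matrix from above: rows = classifier, columns = ground truth (Cats, Dogs)
confusion = [
    [10, 7],  # classified as Cats: 10 actual Cats, 7 actual Dogs
    [5,  8],  # classified as Dogs: 5 actual Cats, 8 actual Dogs
]

total = sum(sum(row) for row in confusion)                     # 10 + 7 + 5 + 8 = 30
correct = sum(confusion[i][i] for i in range(len(confusion)))  # diagonal: 10 + 8 = 18
observed_accuracy = correct / total
print(observed_accuracy)  # 0.6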
Before we get to the equation for the kappa statistic, one more value is needed: the Expected Accuracy. This value is defined as the accuracy that any random classifier would be expected to achieve based on the confusion matrix. The Expected Accuracy is directly related to the number of instances of each class (Cats and Dogs), along with the number of instances on which the machine learning classifier agreed with the ground truth label. To calculate Expected Accuracy for our confusion matrix, first multiply the marginal frequency of Cats for one "rater" by the marginal frequency of Cats for the second "rater", and divide by the total number of instances. The marginal frequency for a certain class by a certain "rater" is just the sum of all instances the "rater" indicated were that class. In our case, 15 (10 + 5 = 15) instances were labeled as Cats according to ground truth, and 17 (10 + 7 = 17) instances were classified as Cats by the machine learning classifier, giving a value of 8.5 (15 * 17 / 30 = 8.5). This is then done for the second class as well (and can be repeated for each additional class if there are more than 2). 15 (7 + 8 = 15) instances were labeled as Dogs according to ground truth, and 13 (8 + 5 = 13) instances were classified as Dogs by the machine learning classifier, giving a value of 6.5 (15 * 13 / 30 = 6.5). The final step is to add all these values together, and finally divide again by the total number of instances, resulting in an Expected Accuracy of 0.5 ((8.5 + 6.5) / 30 = 0.5). In our example, the Expected Accuracy turned out to be 50%, as will always be the case when either "rater" classifies each class with the same frequency in a binary classification (both Cats and Dogs contained 15 instances according to ground truth labels in our confusion matrix).
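Continuing the sketch from above, Expected Accuracy can be computed from the row and column marginals of the same nested-list matrix; again, the names are only illustrative:

# Same confusion matrix: rows = classifier, columns = ground truth (Cats, Dogs)
confusion = [
    [10, 7],  # classified as Cats
    [5,  8],  # classified as Dogs
]

total = sum(sum(row) for row in confusion)  # 30
expected = 0.0
for k in range(len(confusion)):
    ground_truth_marginal = sum(row[k] for row in confusion)  # column sum: 15 for Cats, 15 for Dogs
    classifier_marginal = sum(confusion[k])                   # row sum: 17 for Cats, 13 for Dogs
    expected += ground_truth_marginal * classifier_marginal / total  # 8.5 for Cats, 6.5 for Dogs

expected_accuracy = expected / total  # (8.5 + 6.5) / 30 = 0.5
print(expected_accuracy)  # 0.5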