# Classification Accuracy Assessment

### Learning objectives of this topic

- Be able to assess the accuracy of a classification through the confusion matrix
- Understand what the terms overall accuracy/error, producer’s and user’s accuracy, and errors of omission and commission refer to, and be able to calculate them for the different classes of a classification.

**Introduction**

In an accuracy assessment, the map is compared with validation data. Based on the sampling protocol for the collected validation data, a confusion or error matrix can be constructed. This confusion matrix is a cross-tabulation of the class labels allocated by map and validation data. Usually, the map classes are represented in rows and the reference classes in columns.

The confusion matrix allows for the calculation of the following accuracy metrics (Congalton, 1991):

- Overall accuracy & Overall error
- Producer’s accuracy
- User’s accuracy
- Errors of omission
- Errors of commission

Table 1 gives an example of a confusion matrix for a classification with 3 classes (water, forest and urban). We will use this matrix to illustrate how to calculate the various accuracy metrics.

| Classified \ Reference | Water | Forest | Urban | Total |
|---|---|---|---|---|
| Water | 25 | 5 | 0 | 30 |
| Forest | 7 | 32 | 2 | 41 |
| Urban | 6 | 3 | 26 | 35 |
| Total | 38 | 40 | 28 | 106 |

Note that these calculations are valid for validation data collected using simple random sampling, i.e., when the inclusion probability is the same for every validation location. With unequal inclusion probabilities, e.g., when an equal number of sample sites is selected per stratum in stratified sampling regardless of stratum area, different accuracy estimation formulas are needed. Confidence intervals for the accuracy metrics are then also calculated differently, but this is outside the scope of this module. Section 4.2 in Olofsson et al. (2014) gives good practice recommendations for assessing map accuracy.
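For readers who want to reproduce the calculations that follow, Table 1 can be written out in code. Below is a minimal Python sketch; the variable names (`classes`, `matrix`) are our own choices for illustration, not part of any library:

```python
# Table 1 as nested lists: rows are the classified (map) classes,
# columns the reference classes, in the order water, forest, urban.
classes = ["water", "forest", "urban"]
matrix = [
    [25, 5, 0],   # classified as water
    [7, 32, 2],   # classified as forest
    [6, 3, 26],   # classified as urban
]

row_totals = [sum(row) for row in matrix]        # totals per map class
col_totals = [sum(col) for col in zip(*matrix)]  # totals per reference class
grand_total = sum(row_totals)

print(row_totals, col_totals, grand_total)  # [30, 41, 35] [38, 40, 28] 106
```

All the metrics below are simple ratios of entries of this matrix against its row totals, column totals, or grand total.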

**Overall accuracy and overall error**

Overall accuracy (OA) tells us the proportion of validation sites that were classified correctly. The diagonal of the matrix contains the correctly classified sites. To calculate the overall accuracy you sum up the number of correctly classified sites and divide it by the total number of reference sites.

$$\text{Overall Accuracy} = \frac{25 + 32 + 26}{106} = \frac{83}{106} = 0.78$$

Overall error represents the proportion of validation sites that were classified incorrectly. This is thus the complement of the overall accuracy (accuracy + error = 100%). So you can calculate the overall error from the overall accuracy, or you add the number of incorrectly classified sites and divide it by the total number of reference sites.

$$\text{Overall Error} = 1 - \text{Overall Accuracy} = 1 - 0.78 = 0.22$$

$$\text{Overall Error} = \frac{7 + 6 + 5 + 3 + 0 + 2}{106} = \frac{23}{106} = 0.22$$

In our example the overall accuracy is 78% and the overall error is 22%.
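As a quick check, both metrics can be computed directly from the matrix. A short Python sketch (variable names are illustrative):

```python
# Confusion matrix from Table 1 (rows = classified, columns = reference).
matrix = [[25, 5, 0], [7, 32, 2], [6, 3, 26]]

total = sum(sum(row) for row in matrix)                  # 106 reference sites
correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal sum: 83

overall_accuracy = correct / total    # 83 / 106
overall_error = 1 - overall_accuracy  # complement of the accuracy

print(round(overall_accuracy, 2), round(overall_error, 2))  # 0.78 0.22
```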

**Producer’s accuracy**

Producer’s accuracy (PA), also known as sensitivity (in statistics) or recall (in machine learning), measures how often real features on the ground are correctly shown on the classified map. It is the map accuracy from the point of view of the mapmaker (producer). It is calculated per class by dividing the number of correctly classified reference sites by the total number of reference sites for that class (= column totals).

$$PA_{\text{water}} = \frac{25}{38} = 0.66$$
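The same division can be applied to every class at once; a small Python sketch using the Table 1 matrix (names are illustrative):

```python
# Producer's accuracy per class: diagonal cell / column (reference) total.
classes = ["water", "forest", "urban"]
matrix = [[25, 5, 0], [7, 32, 2], [6, 3, 26]]

col_totals = [sum(col) for col in zip(*matrix)]  # [38, 40, 28]
producers_accuracy = {
    cls: matrix[j][j] / col_totals[j] for j, cls in enumerate(classes)
}

print(round(producers_accuracy["water"], 2))  # 0.66
```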

**User’s accuracy**

User’s accuracy (UA), also known as precision (in both statistics and machine learning), is a measure of how often the class on the map is actually present on the ground. It is the map accuracy from the point of view of the map user. It is calculated per class by dividing the number of correctly classified sites by the total number of classified sites for that class (= row totals).

$$UA_{\text{water}} = \frac{25}{30} = 0.83$$
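Analogously to the producer’s accuracy, the user’s accuracy divides the diagonal by the row totals instead of the column totals. A Python sketch (names are illustrative):

```python
# User's accuracy per class: diagonal cell / row (classified) total.
classes = ["water", "forest", "urban"]
matrix = [[25, 5, 0], [7, 32, 2], [6, 3, 26]]

row_totals = [sum(row) for row in matrix]  # [30, 41, 35]
users_accuracy = {
    cls: matrix[i][i] / row_totals[i] for i, cls in enumerate(classes)
}

print(round(users_accuracy["water"], 2))  # 0.83
```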

**Error of omission**

The error of omission is the proportion of reference sites that were left out of (omitted from) the correct class in the classified map. It is sometimes also referred to as a Type II error or false negative rate. The omission error is the complement of the producer’s accuracy, but it can also be calculated directly for each class by dividing the number of incorrectly classified reference sites by the total number of reference sites for that class.

$$\text{Omission error}_{\text{water}} = \frac{7 + 6}{38} = 0.34 \quad \text{or} \quad 1 - PA_{\text{water}} = 1 - 0.66 = 0.34$$
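In code, the off-diagonal entries of a column are simply the column total minus the diagonal cell, which makes the complementarity with the producer’s accuracy explicit. A Python sketch (names are illustrative):

```python
# Omission error per class: off-diagonal cells in the column / column total,
# which equals 1 - producer's accuracy for that class.
classes = ["water", "forest", "urban"]
matrix = [[25, 5, 0], [7, 32, 2], [6, 3, 26]]

col_totals = [sum(col) for col in zip(*matrix)]
omission_error = {
    cls: (col_totals[j] - matrix[j][j]) / col_totals[j]
    for j, cls in enumerate(classes)
}

print(round(omission_error["water"], 2))  # 0.34
```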

**Error of commission**

The error of commission is the proportion of classified sites that were assigned (committed) to the incorrect class in the classified map. It is sometimes also referred to as a Type I error or false discovery rate. The commission error is the complement of the user’s accuracy, but it can also be calculated directly for each class by dividing the number of incorrectly classified sites by the total number of classified sites for that class.

$$\text{Commission error}_{\text{water}} = \frac{5 + 0}{30} = 0.17 \quad \text{or} \quad 1 - UA_{\text{water}} = 1 - 0.83 = 0.17$$
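The mirror image of the omission sketch, now working along rows. A Python sketch (names are illustrative):

```python
# Commission error per class: off-diagonal cells in the row / row total,
# which equals 1 - user's accuracy for that class.
classes = ["water", "forest", "urban"]
matrix = [[25, 5, 0], [7, 32, 2], [6, 3, 26]]

row_totals = [sum(row) for row in matrix]
commission_error = {
    cls: (row_totals[i] - matrix[i][i]) / row_totals[i]
    for i, cls in enumerate(classes)
}

print(round(commission_error["water"], 2))  # 0.17
```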

**Sources & further reading**

**Congalton, R.G.** (1991). A review of assessing the accuracy of classifications of remotely sensed data. *Remote Sensing of Environment*, 37(1), pp. 35-46.

**Olofsson, P., Foody, G.M., Herold, M., Stehman, S.V., Woodcock, C.E., & Wulder, M.A.** (2014). Good practices for estimating area and assessing accuracy of land change. *Remote Sensing of Environment*, 148, pp. 42-57.
