To perform supervised classification, an analyst collects sample areas of known land cover, commonly referred to as training samples, from the image and uses them to train a classifier. The classifier then classifies the entire image based on the information gathered from the training samples. Preferably, training samples are collected based on prior knowledge of the area using the most accurate means available (GPS, topographic survey, etc.). However, for some applications this is impractical, and high-resolution imagery can be used to designate areas for training samples.
Important factors to consider when collecting training samples include the number of training samples, the number of pixels, shape, location, and uniformity. In general, a minimum of 50 training samples per informational class is required to produce an accurate classified LULC image. The specific number of training samples for individual informational classes may vary depending on the nature of the image (spectral diversity) and of the project (special emphasis or resource availability). Another factor that may fluctuate is the number of pixels. Generally, 10n pixels, where n is the number of bands in the image, are required to provide enough spectral information for the training sample's informational class to be properly identified. Training samples are normally drawn as some form of polygon. They should be distributed throughout the entire image to account for spectral variability in the informational classes, and each should be located within uniform, homogeneous land cover. An important concept to keep in mind is the geographic signature extension problem, in which the spectral characteristics of the same informational class differ across the image because of factors such as soil moisture and type, water turbidity, and crop species. To reduce the errors that result from the geographic signature extension problem, training samples should cover all possible variations of the desired informational classes (e.g. for vegetation, collect both forest and riparian vegetation).
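The two numeric guidelines above (50 training samples per class and 10n pixels) are easy to check before any classification is run. The following is only a minimal sketch: the function names, the 6-band image, and the sample sizes are hypothetical, and it assumes the 10n guideline is applied to each training sample individually, which the text does not state explicitly.

```python
from typing import Dict, List

MIN_SAMPLES_PER_CLASS = 50  # guideline stated in the text


def min_pixels(n_bands: int) -> int:
    """Return the 10n guideline: ten pixels per band in the image."""
    return 10 * n_bands


def check_training_plan(n_bands: int, plan: Dict[str, List[int]]) -> Dict[str, List[str]]:
    """Map each class name to a list of guideline violations (empty if none).

    `plan` maps a class name to the pixel counts of its training samples.
    Assumption: the 10n guideline is checked per training sample.
    """
    needed = min_pixels(n_bands)
    problems: Dict[str, List[str]] = {}
    for cls, pixel_counts in plan.items():
        issues = []
        if len(pixel_counts) < MIN_SAMPLES_PER_CLASS:
            issues.append(
                f"only {len(pixel_counts)} samples, guideline is {MIN_SAMPLES_PER_CLASS}"
            )
        too_small = sum(1 for p in pixel_counts if p < needed)
        if too_small:
            issues.append(f"{too_small} samples below {needed} pixels")
        problems[cls] = issues
    return problems


if __name__ == "__main__":
    # Hypothetical 6-band image with made-up sample sizes.
    plan = {"water": [80] * 50, "urban": [40] * 15}
    for cls, issues in check_training_plan(6, plan).items():
        print(cls, "OK" if not issues else "; ".join(issues))
```

A check like this only covers counts and sizes; location, uniformity, and coverage of class variations still have to be judged against the reference imagery.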
Training samples need to be evaluated before they are used to train a classifier. The histograms of a training sample should not be primarily multimodal. Training samples with more Gaussian histograms will produce a more accurate classified LULC image. If a training sample exhibits mostly multimodal histograms, it should be deleted and collected again. Spectral separability indicates the best bands to use for an analysis based on how well the spectral signatures of the training samples separate in each band. The larger the separation between spectral signatures in a particular band, the better that band will be for distinguishing land covers. Spectral separability can be calculated using software like ERDAS IMAGINE, which produces a list of the best bands and an average score. The maximum value of the average score is 2000, indicating excellent separation between classes. Values above 1900 indicate good separation and values below 1700 indicate poor separation. If the average score of the training samples is below 1700, the analyst should examine the training samples' spectral profiles and histograms for abnormalities and possibly collect more training samples. If the average score is satisfactory, the training samples can be used to train the classifying algorithm (classifier).
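To make the separability score more concrete, here is a minimal sketch of one measure that uses a 0 to 2000 scale, transformed divergence. This is an assumption: the text does not name the measure behind the score, and this is not a reproduction of what ERDAS IMAGINE computes internally. The function names and the "intermediate" label for scores between 1700 and 1900 are also hypothetical.

```python
import numpy as np


def transformed_divergence(a: np.ndarray, b: np.ndarray) -> float:
    """Transformed divergence between two classes, scaled to 0-2000.

    `a` and `b` are (n_pixels, n_bands) arrays of training pixel values.
    """
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    inv_a, inv_b = np.linalg.inv(cov_a), np.linalg.inv(cov_b)
    diff = (mean_a - mean_b).reshape(-1, 1)
    # Divergence: a covariance term plus a mean-difference term.
    divergence = 0.5 * np.trace((cov_a - cov_b) @ (inv_b - inv_a)) \
        + 0.5 * np.trace((inv_a + inv_b) @ diff @ diff.T)
    # The exponential saturates the score at 2000.
    return float(2000.0 * (1.0 - np.exp(-divergence / 8.0)))


def rate_separability(score: float) -> str:
    """Apply the thresholds from the text: above 1900 good, below 1700 poor."""
    if score > 1900:
        return "good"
    if score < 1700:
        return "poor"
    return "intermediate"  # hypothetical label; the text does not name this range


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 3-band training pixels for two made-up classes.
    water = rng.normal(loc=[20, 10, 5], scale=2.0, size=(200, 3))
    forest = rng.normal(loc=[40, 60, 30], scale=3.0, size=(200, 3))
    score = transformed_divergence(water, forest)
    print(round(score), rate_separability(score))
```

Averaging this pairwise score over every pair of classes, for a given subset of bands, gives one number per band combination that can be compared against the thresholds above.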
The advantages of supervised classification over unsupervised classification are the control the analyst has over the informational classes produced and not having to interpret the spectral clusters generated by unsupervised methods. Also, by analyzing the quality of the training samples, the classification can be improved before it is actually performed. However, collecting proper training data can be time-consuming and expensive, and the training data may not fully represent the desired informational classes, leading to errors in classification.
Pixel-based supervised classification using a maximum likelihood classifier will be performed on the same Landsat 7 ETM+ image of Eau Claire and Chippewa Counties used in the previous lab exercise, where the unsupervised ISODATA method was performed. For this introduction to supervised classification, each training sample must contain at least 10 pixels, 15 training samples will be collected for each informational class, and the training samples will be polygons collected from across the entire image, taking the geographic signature extension problem and uniformity into consideration. The reference for determining training samples will be Google Earth historical imagery near the image collection date of June 9, 2000.
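For readers unfamiliar with the maximum likelihood classifier named above, the sketch below shows the general idea with NumPy: each class is modeled as a multivariate Gaussian fitted to its training pixels, and each image pixel is assigned to the class under which it is most likely. This is an illustration only, not the implementation used in the lab software. It assumes equal prior probabilities for all classes, and the function names and synthetic pixel values are hypothetical.

```python
import numpy as np


def train(samples):
    """Fit per-class Gaussian statistics.

    `samples` maps a class name to an (n_pixels, n_bands) array of training pixels.
    Returns class name -> (mean, inverse covariance, log-determinant of covariance).
    """
    stats = {}
    for name, px in samples.items():
        cov = np.atleast_2d(np.cov(px, rowvar=False))
        stats[name] = (px.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return stats


def classify(pixels, stats):
    """Label each row of `pixels` (n, n_bands) with the most likely class.

    Assumption: equal priors, so only the Gaussian log-likelihood is compared.
    """
    names = list(stats)
    scores = np.empty((pixels.shape[0], len(names)))
    for k, name in enumerate(names):
        mean, inv_cov, log_det = stats[name]
        d = pixels - mean
        # Squared Mahalanobis distance of every pixel to this class mean.
        maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)
        scores[:, k] = -0.5 * (log_det + maha)
    return np.array(names)[scores.argmax(axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 3-band training pixels for two made-up classes.
    training = {
        "water": rng.normal(loc=[20, 10, 5], scale=2.0, size=(200, 3)),
        "forest": rng.normal(loc=[40, 60, 30], scale=3.0, size=(200, 3)),
    }
    stats = train(training)
    print(classify(np.array([[21.0, 11.0, 4.0], [39.0, 58.0, 31.0]]), stats))
```

Because each class is summarized by a single mean and covariance, a training set with multimodal histograms fits this model poorly, which is one reason the histogram check matters for this classifier.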
- Methods -
Figure 1: Example of how training samples were collected and recorded.
Figure 2: The optimum bands for analysis are circled in red. The best average separability is circled in blue.
Figure 3: The final 5 informational class signatures that were used to train the classifier.
- Results -
Map 1: The classified LULC map created through pixel-based supervised classification using a maximum likelihood classifier.
- Discussion -
As in the last lab exercise, where unsupervised classification was performed with the ISODATA method, a qualitative confidence-building assessment was performed on the LULC map generated through pixel-based supervised classification. Compared to the classified LULC map generated through the unsupervised method, the supervised method resulted in a worse classification. The supervised classification greatly overestimated urban/built-up land, erroneously assigning areas of bare soil and sparse vegetation to the urban/built-up class. The error was most likely due to the low number of training samples taken for each informational class and the quality of the training samples derived from urban features. As stated in the introduction, a minimum of 50 training samples should be collected for every informational class. For this lab, however, only 15 training samples were collected for each informational class. Almost every training sample collected for urban features displayed multimodal histograms, no matter how many times these training samples were re-collected. No confidence is given to this map, and the training samples would need to be modified if the classified LULC map were to be used for any subsequent analysis. Because urban/built-up is so grossly overestimated, the accuracy of the other informational classes is difficult to determine by visual analysis. Once more and better urban/built-up training samples, and possibly more agriculture and bare soil training samples, are collected, the accuracy of forest and water can be better determined.
- Conclusion -
The supervised classification method did not produce a better classified LULC map than the unsupervised ISODATA method, as I had hoped it would. This was because of the low number of training samples taken for each informational class and the poor overall quality of the urban/built-up training samples. Even though the supervised method resulted in a poorer LULC map, the results can be improved by modifying the current training samples and adding more training samples for the informational classes that caused extensive errors. With a proper number of quality training samples, the pixel-based supervised classification method could produce a better-quality LULC map. Statistical confidence-building assessments will be performed in the next lab exercise on the classified LULC maps generated by both the unsupervised and supervised methods, to quantify the difference in accuracy between the two.
- Sources -
Earth Resources Observation and Science Center, United States Geological Survey. (2000) Landsat ETM