Tuesday, October 14, 2014

Unsupervised Classification - ISODATA

- Introduction -

Image 1: Screenshot of the ETM+ image subset of
Eau Claire and Chippewa counties in False Color IR
Land use/land cover (LULC) information can be extracted from remotely sensed imagery through several methods, including parametric and nonparametric statistics, supervised or unsupervised classification logic, hard or soft set classification logic, per-pixel or object-oriented classification logic, or a hybrid of these methods. In this lab, unsupervised classification using the Iterative Self-Organizing Data Analysis Technique (ISODATA) clustering algorithm is performed on a Landsat 7 ETM+ image of Eau Claire and Chippewa counties in Wisconsin captured on June 9, 2000 (Image 1). Unsupervised classification requires minimal user input, but extensive user interpretation is needed to convert the generated spectral clusters into meaningful informational classes. This lab exercise discusses the conceptual framework of the ISODATA algorithm, its available user inputs, and the interpretation of LULC classes.

- Background -

ISODATA is a modification of the k-means clustering algorithm that adds rules for merging clusters whose means fall closer together than a user-defined threshold and for splitting a single cluster into two.
ISODATA is considered self-organizing because it requires little user input. The required inputs are: a maximum number of clusters to be generated, a maximum number of iterations, a convergence threshold (the percentage of pixels whose cluster assignment must remain unchanged between iterations), a maximum standard deviation (to determine cluster splitting), a minimum percentage of pixels per cluster (to determine cluster deletion and reassignment), a split separation value (used in cluster splitting), and a minimum distance between cluster means (to determine cluster merging). The algorithm begins by placing arbitrary cluster means evenly throughout a parallelepiped in feature space defined by the mean and standard deviation of each band used in the analysis. In each iteration, every pixel is assigned to the nearest cluster mean using a minimum-distance-to-means classification rule, and the cluster means are then recalculated and shifted in feature space. Once the user-defined convergence threshold has been reached, iterations cease and the resulting spectral clusters can be interpreted.
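
The loop described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the ERDAS IMAGINE implementation: the function names, parameter names, and default values are assumptions made for this example. Seed means are spread along the diagonal between mean minus and mean plus one standard deviation, the minimum cluster size is a pixel count rather than a percentage, a split moves the two new means one standard deviation apart along the widest band, and at most one unweighted merge happens per iteration.

```python
import numpy as np

def assign(pixels, means):
    """Label each pixel with the index of its nearest cluster mean."""
    dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1)

def update_means(pixels, labels, n_old, max_clusters, max_std, min_dist, min_pixels):
    """Recompute means, then delete, split, and merge clusters.

    Returns the new means and a flag saying whether the set of clusters changed.
    """
    kept = [k for k in range(n_old) if np.sum(labels == k) >= min_pixels]
    changed = len(kept) < n_old        # undersized clusters were deleted
    budget = max_clusters - len(kept)  # number of splits still allowed
    means = []
    for k in kept:
        members = pixels[labels == k]
        mean, std = members.mean(axis=0), members.std(axis=0)
        if std.max() > max_std and budget > 0:
            # Split along the band with the largest spread (assumed rule).
            offset = np.zeros_like(mean)
            offset[std.argmax()] = std.max()
            means += [mean - offset, mean + offset]
            budget -= 1
            changed = True
        else:
            means.append(mean)
    means = np.array(means)
    if len(means) > 1:
        # Merge the closest pair of means if it is closer than min_dist.
        gaps = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=2)
        np.fill_diagonal(gaps, np.inf)
        i, j = np.unravel_index(gaps.argmin(), gaps.shape)
        if gaps[i, j] < min_dist:
            means[i] = (means[i] + means[j]) / 2.0  # unweighted, for brevity
            means = np.delete(means, j, axis=0)
            changed = True
    return means, changed

def isodata(pixels, max_clusters, max_iter=250, convergence=0.95,
            max_std=10.0, min_dist=5.0, min_pixels=2):
    """Simplified ISODATA on an (n_pixels, n_bands) array.

    Returns (labels, means, iterations_used). Defaults are illustrative only.
    """
    pixels = np.asarray(pixels, dtype=float)
    mu, sigma = pixels.mean(axis=0), pixels.std(axis=0)
    # Arbitrary seed means, evenly spaced from mu - sigma to mu + sigma.
    means = mu + np.linspace(-1.0, 1.0, max_clusters)[:, None] * sigma
    labels = assign(pixels, means)
    for iteration in range(1, max_iter + 1):
        means, changed = update_means(pixels, labels, len(means), max_clusters,
                                      max_std, min_dist, min_pixels)
        new_labels = assign(pixels, means)
        # Labels are only comparable when the set of clusters did not change.
        unchanged = 0.0 if changed else np.mean(new_labels == labels)
        labels = new_labels
        if unchanged >= convergence:
            break
    return labels, means, iteration
```

For a multiband image stored as a `(rows, cols, bands)` array, the pixels would first be flattened with `image.reshape(-1, image.shape[-1])` and the returned labels reshaped back to `(rows, cols)`.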

An advantage of unsupervised classification is that no extensive knowledge of the study area is required. Because little user input is needed, the likelihood of human error is reduced. However, the analyst has little control over the classes generated, and the clusters often contain multiple land covers, which makes interpretation difficult.

- Methods -

ISODATA was performed in ERDAS IMAGINE 2013 by navigating to Raster > Unsupervised > Unsupervised Classification. In the Unsupervised Classification window, the input raster and output cluster layer were assigned, and the Isodata radio button was selected to activate the user input options. ISODATA was performed twice on the image: once with a class range of 10 to 10 and again with a class range of 20 to 20. The maximum number of iterations was set to 250 and all other inputs were kept at their default values, with the exception of a 0.92 convergence threshold for the 20-class run. The Approximate True Color radio button was also selected in the Color Scheme Options. A value of 250 was chosen for the maximum iterations to ensure the algorithm would run enough times to reach the convergence threshold; however, both ISODATA runs needed only seven iterations to reach it.



Image 2: Comparison of the original image (left) and the
ISODATA classified image before recoding (right)
The resulting classified image (Image 2) was opened in a viewer and the generated clusters were recoded into thematic information classes by navigating to Table > Show Attributes. With the image attributes open, each cluster was selected one by one and its color was changed to gold, making it easy to distinguish from the other approximate true colors generated by the algorithm. The classified image was synced with Google Earth historical images to determine which land cover was most closely associated with each cluster. Once a decision was made, the color was changed to green for forest, blue for water, red for urban/built up, pink for agriculture, or sienna for bare soil, and the cluster was given the appropriate name (Image 3). The columns in the attribute window can be modified to allow for easier interpretation by navigating to File > View > View Raster Attributes and selecting the Column Properties icon. After all the clusters had been recoded, the attribute window was closed and, in the pop-up window, Yes was selected to save the changes. The recoded image was saved by navigating to File > Save As > Top Layer As.


Image 3: Screenshot of the process of determining LULC classes from the ISODATA generated
clusters. Google Earth historical imagery on one monitor was synced to the ERDAS viewer on another monitor.

To make a map of the LULC classified image, the image classes needed to be recoded from 10 or 20 clusters down to the 5 desired classes. This was done by navigating to Thematic > Recode. In the Recode window, the New Value field was modified for each record according to the class name assigned earlier in the attribute window. Water was given a value of 1, forest 2, agriculture 3, urban/built up 4, and bare soil 5. The new values were then saved by selecting Apply in the Recode window and the image was saved by navigating to File > Save As > Top Layer As. When the attribute window was reopened, only 5 classes appeared, rather than 10 or 20, representing the entire image. The images were then opened in ArcMap 10.2.2, symbolized appropriately, and presented as a map comparing the two ISODATA recoded images.
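
Outside of ERDAS, the same recode step amounts to a lookup table that maps each cluster number to one of the five class values. The sketch below assumes the classified image has been read into a NumPy integer array; the function name and the cluster-to-class assignments in the usage note are hypothetical, and only the five class values (1 to 5) come from this lab.

```python
import numpy as np

# Class values used in the Recode step of this lab.
CLASS_CODES = {
    "water": 1,
    "forest": 2,
    "agriculture": 3,
    "urban/built-up": 4,
    "bare soil": 5,
}

def recode(cluster_raster, cluster_names):
    """Collapse cluster ids into LULC class values with a lookup table.

    cluster_raster: integer array of cluster ids (0 is left as unclassified).
    cluster_names:  dict mapping each cluster id to a key of CLASS_CODES.
    """
    cluster_raster = np.asarray(cluster_raster)
    size = max(int(cluster_raster.max()), max(cluster_names)) + 1
    lut = np.zeros(size, dtype=np.uint8)  # unmapped ids stay 0
    for cluster_id, name in cluster_names.items():
        lut[cluster_id] = CLASS_CODES[name]
    # Integer-array indexing applies the table to every pixel at once.
    return lut[cluster_raster]
```

As a usage example with made-up assignments, `recode(np.array([[1, 2], [3, 4]]), {1: "water", 2: "forest", 3: "forest", 4: "agriculture"})` collapses clusters 2 and 3 into the single forest class.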

- Results -
Map 1: Comparison of the two ISODATA classifications

- Discussion -

Qualitative confidence-building assessment was performed by visual comparison between the classified LULC image and Google Earth historical imagery. Statistical confidence-building assessment will be performed in a subsequent lab. In this visual comparison, the ISODATA classification that generated 20 clusters (ISO[20]) was more accurate than the 10-cluster classification (ISO[10]). Urban/built-up area is greatly overestimated in ISO[10], an error that is corrected in ISO[20]. However, agriculture is overestimated more in ISO[20]. The differences between urban/built-up, agriculture, and bare soil are caused by the spectral similarities of these features. Agricultural land ranged from healthy crops to fallow fields. Healthy crops had spectral similarities to forest, while fallow fields had spectral similarities to bare soil. When determining informational classes from the ISODATA-generated clusters, fallow fields were considered agriculture instead of bare soil. This decision likely caused agriculture to be overestimated in ISO[20]. Some regions of water, mostly smaller rivers, were incorrectly classified as agriculture or forest. This is most likely caused by overhanging vegetation masking the reflectance of the water.
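
The over- and underestimation noted above was judged visually. As a rough complement, the share of each class in the two recoded images could be tabulated and compared. The sketch below is an illustration only: the function names are hypothetical, the 30 m pixel size assumes the ETM+ multispectral bands were used, and the tiny arrays in the demo are made-up values, not results from this lab.

```python
import numpy as np

CLASS_NAMES = {1: "water", 2: "forest", 3: "agriculture",
               4: "urban/built-up", 5: "bare soil"}

def class_summary(recoded, pixel_size_m=30.0):
    """Return {class name: (percent of classified pixels, area in hectares)}."""
    recoded = np.asarray(recoded)
    counts = {code: int(np.sum(recoded == code)) for code in CLASS_NAMES}
    total = sum(counts.values())
    hectares_per_pixel = pixel_size_m ** 2 / 10_000.0
    return {
        CLASS_NAMES[code]: (100.0 * n / total if total else 0.0,
                            n * hectares_per_pixel)
        for code, n in counts.items()
    }

def compare(summary_a, summary_b):
    """Percentage-point difference (b minus a) for each class."""
    return {name: summary_b[name][0] - summary_a[name][0] for name in summary_a}

# Made-up 2 x 2 rasters standing in for the ISO[10] and ISO[20] recodes.
iso10_demo = np.array([[4, 4], [3, 1]])
iso20_demo = np.array([[4, 3], [3, 1]])
shift = compare(class_summary(iso10_demo), class_summary(iso20_demo))
```

A negative value in `shift` means the class covers a smaller share of the second image than the first, which would be the expected sign for urban/built-up when moving from ISO[10] to ISO[20].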


- Conclusion -

Overall, interpretation was difficult because there was extensive overlap between the 5 desired informational classes and the 10 or 20 generated clusters. Classes with overlap included forest and healthy agriculture, bare soil and urban area, and fallow fields/sparse vegetation and bare soil. ISO[20] was more accurate because generating more clusters allowed more specific spectral characteristics to be singled out and classified accordingly. Although ISO[20] was more accurate, it still overestimated agricultural land and had erroneous classifications for urban/built-up and forested land. To increase the accuracy of the LULC map, supervised classification could be used, and it will be used in the next lab.


- Sources -

Earth Resources Observation and Science Center, United States Geological Survey. (2000) Landsat ETM+



