Image Classification

The Overview and Try It sections provide a basic understanding of the process. Subsequent sections have more in-depth information and can be used as a reference. Step-by-step tutorial lessons (with cyan background) are included and can be done on their own.

Overview

The TNTmips® Automatic Image Feature Classification process automatically groups image cells with similar spectral properties into classes. This process uses the spectral pattern (or "color") of a raster cell in multispectral or multi-temporal imagery to automatically categorize all cells into spectral classes. The relationship between spectral classes and different surface materials or land cover types may be known beforehand or determined after classification by analysis of the spectral properties of each class.

The Classification process offers a variety of classification methods as well as tools to aid in the analysis of the classification results. Unsupervised methods automatically group image cells with similar spectral properties while supervised methods require you to identify sample class areas to train the process. After running the classification process, various statistics and analysis tools are available to help you study the class results and interactively merge similar classes.

Try It

A multiband raster layer from input rasters: Landsat 8 bands B2, B3, B4, B5, B6, B7, B10, and B11. Raster Layer Controls: bands selected for RGB component are B6, B5, B4 respectively, with Contrast set to Auto Normalize.

Exercise 1 - Load input and run

To try a quick run-through of the process, all you need to do is select the input, choose an unsupervised classification method, set the number of classes, and run it.

  1. select Image > Classify > Auto-Classify from the TNTmips menu bar
  2. click the New icon on the Automatic Image Feature Classification window and add all of the raster bands in the stanton_landsat8.rvc file (or at least one band from your own multispectral data).
  3. set up how the newly added multiband raster layer is to be displayed via Raster Layer Controls (e.g. Red:B6, Green:B5, Blue:B4; with Contrast:Auto Normalize)
  4. choose KMeans from the Method menu
  5. set the Number of Classes to 10
  6. click the Run icon
  7. when prompted, navigate to the folder where you want to put the output rasters, create a new .rvc file, and accept the default object names for them.

    Make a note of the file path and name of the output .rvc file where you put the class and distance rasters. You can re-load the resulting class raster using the Open icon to use the same results in later exercises.

After running the Classification process, the Results raster is loaded and its class list is shown in the Classes section of the main process window. When the cursor is moved over the class raster in the View window, the corresponding class is highlighted in the Classes list, shown here with a light green background. (A datatip for the layers in the View window is also shown here with a yellow background.)

(continued from previous exercise)

Exercise 2 - Examine the resulting classes

    After the classification finishes running, the resulting class raster is automatically loaded in the View window.

  1. Move your cursor over the class raster layer in the View and look at the corresponding class that is highlighted in the Classes list (in the main process window).

The class raster (middle) is shown along with the input layer (left) and distance raster (right). Notice the nearly solid darker blue pivots in the class raster indicate a single class was found for those areas. However, we can see there is more variation in the input composite image. Furthermore, we see both light and dark areas for those pivots in the distance raster. Recall that cells with poorer fit to their class have lighter tones.

(continued from previous exercise)

Exercise 3 - Compare classes to input layer and distance raster

Zoom in to an area of interest, then use the Show/Hide checkbox for layers in the Sidebar to compare the input rasters with the resulting class and distance rasters.

    The following steps are done in the Classification View window.

  1. manually add the distance raster you created in the previous exercise (via Add Layer)

    The default name of the distance raster begins with "DST" followed by the Method name. If the distance raster displays too dark, set the layer's Contrast to Auto Normalize (via Raster Layer Controls).

  2. zoom in on a lighter area of the distance raster (i.e. indicating cell values are far from class center)
  3. study the area of interest in all three layers (input, class raster, and distance raster) by hiding and showing them
  4. look for a class that likely includes more than one ground feature (i.e. a class with distinct differences in the input, or lighter areas in the distance raster)

Tip: if the value set for the Number of Classes is too small you may notice some classes contain more than one ground feature. However, a small number of classes is simpler to manage when first learning to use the tools to analyze classes in later exercises.

Background

Many remote sensing systems record brightness values at different wavelengths that commonly include portions of the visible light spectrum as well as photoinfrared and middle infrared bands. The brightness values for each of these bands are typically stored in a separate grayscale image (raster). Each cell in a multiband image therefore has a set of brightness values which in effect represent the "color" of that patch of the ground surface. (Here we extend our concept of color to include wavelengths beyond the visible light range.)

spectral space - An N-dimensional space, where N is the number of input rasters, with the values of each raster plotted on its own coordinate axis.

spectral pattern - Coordinates of a point in spectral space. In other words, the values taken from all the input rasters for a single raster cell. Consider viewing an RGB layer made up of three bands. The red, green, and blue values for a single image cell are used when displaying it in color. In spectral space, these values make up the spectral pattern for a point.

Location of a single spectral pattern in a three-band spectral space.

The spectral pattern of a cell in a multispectral image can be quantified by plotting the raster values from each band on a separate coordinate axis to locate a point in a hypothetical spectral space. Most classification methods use some measure of the distance between points in this spectral space to assess the similarity of spectral patterns. Cells that are close together in spectral space have similar spectral properties and have a high likelihood of being the same surface feature.
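The idea of a spectral pattern as a point in spectral space can be sketched in a few lines of Python. This is only an illustration: the band values and the helper names `spectral_pattern` and `spectral_distance` are hypothetical, and Euclidean distance is used as one common similarity measure.

```python
import math

# Three co-aligned single-band rasters (hypothetical 2 x 2 cell values).
band_red = [[52, 60], [200, 198]]
band_green = [[80, 85], [190, 185]]
band_blue = [[41, 44], [170, 176]]
bands = [band_red, band_green, band_blue]

def spectral_pattern(bands, row, col):
    """Return the cell's value from every band as one point in spectral space."""
    return tuple(band[row][col] for band in bands)

def spectral_distance(p, q):
    """Euclidean distance between two spectral patterns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

a = spectral_pattern(bands, 0, 0)  # (52, 80, 41)
b = spectral_pattern(bands, 0, 1)  # (60, 85, 44)
c = spectral_pattern(bands, 1, 0)  # (200, 190, 170)

near = spectral_distance(a, b)  # small distance: similar spectral properties
far = spectral_distance(a, c)   # large distance: likely a different surface feature
```

Here cells `a` and `b` lie close together in spectral space, while `c` lies far from both, so `a` and `b` are more likely to represent the same surface feature.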

Input

Select multiple rasters covering the same ground area for the Classification process input such as multispectral bands, hyperspectral bands, or multi-temporal data. A single raster is also allowed. The input rasters ...

  • must be accurately georeferenced;
  • must be co-aligned;
  • do not need to be the same raster data type nor do values need to be in the same units;
  • should contain cell values with a continuous range such as spectral data (or even surface, temperature, and elevation data); however, thematic or categorical rasters with discrete cell values such as soil class or land use should not be used;
  • may be in separate files (e.g. .tif) or separate objects in .rvc file(s);
  • may be a composite raster (i.e. multiple bands in one file/object);
  • may be a mix of separate files/objects and composite rasters;
  • will automatically be added to the View — usually as a single multiband raster layer

Prepare Input Data

  • If you have .tif files or other non-rvc file data, you may need to set the null value. You can do this in the Import process via the Null Value setting. Turn on Link to files in original form and location if you don't want to duplicate the raster data.
  • Co-align rasters if input rasters are georeferenced but don't have the same cell size or extents.
  • Apply radiometric correction to satellite imagery prior to classification in order to calibrate cell values to radiance or reflectance values.
  • If you have a mix of separate and composite rasters, optionally use the Raster Convert Color process to put bands in separate RVC objects (input: Single, output: RGB Separate). This lets you more easily control how the layer is displayed in the View.

Tip: you do not need to scale the input ranges before using them. You can adjust the cell value ranges directly in the Classification process if needed.

Ground Truth Information

Ground truth data containing any available information about the types of materials and ground cover in the scene is useful but not required. It could be a map with hand-drawn feature areas or a raster with high enough resolution that you can recognize features. Any type of ground truth data that is georeferenced can be manually added to the View as a reference layer. Such a reference layer may be used in both unsupervised and supervised classification to compare the resulting class raster to ground truth information.

Ground truth data can also be used to create the required training set raster for use in supervised classification. Vector polygons or points containing class attributes can be imported directly to a training set raster. Reference layers can be added to the View and used to manually create a training set raster.

Mask

A binary raster can be used to exclude areas from processing and/or to set areas as null in the resulting class raster. A mask may be useful depending on your data but is not required. The Automatic Classification process is influenced by the brightness values of all cells in the scene, not just the features that you intend to classify. Thus using a mask to eliminate unwanted areas can vastly reduce the number of classes needed to differentiate materials in the area of interest.

The mask raster must be co-aligned with the input bands. Cell values of 1 in the binary raster will be processed while 0 values will be masked.

Results & Output

The primary classification result is a class raster, which is automatically displayed in a View window after running the Classification process. For clarity, we call this the Results raster in this document. Some classification methods also give you the option of creating a distance raster, which you can manually add to the View.

class raster - a categorical 8-bit unsigned raster where each (arbitrary) numerical value in the raster represents a class that is assigned to the cell by the classification process. The raster includes a color palette, used to display the layer, with a color assigned to each cell value.

results raster - the currently loaded class raster. After running the Classification process, the resulting class raster is automatically loaded and ready for analysis.

distance raster - a 32-bit floating point / grayscale raster that shows how well each cell fits its assigned class. Each raster cell value records the distance between that cell and its class center in spectral space. Cells that are closer to their class center (better fit) appear darker in the displayed raster than those with greater distance values (poorer fit).
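The relationship between a class raster and a distance raster can be sketched as follows. This is a minimal illustration, assuming each cell is assigned to the nearest class center by Euclidean distance; the class centers, the tiny two-band image, and the function name `classify_cell` are hypothetical, and individual classification methods may assign cells differently.

```python
import math

# Hypothetical class centers in a two-band spectral space: {class number: center}.
centers = {1: (50.0, 80.0), 2: (200.0, 190.0)}

# Hypothetical 2 x 2 image; each cell holds its spectral pattern (band 1, band 2).
image = [
    [(50, 80), (56, 88)],
    [(200, 190), (120, 140)],
]

def classify_cell(pattern, centers):
    """Return (class number, distance) for the class center nearest to pattern."""
    best = min(centers, key=lambda k: math.dist(pattern, centers[k]))
    return best, math.dist(pattern, centers[best])

class_raster = []
distance_raster = []
for row in image:
    results = [classify_cell(p, centers) for p in row]
    class_raster.append([cls for cls, _ in results])
    distance_raster.append([dist for _, dist in results])
```

In this sketch the lower-right cell is assigned to class 1 but lies far from that class center, so it would appear as a light (poor fit) cell in the distance raster.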

In addition, statistics and class analysis data are produced, along with tools to help you study the resulting classes. The contents of the Statistics, Dendogram, Confusion Matrix, Cooccurrence Matrix, Ellipse Scatterplot, and Distance Histogram windows can be saved to standard text or CAD file formats.

Interface

The Automatic Image Feature Classification dialog and the Classification View window are used to run the Classification process. This document refers to the Automatic Image Feature Classification dialog as the 'main process' window and to the Classification View window as the View.

Automatic Image Feature Classification window

Top Toolbar

The top toolbar of the main process window.

The left-most options let you set up and run the automated classification process:

New - Click the New icon to select a set of input rasters to be classified.

Open - Reopen a previously made class raster to automatically load it with input rasters and settings.

Run - Run the image classification to create a class raster.

Method - Click the drop-down list to choose from a menu of classification methods. The list is ordered so that Unsupervised Methods are above all of the Supervised Methods. Note that the Training toggle in the Classes section becomes active if a supervised method is selected.

The top right options (in top toolbar) open statistics and analysis tools for evaluating classes. Statistics and Classification Dendogram are available for both class results (Results mode) and training classes (Training mode). Confusion Matrix, Cooccurrence, Scatterplot, and Distance Histogram are available only after running the process (in Results mode). See the Analyze Classes section for details about these statistics and analysis tools.

General Settings

Look at the settings in each section of the main process window: Input, Parameters, and Classes sections.

Input - Selected input rasters are shown in a scrollable pane. Adjust, Range, and Scale columns can be used to adjust the input raster's cell value range. Min, Max, Above, and Below specify how to handle values outside of the adjusted range.

Name - Shows the name of the input raster.

Values - Shows the input raster cell value range.

Adjust - Set the method used to adjust input range: None, Linear, Logarithmic, Histogram Normalization. Linear rescales cell values to the full 8-bit range and Equalize spreads them out equally over it.

Range - Set the difference between highest and lowest adjusted cell values. If manually set, the Scale will be automatically adjusted.

Scale - Set the value to use to scale the input when Adjust: Linear is set. If manually set, the Range will be automatically adjusted.

Min % - Shows the percentage of values below the Min #.

Min # - Shows the lowest number in the range of adjusted values.

Below - Specify how to handle values below the Min #: Exclude, Limit, Extend.

Max % - Shows the percentage of values above the Max #.

Max # - Shows the highest number in the range of adjusted values.

Above - Specify how to handle values above the Max #: Exclude, Limit, Extend.

List As - Set the name shown in View.

Typically you can use the input without any scale adjustments. However, if you are mixing input from different sources you may have significantly different cell value ranges, which would result in some rasters having more influence on the classification than others. You can apply a scale to any input raster to make the cell value ranges more similar and thus remove any bias.

The Classification process plots input cell values and computes distances in Euclidean space, and these distances are used to classify cells. Thus a difference of 1 between cell values is given the same emphasis whether it occurs in a raster with a wide range of cell values or in one with a narrow range. For example, a floating point raster with a cell value range between 0 and 1 would have a distance of at most 1 between any two cell values, so it contributes almost nothing to the distance in Euclidean space when combined with rasters that have much wider ranges. You can correct for this by applying a scale to broaden the cell value range. To think of it another way, if you want to emphasize an input raster, spread out its cell value range relative to the other input rasters.
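The effect of mismatched cell value ranges on Euclidean distance can be illustrated with a small Python sketch. The cell values, the scale factor of 255, and the helper name `apply_scale` are hypothetical choices made for the illustration; they are not settings taken from the process.

```python
import math

# Hypothetical spectral patterns: (8-bit band, floating point band in the range 0 to 1).
p = (100, 0.1)
q = (100, 0.9)  # differs from p by most of the floating point band's range
r = (110, 0.1)  # differs from p by a small part of the 8-bit band's range

def apply_scale(pattern, scales):
    """Multiply each band value by that band's scale factor."""
    return tuple(value * scale for value, scale in zip(pattern, scales))

# Without scaling, the floating point band barely affects the distance.
unscaled_pq = math.dist(p, q)  # about 0.8
unscaled_pr = math.dist(p, r)  # 10.0

# Broaden the floating point band to roughly the same range as the 8-bit band.
scales = (1, 255)
scaled_pq = math.dist(apply_scale(p, scales), apply_scale(q, scales))  # about 204
scaled_pr = math.dist(apply_scale(p, scales), apply_scale(r, scales))  # 10.0
```

Before scaling, `q` looks closer to `p` than `r` does even though it differs across most of its band's range; after scaling, the ordering reverses.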

Mask - Select a binary raster to mask out areas of the scene you don't want processed for Analysis and/or Output. Cell values of 1 in the binary raster will be processed while 0 values will be masked. The mask raster must be co-aligned with the input bands.

Clear - Removes the mask raster.

Use for - Used if a Mask is selected. Choose Analysis and/or Output to specify the areas to process. Use the Analysis option to ensure the set of classes is determined using only the unmasked areas. This can significantly reduce the number of classes needed and still separate the features in the areas of interest. Use the Output to create classes only for your area of interest. In that case, the resulting class raster will have null areas matching the mask binary raster.
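The difference between using a mask for Analysis and for Output can be sketched as below. This is an illustration only: the function names `cells_for_analysis` and `mask_output` are hypothetical, and `None` stands in for whatever null value the class raster actually uses.

```python
# Hypothetical binary mask: 1 = process the cell, 0 = masked.
mask = [
    [1, 0],
    [1, 1],
]

# Hypothetical two-band spectral patterns and a class raster for the same 2 x 2 area.
patterns = [
    [(10, 20), (200, 210)],
    [(12, 22), (90, 95)],
]
class_raster = [
    [3, 5],
    [3, 2],
]

def cells_for_analysis(patterns, mask):
    """Analysis: keep only the spectral patterns of unmasked cells for building classes."""
    return [
        patterns[r][c]
        for r in range(len(mask))
        for c in range(len(mask[0]))
        if mask[r][c] == 1
    ]

def mask_output(class_raster, mask, null=None):
    """Output: set masked cells to null in the resulting class raster."""
    return [
        [cls if m == 1 else null for cls, m in zip(class_row, mask_row)]
        for class_row, mask_row in zip(class_raster, mask)
    ]

analysis_cells = cells_for_analysis(patterns, mask)
masked_classes = mask_output(class_raster, mask)
```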

Sample for Analysis - Sets the classification to build classes from a subset of the input image cells before applying the classes to the entire image. Sample cells are selected at regular intervals throughout the image. Set the sampling intervals for both Rows and Columns. The default setting of 1 for both intervals ensures all input cells are used to build classes. Increasing these values speeds processing for large images. For example, changing both intervals to 2 results in a sample set made up of one quarter of the image cells.
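The arithmetic behind the sampling intervals can be checked with a short sketch. It assumes sampling starts at the first row and column, which the text does not state; the function name `sample_cells` and the 100 x 100 image size are hypothetical.

```python
def sample_cells(n_rows, n_cols, row_interval=1, col_interval=1):
    """Return (row, col) positions taken at regular intervals through the image."""
    return [
        (r, c)
        for r in range(0, n_rows, row_interval)
        for c in range(0, n_cols, col_interval)
    ]

all_cells = sample_cells(100, 100)        # default intervals of 1: every cell
quarter = sample_cells(100, 100, 2, 2)    # every second row and every second column
fraction = len(quarter) / len(all_cells)  # 2500 / 10000 = 0.25
```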

Parameters panel

The settings in the Parameters panel are dependent on the Method selected. See the Unsupervised Methods and Supervised Methods sections for information about each method and their parameters.

Classes panel

The classes and toolbar options shown in the Classes panel are dependent on the mode selected — Results mode or Training mode.

Results mode - Uses the resulting class raster and shows its class list below. This mode has a special toolbar with options you can use to modify the classes and save your changes.

Save Results - Use to save the class raster after modifying classes. It is available after running the process and overwrites the current class raster without prompting. Note, after saving you can no longer Undo merged/modified classes.

Save Results As - Saves the updated classes to a new class raster.

Merge Selected Classes, Undo, and Settings - See the Merge Class section.

Training mode - Lets you set up and work with a training set raster required for supervised classification. Shows the list of training classes and includes a toolbar with options to create and work with them. See the Training Set section for more information on the following options available in this mode:

New Training Data, Open Training Data, Save Training Data As, Edit Training Data,
Import, Open Class Table, Save Class Table, Add Class, Delete Selected Class,
Apply Cell Value Changes, and Reset Cell Values as Saved.

classes list - If Results mode is on, the classes created after running the process are shown. If Training mode is on, it shows the training classes.

selection box - Click the box on the left side of a class cell value to select or unselect it. The selected class(es) can be used in various ways. For example, the selected class is used along with the Select Area tool in the View to set up training areas in a training set raster. Selected classes are also highlighted in the analysis tool windows to help you study the resulting class raster. Furthermore, when two or more classes are selected the Merge icon becomes available, and when a single class is selected it specifies the class used when you open the Distance Histogram.

# - This number indicates the cell value in the class raster.

color sample - Color used to display the class in the View window. Click the color sample to open controls to change it.

Name - The name of the class. Double-click the name to change it.

Cells - The number of image cells in the class.

% - Percentage of cells in that class.

Classification View window

See the Introduction to the Display Interface for more information about the typical features and tools in the View. Special tools for the Classification process are listed below.

Top Toolbar

The Classification View window has additional Select Class and Select Area tools in the top toolbar.

In addition to the normal features and tools, there are two tools you can use to create and modify a training set class raster. These tools are closely tied to the class or classes selected in the main process window (Training mode only).

Select Class - This is a shortcut to select and unselect classes and thus lets you avoid having to move your mouse between the View and main process windows. Simply click on a feature (raster cell) in the View that has a class assigned to it — this toggles the selection box for that class in the main process window.

Select Area - Lets you add training samples to the training set raster. Draw a polygon around a ground feature and then right-click to open a menu with commands to perform on the selected cell(s). See the Add Samples to a Training Set Raster section for more details on the following commands: Assign Free Cells, Assign All Cells, Release All Cells, Release Selected Cells, Select, Unselect, Invert Selection.

Unsupervised Classification

In unsupervised classification, TNTmips uses a set of rules to automatically find the desired number of naturally occurring spectral classes from the set of input rasters. The rules vary depending on the classification method you choose from the Method option menu. Unsupervised methods do not require training data.

An unsupervised classification assigns class numbers in the order in which the classes are created. Because the raster values have no other numerical significance, for display a unique color is assigned to each class from a standard color palette.

Note that unsupervised methods are listed first in the Method list followed by the supervised methods.

The following unsupervised classification methods are available: KMeans, Fuzzy C-Means, Minimum Distribution Angle, ISODATA, Self Organizing Neural Network, Adaptive Resonance. See the Unsupervised Methods and their Parameters section for details on each method.

Exercise 4 - Compare unsupervised methods

    Start where the previous lesson left off or load any input. [I.e. re-load the class raster resulting from the first exercise (via Open icon) to quickly set up input rasters.]

  1. choose an unsupervised method you haven't tried, such as Minimum Distribution Angle

    Previous exercises used the KMeans unsupervised method.

  2. adjust the Number of Classes to improve results (e.g. set to 20 instead of 10)
  3. run and name output as usual
  4. repeat with different unsupervised methods and settings

Tip: when you perform an unsupervised classification, set the number of output classes to be several times greater than the number of land cover types that you hope to recognize. You can then use the available analysis tools to recognize and merge similar spectral classes.

Merge Classes

A typical workflow in an unsupervised classification involves creating more classes than you need, looking at the class statistics such as cooccurrence and separation, and then manually merging classes that you've determined are the same material. The next section has information on how to Analyze Classes. For now we will skip the analysis step and focus on steps to merge. The Classes section of the main process window provides a simple interface for interactively merging two or more classes.

Merge Selected Classes - Merges the (two or more) selected classes into one class.

Undo - Reverts last set of merged classes back to original classes. Use multiple times to undo all previous Merge Selected Classes operations.

Settings - for merging classes

Renumber Classes - When on, classes are renumbered to avoid gaps in the cell value numbers (#).

Mix Colors when Merge - When on, a merged class is given a new color sample created by mixing the original class color samples. Otherwise one of the selected class colors is used.
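The two merge settings can be pictured with the sketch below. It is not a description of how TNTmips implements merging: giving the merged class the lowest selected class number, renumbering in ascending order, and mixing colors by averaging RGB components are all assumptions made for the illustration, as are the function names.

```python
def merge_classes(class_raster, selected, renumber=False):
    """Merge the selected class numbers into one class.

    Assumption: the merged class takes the lowest selected class number.
    """
    target = min(selected)
    merged = [[target if v in selected else v for v in row] for row in class_raster]
    if renumber:
        # Close gaps so the class numbers run 1, 2, 3, ... without holes.
        remaining = sorted({v for row in merged for v in row})
        new_number = {old: i + 1 for i, old in enumerate(remaining)}
        merged = [[new_number[v] for v in row] for row in merged]
    return merged

def mix_colors(colors):
    """Assumption: mix class colors by averaging their RGB components."""
    n = len(colors)
    return tuple(round(sum(c[i] for c in colors) / n) for i in range(3))

class_raster = [
    [1, 2, 3],
    [3, 4, 2],
]
kept_numbers = merge_classes(class_raster, {2, 3})
renumbered = merge_classes(class_raster, {2, 3}, renumber=True)
mixed = mix_colors([(200, 0, 0), (100, 0, 200)])
```

Without renumbering, the merged raster keeps a gap at class 3; with renumbering, class 4 becomes class 3.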

Merge and Undo options.

Two classes are selected (yellow background) making the Merge icon active. The Undo icon will be active after merging (gray arrow will turn red). The Classification Cooccurrence Analysis window shows these two classes are close spectrally (low separation) and often close together spatially (high cooccurrence), thus making them good candidates for merging.

Exercise 5 - Merge classes

Get familiar with the interactive tools for merging classes. (This exercise skips the step of studying the classes to determine which, if any, need to be merged.)

    Start with a previous run of the process loaded so you see a class raster and classes listed. (I.e. load the class raster resulting from the first exercise.)

  1. select two (or more) classes (click to fill the box on the left side of class row in the Classes section of the main process window)
  2. click the Merge icon that becomes active when two or more classes are selected.
  3. select another set of classes and merge them as well
  4. click the Undo icon and notice the last set of merged classes have reverted back.
  5. click Undo again to go back to the original list of classes.
  6. experiment with different merge options — use the Settings icon to change the Renumber Classes and Mix Colors when Merge toggles before merging classes. Notice how the class color sample and class numbers change.
  7. click on a class color sample and choose a different color in the Select Colors dialog.
  8. double-click on a class name and rename it.
  9. after merging and modifying the class list, click Save Results As to save your changes in a new class raster.

Analyze Classes

After running the Classification process, the resulting class raster (Results raster) is automatically loaded. Or, add a previously created class raster to automatically reload all input bands and recalculate class statistics and analysis data. Either way the Results raster is ready for analysis. An array of tools is available to help determine whether the classes sufficiently represent the ground materials and to help you understand how to proceed when they don't.

Identify surface material for classes

In an unsupervised classification, start by trying to identify what surface material(s) are associated with each class.

Find classes that are spectrally similar

If more than one ground feature is incorrectly contained in one class, re-run the process with increased Number of Classes, different methods, or different parameter settings to create more or different classes.

Find classes that occur next to each other spatially

Variations in spectral characteristics of a single feature can result in misclassified features. These variations may be due to differences in plant size, the density of the leaf canopy, soil type and conditions, slope direction, and other factors. This variability is inherent in most of the land cover types that you may try to recognize in air photos or satellite imagery. If needed, merge classes as described in the previous Merge Classes section.

The initial analysis step may include studying the Results raster along with supporting layers in the View window. Along with that, the Classes section of the main dialog includes a brief count of the number of Cells and % for each class. However, the set of analysis tools can help you study the classes in much more depth. They are located at the top of the main process window on the right side of the toolbar. Note that each analysis window described below can be saved as a text or CAD file.

Statistics

The Statistics icon opens a Classification Output Statistics window that displays class statistics. This information enables you to investigate the spectral properties of each class and lets you compare classes for possible merging.

Tabulated statistics include:

Class Counts - Number of cells and percentages for each class.

Class Means - Mean cell values of each class for every input raster.

Class Standard Deviation - Standard Deviation of each class for every input raster.

Class Distances between Means - Distance between Means for each possible class pairing.

Covariance matrix - Covariance Matrix for each class gives a relative measure of the degree of spectral correlation for each possible pairing of input rasters. High positive covariance values indicate a strong positive correlation for the pair of rasters, values close to zero indicate little correlation, and negative covariance values indicate a negative correlation between the two rasters.
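The tabulated statistics can be reproduced for a toy example with the sketch below. The two-band class samples and the function names are hypothetical, and the use of the sample covariance (dividing by n - 1) is an assumption, since the text does not say which form the process reports.

```python
import math

def class_mean(patterns):
    """Mean cell value of a class for every input raster (band)."""
    n = len(patterns)
    return [sum(p[b] for p in patterns) / n for b in range(len(patterns[0]))]

def class_covariance(patterns):
    """Covariance matrix for each pairing of bands (assumed sample form, n - 1)."""
    n = len(patterns)
    mean = class_mean(patterns)
    bands = range(len(mean))
    return [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in patterns) / (n - 1) for j in bands]
        for i in bands
    ]

# Hypothetical spectral patterns (band 1, band 2) of the cells in two classes.
class_a = [(10, 20), (12, 24), (14, 28)]  # bands rise together
class_b = [(30, 8), (32, 6), (34, 4)]     # band 2 falls as band 1 rises

mean_a, mean_b = class_mean(class_a), class_mean(class_b)
cov_a, cov_b = class_covariance(class_a), class_covariance(class_b)

# Standard deviation per band is the square root of the covariance matrix diagonal.
std_a = [math.sqrt(cov_a[i][i]) for i in range(len(mean_a))]

# Distance between the two class means in spectral space.
mean_distance = math.dist(mean_a, mean_b)
```

In this example `cov_a` has a positive off-diagonal value (positive correlation between the two bands) while `cov_b` has a negative one.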

Class Means and Class Standard Deviations sections of the Classification Output Statistics window

Exercise 6 - Look at class statistics

    Start with a previous run of the process loaded so you see a class raster and classes listed. (I.e. load the class raster resulting from the first exercise.)

  1. click the Statistics icon (in the top toolbar of the main process window)
  2. look at statistics for a few classes

Classification Dendogram

The Classification Dendogram is a branching, tree-like plot that shows the degree of spectral relatedness of the output classes. Class pairs that join together near the left edge of the diagram are closely related in their spectral properties and are thus good candidates for merging.

The dendogram process performs a successive grouping of pairs of classes, beginning with the pair having the closest class centers in spectral space as defined by the input bands. As each pair of classes is merged, a new joint class center is computed and class-center distances are recalculated. This process repeats until all classes have been merged into a single class. Results are plotted with the horizontal Separation axis representing distance in spectral space with the degree of relatedness decreasing to the right. The vertical lines joining two classes are plotted at the distance that separated the corresponding class centers before the classes were combined.
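The successive grouping described above can be sketched in Python as follows. This is a minimal illustration using Euclidean separation; computing the joint class center as the cell-count-weighted mean of the two merged centers is an assumption, and the class centers, cell counts, and function name `merge_sequence` are hypothetical.

```python
import math

def merge_sequence(centers, counts):
    """Repeatedly merge the two closest classes.

    Returns one (class_a, class_b, separation) tuple per merge, in merge order.
    """
    centers, counts = dict(centers), dict(counts)
    merges = []
    while len(centers) > 1:
        names = sorted(centers)
        pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
        a, b = min(pairs, key=lambda pair: math.dist(centers[pair[0]], centers[pair[1]]))
        merges.append((a, b, math.dist(centers[a], centers[b])))
        # Assumption: the joint center is the cell-count-weighted mean of the two centers.
        total = counts[a] + counts[b]
        joint = tuple(
            (x * counts[a] + y * counts[b]) / total
            for x, y in zip(centers[a], centers[b])
        )
        for name in (a, b):
            del centers[name], counts[name]
        merged = f"({a}+{b})"
        centers[merged], counts[merged] = joint, total
    return merges

centers = {"1": (0.0, 0.0), "2": (2.0, 0.0), "3": (10.0, 0.0)}
counts = {"1": 100, "2": 300, "3": 50}
merges = merge_sequence(centers, counts)
```

Each separation value in `merges` corresponds to the position along the Separation axis where the two classes would be joined in the plot.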

Options

The Class Size option lets you show class size as a percentage of cells or as cell counts (shown next to class name). The View menu provides options to zoom the x-axis scale on the dendrogram. Zoom In, Zoom Out and Full View let you zoom in and out on the graph or display it fully. The Save As CAD option lets you save the dendrogram as a CAD object.

The Separation option lets you choose a separability measure. The default is Euclidean, which shows the Euclidean distance between class centers in the feature space. Euclidean distance thus only depends upon the mean values of the classes. The other three choices (Bhattacharyya, Jeffries-Matusita, and Transformed Divergence) are statistical measures of the separation between classes that consider not just the class means but also the spread of values around the means. More specifically, they are computed for a pair of classes from the mean values and the covariance matrices. The Bhattacharyya distance increases continuously as the class means become farther apart in feature space, even beyond the point where there is no overlap between their distributions. The Jeffries-Matusita and Transformed Divergence measures have fixed lower and upper bounds, varying between 0 (classes have complete overlap) and 2 (no overlap). Both of these measures are also negative exponential functions of the distance between classes, so more weight is given to the difference between means for nearby classes.
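The behavior of two of these measures can be illustrated with the standard textbook formulas for normally distributed classes. The sketch is restricted to two input bands to keep the matrix arithmetic short, the function names and example classes are hypothetical, and the exact formulas used by the process are not given in this document. The Jeffries-Matusita measure is written in the form that varies between 0 and 2.

```python
import math

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def bhattacharyya(mean1, cov1, mean2, cov2):
    """Bhattacharyya distance between two Gaussian classes (two bands)."""
    avg = [[(cov1[i][j] + cov2[i][j]) / 2 for j in range(2)] for i in range(2)]
    inv = inv2(avg)
    diff = [mean1[0] - mean2[0], mean1[1] - mean2[1]]
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    spread = 0.5 * math.log(det2(avg) / math.sqrt(det2(cov1) * det2(cov2)))
    return quad / 8 + spread

def jeffries_matusita(b):
    """Jeffries-Matusita measure from a Bhattacharyya distance; bounded by 0 and 2."""
    return 2 * (1 - math.exp(-b))

identity = [[1.0, 0.0], [0.0, 1.0]]
b_same = bhattacharyya((0.0, 0.0), identity, (0.0, 0.0), identity)  # identical classes
b_near = bhattacharyya((0.0, 0.0), identity, (4.0, 0.0), identity)
b_far = bhattacharyya((0.0, 0.0), identity, (40.0, 0.0), identity)

jm_same = jeffries_matusita(b_same)
jm_near = jeffries_matusita(b_near)
jm_far = jeffries_matusita(b_far)
```

The Bhattacharyya distance keeps growing as the class means move apart, while the Jeffries-Matusita value levels off just below 2.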

separation - Distance in spectral space. Lower numbers mean classes are closer to each other.

The two classes with the lowest separation are selected in the Classification Results Dendogram window. The horizontal Separation axis shows a measure of the distance between class centers in spectral space.

Exercise 7 - Analyze related classes using the dendogram and View window

    Start with a previous run of the process loaded so you see a class raster and classes listed. (I.e. load the class raster resulting from the first exercise.)

  1. click the Classification Dendogram icon
  2. click on a class name and notice it is simultaneously selected in the main process window
  3. use the Classes list in the main process window or the Dendogram window to select and unselect classes to get a feel for it, then unselect them all
  4. select two classes with a small separation number in the dendogram (left-most linked brackets)
  5. look in the View window for those two classes (hover cursor over a class in the View to highlight that class in the class list); determine if they occur near each other spatially

    Often classes you want to merge are near each other both spatially (as seen in the View) and spectrally (as seen in the dendogram).

  6. right-click in the Dendogram window to combine the classes

    Alternatively, use the Merge icon in the main process window to combine the classes.

    Notice the dendogram is updated along with the list of classes. Remember to manually save the updated class raster if you want to keep your changes.

Confusion Matrix (Error Matrix)

The Confusion Matrix tool is used to compare the current class raster to another class raster in order to evaluate classification accuracy. You might compare the Results raster to the training set raster used to create it, another ground truth raster, or even to another class raster made using different methods or settings. For example, it is often used to assess the results of a supervised classification where the class of each sample area cell is compared to the class assignment produced by the supervised classification.

current class raster / result raster - Results of a classification run or a class raster re-loaded (via Open icon). It is automatically loaded in the Confusion Matrix. Its classes are shown in the Classes section of the main dialog window when Results mode is on.

Compare to: Training Set - The training set raster that was used to run a supervised classification. After running a supervised classification, the Confusion Matrix is automatically set up to compare the classification results with the training set raster used.

Compare to: Raster - Select another class raster such as a ground truth raster or another previously made class raster.

Tip: This analysis tool requires that both rasters have matching classes. This is automatically true when you are comparing the resulting class raster of a supervised classification to the training set raster used.

illustration

The soybeans and alfalfa classes are selected in the main process window (Results mode) and thus are highlighted in yellow. We can say that 100% of cells found to be soybeans (cell value: 2) in the class raster were also assigned the soybeans class in the training set raster (checkset1). However, only 47.7% of cells found to be alfalfa (cell value: 5) in the class raster were marked as alfalfa in the training set raster.

The matrix is organized with a column for each class in the Results class raster (cell value shown) and a row for every class in the Training set or Raster you are comparing it to (class name shown). This makes a bin for each possible pairing. The bins hold counts of the raster pixels that are classified in both class rasters: for each such pixel, the classes assigned in the two rasters are looked up and the corresponding matrix bin is incremented by one. A glance at the matrix reveals the matching class pixel counts (gray diagonal), which are the correctly classified cells in each class, and the non-matching pixel counts (off-diagonal, non-gray bins), which are the misclassified (or differently classified) pixels.
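The counting procedure can be sketched in a few lines. The example below assumes two small class rasters stored as nested lists, with a cell value of 0 meaning "no class assigned"; rows follow the Compare to raster and columns follow the Results raster. The function name and sample rasters are hypothetical.

```python
def confusion_matrix(results, reference, classes, unassigned=0):
    """Count class pairings for cells that have a class in both rasters.

    Rows follow the reference (Compare to) raster and columns follow the
    Results raster.
    """
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for results_row, reference_row in zip(results, reference):
        for res, ref in zip(results_row, reference_row):
            if res == unassigned or ref == unassigned:
                continue  # skip cells not classified in both rasters
            matrix[index[ref]][index[res]] += 1
    return matrix

# Hypothetical 3x3 rasters with two classes (1 and 2); 0 = unassigned.
results = [[1, 1, 2],
           [2, 2, 1],
           [1, 2, 0]]
reference = [[1, 1, 2],
             [0, 2, 2],
             [1, 1, 2]]

matrix = confusion_matrix(results, reference, classes=[1, 2])
print(matrix)  # [[3, 1], [1, 2]]
```

The diagonal entries (3 and 2) are the cells given the same class in both rasters. Two of the nine cells are skipped because one raster has no class there.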

In addition, total counts are computed for each column and row. The total count lets you compare the number of times a class is correctly predicted (gray diagonal) to the total number of pixels in that class; the difference is the number of times it is misclassified.

Total (row and column) - Total sample cell counts are computed for both the columns (Results raster classes) and the rows (Compare to raster classes).

In the above illustration, we are looking at what happens to the cells in the training set raster (rows) after you run the classification. Note, class raster totals are limited to the cells that have assigned classes in both rasters.

The bin values and totals are also shown as percentages for both the Results raster and the Compare to raster. In this way, two measures of accuracy are shown for each individual class: producer's and user's accuracy:

Agreement % (bottom row) / producer's accuracy - Values less than 100% indicate errors of omission (Results raster cells omitted from the Compare to raster class). This is figured as the percentage of the diagonal bin value over the column Total. Non-diagonal bins in that column hold counts for cells that should be in that class (i.e. should be added to the training set samples).

Accuracy values for each column (Results raster) indicate the percentage of cells of that class in the Results raster that were set to the same class in the Compare to raster.

% (right-most column) / user's accuracy - Values less than 100% indicate errors of commission (cells incorrectly included in the Compare to raster class). This is figured as the percentage of the diagonal bin value over the row Total. Non-diagonal bins in that row hold counts for cells that do not belong in that class (i.e. should be removed from the training set samples).

Accuracy values for each row (Compare to raster) indicate the percentage of cells in the Compare to raster class that had the same class in the Results raster.

Kappa - Measure of agreement corrected for chance. A negative Kappa means that there is less agreement than would be expected by chance. See also: Cohen's kappa in Wikipedia.
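To make the percentage and Kappa calculations concrete, the sketch below derives them from a small matrix laid out as described above (rows for the Compare to raster, columns for the Results raster). Kappa uses the standard Cohen's kappa formula. The function name and the sample counts are made up for illustration, and the code assumes the chance agreement is less than 1.

```python
def accuracy_summary(matrix):
    """Return (column agreement %, row %, kappa) for a square count matrix."""
    size = len(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(size)) for c in range(size)]
    total = sum(row_totals)
    diagonal = [matrix[i][i] for i in range(size)]

    def percent(part, whole):
        # None marks a class with no cells, where a percentage is undefined.
        return 100.0 * part / whole if whole else None

    # Diagonal bin over the column Total (bottom row of the matrix window).
    column_agreement = [percent(diagonal[i], col_totals[i]) for i in range(size)]
    # Diagonal bin over the row Total (right-most column of the matrix window).
    row_percent = [percent(diagonal[i], row_totals[i]) for i in range(size)]

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = sum(diagonal) / total
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return column_agreement, row_percent, kappa

# Hypothetical two-class matrix.
sample = [[45, 5],
          [15, 35]]
column_agreement, row_percent, kappa = accuracy_summary(sample)
print(column_agreement)  # [75.0, 87.5]
print(row_percent)       # [90.0, 70.0]
print(round(kappa, 3))   # 0.6
```

Here 80 of the 100 cells agree, while chance alone would be expected to produce 50% agreement, so Kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.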

Tip: Look for low percentage values and follow that row or column to find high cell counts in non-diagonal bins. Look at the two associated classes for possible modification in the training set. Follow the bin up to find the Results raster class, and consider adding samples to the training set for this class. Follow it to the left to find the Compare to class, and study the sample areas assigned to this class in the training set to see if you should remove some cells from those samples.

Training Set Error Matrix - A confusion matrix created using the training set raster. Because the raster cells in training areas are used to train a supervised classifier, classification accuracy is usually higher for these sample cells than for other areas in the scene.

illustration

The cell classes in the Results class raster and training set raster are counted. 126 out of 168 (75%) sample cells assigned to the soybeans class in the training raster were also given that class in the Results raster.

Ground Truth Error Matrix - A confusion matrix created using a separate ground truth raster that was not used for training. To get a better idea of the broader classification accuracy, you can use a second set of ground truth areas that were not used in the training set.

illustration

A separate ground truth raster (not used as the training set) is shown here. 74 out of 160 (46.3%) sample cells assigned to the soybeans class in the ground truth raster were also given that class in the Results raster. We generally expect lower accuracy with a separate ground truth raster than with the one used as the training set raster. However, that is not always the case.

Exercise 8 - Analyze classes in confusion matrix

Open the confusion matrix for the training set raster; do the same for a different ground truth raster.

    This exercise requires two ground truth rasters: one to be used as the training set and the other to check classification accuracy afterward. Set up for this exercise by running a supervised classification (e.g. Stepwise Linear with stanton_landsat8.rvc input and training.rvc/checkset1 as the training set).

  1. after running the classification, click the Confusion Matrix icon (in the top toolbar)
  2. The matrix opens showing a Results raster class in each column and a training set class in each row.

  3. select a class in the Classes list of the main process window to highlight the corresponding class in the matrix (yellow background)
  4. click the Raster button and select another ground truth raster that is not the training set raster (e.g. stanton_training.rvc/checkset2)
  5. toggle the Compare to setting between Training Set and Raster to compare the Results raster with both of them
  6. Note the Agreement % is generally higher with the training set than with the other ground truth raster.

  7. look in the % column in the matrix and find the class with the lowest value (e.g. the soybeans row)
  8. in that row, find the non-diagonal bin (without gray background) with the highest value and find the Results raster class (class 5 / alfalfa)
  9. This means some soybean cells in the training set raster were classified as alfalfa in the Results class raster. You can verify there is some overlap in the spectral properties of soybeans and alfalfa via the Dendrogram and Scatterplot.

overall accuracy - Value calculated by dividing the total number of correctly classified raster cells (the sum of the leading diagonal values) by the total number of cells in the ground truth raster, and expressing the result as a percentage.

Keep in mind that the confusion matrix shows classification accuracy only relative to the set of classes that you provide. Low accuracy values for a particular class may indicate that the sample areas you used are not completely representative of the class, the class is not sufficiently different from other classes in its spectral properties, or your set of classes does not include all of the significant materials in the scene.

Cooccurrence

The Cooccurrence window is a matrix in which each cell (bin) holds both the spatial cooccurrence value and the spectral separability value for one pair of classes. (Class names and sizes are listed on the horizontal and vertical axes. The Class Size option lets you show class size as a percentage of cells or as cell counts.)

cooccurrence (upper number) - The (raw or normalized) frequency with which cells of each class pair occur spatially adjacent to each other in the image. Higher numbers indicate that classes frequently occur next to each other in the image.

The cooccurrence procedure analyzes the spatial associations of pairs of classes, and the values shown let you judge which classes are spatially associated. By default the Normalized frequency is shown. It is produced by comparing the raw frequencies of adjacency with the values expected from a random distribution of class cells, a calculation that removes bias related to differing class sizes. A positive value indicates that two classes are adjacent to each other more often than random chance would predict. A negative value indicates that two classes tend not to occur together. Set the Cooccurrence option to Frequency to see the raw frequency values.
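The idea of comparing raw adjacency counts with the counts expected by chance can be sketched as follows. The exact normalization used by the process is not given here, so this example assumes one plausible choice: (observed - expected) / expected, where the expected count comes from each class's share of the cells. It counts only horizontal and vertical neighbors and assumes the raster contains at least one adjacent pair of classified cells. All names and the sample raster are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

def adjacency_counts(raster, unassigned=0):
    """Raw frequency: count side-by-side cell pairs for every class pair."""
    pair_counts = Counter()
    cell_counts = Counter()
    rows, cols = len(raster), len(raster[0])
    for r in range(rows):
        for c in range(cols):
            a = raster[r][c]
            if a == unassigned:
                continue
            cell_counts[a] += 1
            # Look right and down so each adjacent pair is counted once.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols and raster[rr][cc] != unassigned:
                    pair = tuple(sorted((a, raster[rr][cc])))
                    pair_counts[pair] += 1
    return pair_counts, cell_counts

def normalized_cooccurrence(pair_counts, cell_counts):
    """Assumed normalization: relative excess over the chance expectation."""
    total_pairs = sum(pair_counts.values())
    total_cells = sum(cell_counts.values())
    result = {}
    for a, b in combinations_with_replacement(sorted(cell_counts), 2):
        share = (cell_counts[a] / total_cells) * (cell_counts[b] / total_cells)
        # An unordered pair of different classes can occur in two orders.
        expected = total_pairs * share * (1 if a == b else 2)
        result[(a, b)] = (pair_counts[(a, b)] - expected) / expected
    return result

# Hypothetical raster: classes 1 and 2 are interleaved, class 3 is separate.
raster = [[1, 2, 1, 2],
          [2, 1, 2, 1],
          [3, 3, 3, 3]]

pair_counts, cell_counts = adjacency_counts(raster)
normalized = normalized_cooccurrence(pair_counts, cell_counts)
for pair, value in sorted(normalized.items()):
    print(pair, pair_counts[pair], round(value, 2))
```

In this raster the pair (1, 2) gets a positive value because the two classes are adjacent far more often than their sizes alone would predict, while the pair (1, 3) gets a negative value.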

separation (lower number) - Distance in the n-dimensional feature space defined by the image bands. Lower numbers mean classes are closer to each other.

The separation procedure analyzes the degree of spectral distance of class pairs. This is the same measure computed for the dendrogram; see the Classification Dendrogram section for related information including Separation options (Euclidean, Transformed Divergence, Jeffries-Matusita, and Bhattacharyya).

Classes with both high cooccurrence and low separation are good candidates for merging. Values are shown in color for the 10 Highest Cooccurrence values (red) and the 10 Lowest Separation values (blue). The corresponding sliders let you step through the matrix cells ranked by highest cooccurrence and by lowest separation, respectively. When a slider is changed, the display automatically scrolls to the corresponding class pair and outlines the matrix cell in the same color.

Highest Cooccurrence - Classes with high cooccurrence values are near each other spatially.

Lowest Separation - Classes with low separation values are near each other in spectral space.
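The ranking behind the two sliders can be pictured with a short sketch. Given a cooccurrence value and a separation value for each class pair, it lists the pairs that appear in both the top-N highest cooccurrence and the top-N lowest separation rankings. The data structure, function name, and numbers are invented for illustration.

```python
def merge_candidates(pairs, top_n=10):
    """Return class pairs ranked high for cooccurrence AND low for separation.

    pairs maps (class_a, class_b) -> (cooccurrence, separation).
    """
    highest_cooccurrence = sorted(pairs, key=lambda p: pairs[p][0], reverse=True)[:top_n]
    lowest_separation = sorted(pairs, key=lambda p: pairs[p][1])[:top_n]
    return [p for p in highest_cooccurrence if p in lowest_separation]

# Hypothetical matrix values for four classes.
pairs = {
    (1, 2): (0.9, 3.1),
    (1, 3): (0.2, 25.0),
    (1, 4): (-0.1, 18.5),
    (2, 3): (-0.4, 4.0),
    (2, 4): (0.1, 30.0),
    (3, 4): (0.7, 40.2),
}

print(merge_candidates(pairs, top_n=2))  # [(1, 2)]
```

Only the pair (1, 2) is both frequently adjacent and spectrally close, so it is the natural first candidate to inspect in the View before merging.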

illustration

Class 1 and Class 2 are frequently next to each other spatially (high cooccurrence) and are also close to each other in spectral space (low separation).

Exercise 9 - Analyze classes with cooccurrence / separation matrix

The Classification Cooccurrence Analysis window is a quick and intuitive tool to analyze, select and merge classes. You can study both spatial and spectral relationships between the classes. The dialog provides shortcuts to merge classes — select/unselect classes by clicking on a matrix cell then merge selected classes via right-clicking on the matrix.

    Start with a previous run of the process loaded so you see a class raster and classes listed. (For example, load the class raster resulting from the first exercise.)

  1. click on a couple of matrix cells to select associated classes; click again to unselect them
  2. set the Highest Cooccurrence slider to 1; then look for the dark red border drawn around the matrix cell to find the 1st highest cooccurrence in the matrix (notice the default color of red matches the setting name and selected cell border)
  3. look for areas having those two classes in the View and see if they tend to be near each other
  4. set the same slider to the second highest cooccurrence (slider value of 2); then find the corresponding cell in the matrix; compare the 1st and 2nd highest cooccurrence values in the matrix
  5. set the Lowest Separation slider to 1; then find the corresponding cell in the matrix with the 1st lowest separation (notice the default color is blue)
  6. open the Classification Dendrogram and look at the branching for two classes with low separation
  7. Notice how the value on the Separation axis is where the vertical line joins the horizontal lines for the two classes.

  8. close the Dendrogram window
  9. back in the cooccurrence window, click the matrix cell with either the highest cooccurrence or lowest separation
  10. Notice the yellow background and also that the two associated classes are selected in the main process window and other analysis windows.

  11. right-click on the same matrix cell to merge the two classes
  12. Alternatively, use the Merge icon in the main process window to combine the classes.

    In practice, you may look for classes to merge where the matrix cell indicates both high cooccurrence and low separation.

Classification Ellipse Scatterplot

The Ellipse Scatterplot window shows the distribution of classes in spectral space projected onto a 2D plane of the N-dimensional spectral space. You determine which two bands to look at by assigning them to the X and Y axes of the graph (a 'slice' of N-dimensional space). The positions of class clusters in spectral space can provide important information about the identity of the materials in each class. For example, a scatterplot of photoinfrared versus red bands is useful for recognizing classes representing bare soil, vegetated areas, and water. In addition, highly overlapping classes in multiple bands may indicate similar materials. You can view several spectral planes at the same time by opening more than one Ellipse Scatterplot window.

Each class is represented by a scatter of cell values and/or the ellipses surrounding them. Both ellipse and points are drawn in the class color. However, when plotted points are dense or overlapping, black dots represent multiple classes. Ellipses for any classes selected (in the Classes section of the main process window) are drawn with a dark gray fill color.

illustration

Classes for bands 5 and 6 shown as ellipses and a scatterplot. Selected classes have dark gray fill when viewed as ellipses.

Interface

Use the Percentage slider to specify the percentage of a class's points that each ellipse encloses. The greater the percentage value, the more encompassing the ellipse; smaller (less encompassing) ellipses are drawn when a smaller percentage value is set. Adjust the plot using various View options (via menu and toolbar) including: Full, Zoom In, Zoom Out, and Previous. The following tools are available as well: Zoom Box, Pan View, and Select.
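How an ellipse can be sized from a percentage is sketched below. The method the process actually uses is not described here, so this example assumes the class points follow a two-dimensional normal distribution, in which case the ellipse enclosing a given percentage of points is the covariance ellipse scaled by -2 ln(1 - p). The function name and the sample points are hypothetical.

```python
import math

def class_ellipse(points, percentage):
    """Return (center, semi_major, semi_minor, angle_degrees) for a class.

    points is a list of (x_band_value, y_band_value) pairs; percentage must
    be greater than 0 and less than 100. Assumes a 2-D normal distribution.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

    # Eigenvalues of the symmetric 2x2 covariance matrix give the variance
    # along the ellipse's major and minor axes.
    half_trace = (var_x + var_y) / 2
    root = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    major_var, minor_var = half_trace + root, max(half_trace - root, 0.0)

    # Chi-square quantile for 2 degrees of freedom.
    scale = -2 * math.log(1 - percentage / 100)
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return ((mean_x, mean_y),
            math.sqrt(major_var * scale),
            math.sqrt(minor_var * scale),
            math.degrees(angle))

# Hypothetical class points in two bands, elongated along the X band.
points = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
small = class_ellipse(points, 50)
large = class_ellipse(points, 95)
print(small)
print(large)
```

Raising the percentage from 50 to 95 leaves the center and orientation unchanged and lengthens both axes, which matches the more encompassing ellipse described above.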

Exercise 10 - Compare classes plotted in 2D spectral space

It is difficult to imagine classes in the full N-dimensional spectral space (where N is the number of bands). However, the Ellipse Scatterplot gives you a way to visualize class points by plotting them in two dimensions.

    Start with a previous run of the process loaded so you see a class raster and classes listed. (For example, load the class raster resulting from the first exercise.)

  1. click the Scatterplot icon and make the Ellipse Scatterplot window larger (click-and-drag on corner)
  2. with the Select tool active, click on a point with overlapping ellipses
  3. The selection tool in the Ellipse Scatterplot window is not ideal — notice that all ellipses that overlapped the position you clicked on were selected along with their corresponding classes.

  4. use the selection boxes (near the color samples) in the Classes section of the main dialog to unselect all the classes
  5. select two classes near each other in spectral space
  6. Tip: use the Dendrogram window to find the two classes with the lowest Separation value, which means they are closer to each other in spectral space than any other pair of classes. Note that it is easier to find related classes using the Dendrogram or another analysis window. It is also easier to select classes there or via the main dialog.

  7. back in the Ellipse Scatterplot window, move the mouse cursor over the selected ellipse and press the + key to zoom in (- zooms out), and adjust the scroll bars if needed
  8. change the two input bands you want to plot on the X and Y axes until you see a good separation of the two ellipses or points (e.g. bands 5 and 6)
  9. click the New icon to open another Ellipse Scatterplot, choose another set of bands (for X and Y), then compare the selected classes in both windows
  10. switch between display of plotted class points and/or ellipses around them by clicking on the Pixel and Ellipse options

illustration

Compare the Classification Ellipse Scatterplot, which uses class colors, to the same bands plotted in the Image Band Correlation window where colors indicate histogram counts.

Distance Histogram

The Distance Histogram can help you assess the distribution of class points in spectral space for any class. It is derived from the distance raster that was automatically created for the current set of classes. Thus the graph plots the distance values of the last selected class. A compact class with a small spread of cell values has a narrow distance histogram. A diffuse class with many points far away from the class center has a histogram with a 'tail' extending to higher distance values, or perhaps even a second mode (peak) in the histogram. These outlier cells may represent distinctly different materials than cells near the class center.

You can remove outliers from the class by clipping the histogram tail. Set the distance Threshold to split the histogram and automatically update the percentage of points to the Left and Right of it. Or click and drag crosshairs in the histogram graph to do the same. Apply your settings to clip the class value range and update the class raster accordingly. Discarded cells are assigned a 0 value in the class raster and will not be assigned to a class. Note that there is no way to undo this operation.
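The clipping step can be sketched as follows. The process reads distances from its own distance raster; this example instead recomputes Euclidean distances to the class mean, which is an assumption about what the distance values represent. It also works on a copy of the class raster, whereas the operation described above changes the loaded raster permanently. The function name and sample values are hypothetical.

```python
import math

def clip_class(class_raster, bands, target_class, threshold):
    """Set cells of target_class farther than threshold from the class mean to 0.

    Returns (clipped raster copy, percent kept on the left, percent clipped
    on the right).
    """
    cells = [(r, c)
             for r, row in enumerate(class_raster)
             for c, value in enumerate(row)
             if value == target_class]
    # Class center: mean value of the class cells in each band.
    center = [sum(band[r][c] for r, c in cells) / len(cells) for band in bands]

    clipped = [row[:] for row in class_raster]  # leave the input untouched
    removed = 0
    for r, c in cells:
        distance = math.sqrt(sum((band[r][c] - m) ** 2
                                 for band, m in zip(bands, center)))
        if distance > threshold:
            clipped[r][c] = 0  # 0 = no class assigned
            removed += 1
    percent_right = 100.0 * removed / len(cells)
    return clipped, 100.0 - percent_right, percent_right

# Hypothetical 2x2 scene with two bands; class 1 has one outlier cell.
class_raster = [[1, 1],
                [1, 2]]
band_a = [[10, 12],
          [50, 90]]
band_b = [[20, 22],
          [60, 5]]

clipped, left, right = clip_class(class_raster, [band_a, band_b], 1, threshold=25)
print(clipped)  # [[1, 1], [0, 2]]
print(round(left, 1), round(right, 1))
```

The outlier cell lies about 36.8 units from the class center, beyond the threshold of 25, so it is reset to 0 while the two cells near the center keep their class.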

Interface

Choose the Class to graph (in the main process window) and then set the Method to compute the distance (Euclidean, Orthogonal, or Mahalanobis). Turn on Auto-Update to automatically recompute the distance histogram when you select a new class, or click Update to recompute it manually.

The graph has a File menu that lets you create a Snapshot of it or Save the histogram to a text file. The Display menu lets you choose how to display the graph (Bar, Outline, or Strip). Set graph Options for Grid, Logarithmic Scale, Cumulative, Show Transparent, Labels, and Label Size. The Minimum and Maximum histogram values are shown below the graph.

illustration

The class distance raster indicates how far class points are from the class center. The 'tail' on the right side of the histogram therefore indicates the points farthest from the center, which may be removed from the class.

Exercise 11 - Remove outliers from a class via distance histogram

To try it out: select a class, click and drag crosshair position to set the distance Threshold, click Apply to discard points above the threshold, and check the class raster in the View for updated (null) areas.

    Start with a previous run of the process loaded so you see a class raster and classes listed. (For example, load the class raster resulting from the first exercise.)

  1. optional step: use the Save Results As icon in the main process window to create and load a copy of Results raster
  2. It is important to know that any changes to the currently loaded Results raster are permanent. Thus saving a backup copy of the Results raster may be useful if you don't like your changes.

  3. click the Distance Histogram icon to open the dialog
  4. select a class in the main dialog window
  5. in the Distance Histogram window, click Update to update the graph (if Auto Update is not on)
  6. click and drag on the histogram graph to position crosshairs so that the percentage of cells to the Right of it is over 30%
  7. Notice that this sets the Threshold value.

  8. click Apply
  9. look in the View for updated areas in the class raster that are now null (but used to be the class you selected earlier)

Tip: in this exercise we removed an overly large portion of the cells in the selected class. In practice you would only cut off outlier values indicated by the histogram 'tail'.

Supervised Classification

Supervised classification methods require detailed knowledge of a portion of the study area so that you can designate sample areas for each of the desired output classes. These sample areas are used to train the classification algorithm. The training set raster should incorporate as much of the spectral variability in the scene as possible. Supervised classification methods determine the statistical properties of each of the training classes, then use these properties to classify the entire image.

Most of the supervised classification methods assign every non-masked input cell to one of the designated classes. If you identify too few training classes, the resulting class raster may be made up of "super classes" that have different features placed in the same class.

training set raster - Identifies representative sample areas for each of the desired output classes.

mean values vector (mean vector) - Directional line segment in N-dimensional space consisting of the mean value taken along each coordinate axis (variable). Here, the mean vector of a class represents its center when the input rasters are plotted in spectral space.

feature space - The spectral space defined by classes in the training set raster.

illustration

Ready to run a supervised classification with a training set raster loaded and the Training mode on.

The following supervised classification methods are available: Minimum Distance to Mean, Maximum Likelihood, Stepwise Linear, Suits Maximum Relative, Back Propagation Neural Network, Mahalanobis Distance to Mean. See the Supervised Methods and their Parameters section for details.
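To illustrate the general train-then-classify pattern, here is a minimal sketch in the spirit of the Minimum Distance to Mean method: class mean vectors are computed from the training cells, then every cell is assigned to the class with the nearest mean. It is a simplified stand-in rather than the process's implementation; it ignores masks and method parameters, and the function names and sample data are invented. It requires Python 3.8 or later for math.dist.

```python
import math

def class_means(bands, training, unassigned=0):
    """Mean vector of each training class across all input bands."""
    sums, counts = {}, {}
    for r, row in enumerate(training):
        for c, cls in enumerate(row):
            if cls == unassigned:
                continue  # cell is not part of any sample area
            totals = sums.setdefault(cls, [0.0] * len(bands))
            for i, band in enumerate(bands):
                totals[i] += band[r][c]
            counts[cls] = counts.get(cls, 0) + 1
    return {cls: [t / counts[cls] for t in totals] for cls, totals in sums.items()}

def classify(bands, means):
    """Assign every cell to the class whose mean vector is nearest."""
    rows, cols = len(bands[0]), len(bands[0][0])
    output = []
    for r in range(rows):
        output_row = []
        for c in range(cols):
            cell = [band[r][c] for band in bands]
            output_row.append(min(means, key=lambda cls: math.dist(cell, means[cls])))
        output.append(output_row)
    return output

# Hypothetical two-band scene with one training cell per class.
red = [[10, 12, 80],
       [11, 78, 82]]
nir = [[90, 88, 20],
       [92, 22, 18]]
training = [[1, 0, 2],
            [0, 0, 0]]

means = class_means([red, nir], training)
print(classify([red, nir], means))  # [[1, 1, 2], [1, 2, 2]]
```

Every cell receives one of the training classes, including cells far from any sample area, which is why too few training classes can produce overly broad output classes.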

Exercise 12 - Run a supervised classification

Running a supervised classification has the additional step of selecting a training raster.

    The supervised classification exercises use data in stanton_landsat8.rvc for input and stanton_training.rvc for training and ground truth data.

  1. click the New icon on the main window and select all the rasters in the stanton_landsat8.rvc file.
  2. choose the Minimum Distance to Mean method
  3. in the Classes section of the main window click the Open Training Data icon and choose stanton_training.rvc, training_raster
  4. notice the Training toggle is on and thus the classes shown are for the training set raster and you have toolbar options available for working with it
  5. look at the training_raster layer in the View and compare the sample areas to features on the ground you can see in the input layer
  6. run and name output as usual
  7. compare class results with input and training set

Training Set

Before running a supervised classification, a training set raster must be set up. Either select an existing training set raster, import it from vector training data, or manually create one using training data as a reference. Pixel values for each class in the training set are used to determine the class in the Results class raster. This section discusses the Training mode of the Classes panel in the main process window, which has tools to create and work with a training set raster.

training data - Any type of information you may have about the ground cover or material in the study area that you want to use to 'train' the supervised classification method being used.

training set raster (or layer) - A specialized class raster that contains class information for small portions of the study area. These sample classes are set up manually (or imported) for areas with known ground cover or material (i.e. ground truth information). As with other class rasters, each cell value in a training set raster represents a class. If classes are named, a database table is used to associate each class name with the correct cell value.

Tip: With regard to the loaded training set, pay attention to whether you are working with a temporary layer or a saved raster.

When you first create New Training Data, the training set layer that is added is temporary. It must be saved to a raster in an RVC file (via Save Training Data As) to be able to Run the process or re-use the training set data later. On the other hand, any previously saved training set raster that is loaded will automatically retain any changes you make to the class list or assigned sample areas.

When Training mode is on in the Classes section of the Classification process, the following toolbar options are available.

Use to manage the training set 'raster':

New Training Data - Creates a new temporary training set layer, which is initially a blank raster having no classes. You will need to set up classes by importing a class list or setting them up manually. You can then select a class and mark the sample areas in the View using the Select Area tool along with information you know about the scene. This training set raster will not be saved until you use Save Training Data As. After saving you can then Run the process.

Open Training Data - Open a previously saved training raster. It is automatically added to the View and any changes you make to the raster or classes are automatically saved.

Save Training Data As - Saves the currently loaded training set (either a temporary layer or saved raster) to a new raster object in an RVC file. Note, any temporary training set layer you make must be saved this way before you can Run the process. On the other hand, there is no need to save a training set raster after editing since it is saved automatically as you go.

Edit Training Data - Toggle mode to allow editing the currently loaded training set raster. When this button is on you can modify training set classes (name, cell value, and color). This mode is off by default after loading a saved training set raster. This is a safety measure since modifications are saved automatically and you cannot revert back to the original.

Tip: Instead of turning on the Edit Training Data mode, use Save Training Data As to create a new training set raster (with Edit Training Data mode on automatically). Then you can make changes without modifying the original training set raster.

Import from Vector - Imports polygon or point vector data to a training set raster and loads the resulting training raster.

Note, if you don't have a training set raster loaded, a new temporary training layer will be added. However, if you already have a training set raster loaded the imported training areas and classes will be merged with it. The Edit Training Data mode must be on if you want to import / merge into a loaded training set raster.

Use to manage class list:

Open Class Table - Creates a temporary training set layer with classes specified by the table. You can select any class table previously created via Save Class Table. If a training set raster is already loaded you will have the option to merge the new classes with the current class list (Keep current class list?).

Save Class Table - Save the current class list (class numbers, names, and associated colors) to a new table in an existing database. You can navigate into a class raster such as the current training set raster, a vector object such as point or polygon training data, or a main level database object.

This option lets you make and use a consistent set of classes and colors for related datasets. This table can be reopened later in the Training Set Editor via Open Class Table.

Add Class - Add a new class to the current class list.

Delete Selected Class - Deletes classes that are currently selected via a filled selection box. This option is active when Edit Training Data toggle is on. Sample areas of that class will be removed from the raster as well unless you use that cell value in a new class.

Use to manage updated cell values assigned to classes (modifies both training set 'raster' and class list):

Apply Cell Value Changes - Updates the training raster to match the modified cell values in class list.

Tip: Turn on Edit Training Data mode to modify classes.

Reset Cell Value as Saved (Undo) - This reverts the class cell value(s) to the state prior to edits. In other words, the class list reverts to match the training raster as displayed in the View.

Training Mode and Training Set Statistics

When a supervised classification method is chosen, the Classes panel is automatically set to Training mode, which is then used to create, open, or import a training set raster. Once a training set raster is loaded, the Statistics and Classification Dendrogram toolbar options become active and provide statistics for the training set classes. This information can be used to judge the spectral characteristics and separability of the training classes.

After running a supervised classification the Classes panel is automatically switched to Results mode and all of the analysis tools are available to study the Results raster. Furthermore, if you open the Confusion Matrix it will be automatically set up to compare the resulting classes to the training set.

illustration

Training set classes shown in statistics and dendrogram windows. Wetland and deciduous forest are close to each other in spectral space according to the training set samples.

Exercise 13 - Training mode versus Results mode

    Set up by selecting a supervised classification method (e.g. Stepwise Linear) and loading input (stanton_landsat8.rvc) and training set (stanton_training.rvc, training_raster).

    For detailed setup instructions see the Run a supervised classification exercise.

  1. after loading input and selecting a supervised classification method, notice:
    • You are in Training mode (Training toggle is on in the Classes section).
  2. after loading a training set raster, notice:
    • Classes shown are for the training set raster,
    • toolbar options are available for working with training classes, and
    • Statistics and Classification Dendogram toolbar options are active (top toolbar).
  3. click the Run icon
  4. after running a supervised classification, notice:
    • You are in Results mode (Training mode is off but both toggles are available),
    • classes shown are for the new Results class raster,
    • toolbar options are available for working with Results raster classes,
    • all statistics/analysis tools are active (top toolbar), and
    • you can switch between Training and Results modes.
  5. toggle between modes while looking at the Classes section; note the changes in the available toolbar options and class list statistics
  6. also note when toggling modes that the analysis tools (in the top toolbar) remain active, but Statistics and Classification Dendrogram are mode dependent
  7. toggle between modes and open Statistics for both; compare the Classification Output Statistics and Training Set Statistics windows
  8. open Classification Dendrogram and switch between modes; notice the dendrogram window changes between Classification Training Dendrogram and Classification Results Dendrogram
  9. click the Confusion Matrix icon (top toolbar)

    See the Confusion Matrix (Error Matrix) section for more information on using it to analyze class accuracy.

Import Training Set Raster from Vector

Use the Import icon to convert vector polygons or points containing ground truth information (elements with attributes) to a training set raster. Objects to be used for importing training areas must be georeferenced, but need not match the extents of the input. The new training set raster is automatically co-aligned with the input rasters, so only training areas that fall within the overlap area are transferred.

illustration

Importing a new training set raster from vector polygons. Note, the vector is loaded in the View for reference when you click Apply. Then when you click OK, a temporary Training Set layer is added.

Exercise 14 - Import training set data from a vector

Create a training set raster by importing from a polygon or point vector.

    Close the process and re-open it to start fresh. Set up a supervised classification with input: stanton_landsat8.rvc file and method: Suits Maximum Relative. Ground truth objects are in stanton_training.rvc.

  1. in the Classes section, click Add Class and select the new class

    A new temporary training set raster is automatically created (unless one was previously loaded).

  2. click the Import from Vector icon
  3. in the Import Classification Training Data dialog, set the Source to stanton_training.rvc/training_polygons

    Notice the vector is shown in the View while you have the Source set in the Import Classification Training Data dialog.

  4. in the View select polygon(s) for one of the classes (e.g., via the Select tool or using the crops table)
  5. back in the Import Classification Training Data dialog click Apply to import the selected polygon areas to the training set

    Note the updated Cells value for the selected class in the main dialog.

The above steps show how you can interactively import one class at a time. The next steps show you how to import many classes at once.

  1. in the Import Classification Training Data dialog, change the Class Name from All Same to crops.crop

    Note, the table specified is in the vector's polygon database. The table has records for each class containing a style color, the class name, and the class number.

  2. click Apply or OK to import the classes and class samples to the training set

    Note that any previously set classes and samples in the training set may be retained as-is or overwritten when you import. That is why you still see the class you added manually in the first step of this exercise. However, its Cells value is now 0 because a polygon used in the last step covered the same area.

  3. click the Save Training Data As icon and name the new training raster object

    Notice the Run icon becomes available after saving the new training set raster.

  4. click Run

Try it with vector points: repeat the above steps except choose a point vector (stanton_training.rvc / training_points with crops.crop table and field); set Radius to 20; and click OK.

Note, if you already have a training set raster (or temporary layer) loaded, the new classes and class samples will be automatically merged with it. (You will need to turn on Edit Training Data to do so with a saved training set raster.)

Tip: use Apply to interactively import classes to the training set one at a time. Use OK to automatically close the dialog after importing.

Set Up the Class List for a Training Set Raster

If you don't already have a training set raster or importable ground truth data, you can create a training set raster manually within the Classification process. You will set up the class list and add samples using a reference layer such as a map with hand-drawn ground truth information or a raster layer with high enough resolution that you can recognize features.

When creating a training set raster manually, you may decide to set up the class list first or as you add samples. Use the Add Class option to manually create the class list, or import the classes if you have previously set up a class list for another training set raster or have named classes in a previously made class raster. Using a table to store class information is useful because it lets you reuse the same class names, cell values, and color palette when running multiple supervised classifications for the same or related datasets.

You can import a class list from a table that is stored in a main level database object or under a class raster. Note, a class raster will only have a database table if classes were manually named. A class list table in a training set raster can be saved to a main level database object for later use.

Exercise 15 - Open a class list from table and modify classes

Open a previously saved class list. Learn how to modify and save it.

    Close the process and re-open it to start fresh.

  1. click the New icon on the main window, select the input (stanton_landsat8.rvc), and set a supervised classification method (Suits Maximum Relative)
  2. click the Open Class Table icon and choose a previously saved class table (stanton_training.rvc class_database object, CLASSINFO table)

    Notice a new, empty training set raster is automatically created and added as a layer in the View. This is a temporary raster but you will be prompted later to save it to an .rvc file.

    Also notice, the Edit Training Data icon is automatically turned on.

  3. click the Add Class icon
  4. rename the new class by double-clicking it (in the Name column)
  5. in the same way, change the cell value for the class (in # column)

    Note, you could have added all the classes this way instead of starting with the imported class list.

  6. click the Save Class Table icon and save it to a new table in the same database object (class_database)
  7. remove the class you just added: select it and click the Delete Selected Classes icon

With a class list and an empty training set raster loaded, you can start drawing samples as shown in the next lesson.

Tip: if you already have a class list loaded and use Open Class Table, you will be prompted to Keep existing class definitions? If you choose Yes the new classes will be merged with the previously loaded class set.

Add Samples to a Training Set Raster

The steps to add a sample to be used for training are: select a single class, draw the sample area, and choose an assignment option. In this way classes are assigned to each sample as you draw them in the View.

A typical operation would involve first finding an area in a reference layer that is made up of a single class. Select that class in the usual way from the Classes list in the main process window. Then turn on the Select Area tool, which is at the right side of the top toolbar in the View. This activates the standard polygon drawing tool so you can draw a polygonal shape outlining the sample area. Finally, when the polygon is drawn, right-click on it and choose an assignment option from the menu that opens, thereby associating it with the selected class.

Assignment options include:

Assign Free Cells - All cells in the polygon that do not yet have a class assignment are added to the selected training class.

Assign All Cells - All cells in the polygon are added to the selected training class regardless of previous class assignment.

Release All Cells - All cells in the polygon are marked as unclassified.

Release Selected Cells - All cells of the selected class (or classes) become unclassified.

Select - Select class(es) by moving the mouse over the class samples in the View. The drawn polygon limits classes that can be turned on.

Unselect - Unselect class(es) by moving the mouse over the class samples in the View. The drawn polygon limits classes that can be turned off.

Invert Selection - Toggle whether class(es) are selected or not by moving the mouse over the class samples in the View. The drawn polygon limits classes that can be toggled.
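
The four cell assignment options above can be pictured as simple masked updates on the training set raster. The following is a minimal Python sketch, not the TNTmips implementation. It assumes the training raster is a small grid of class values, the drawn polygon is a grid of True/False flags, 0 stands for an unclassified cell, and Release Selected Cells is limited to the drawn polygon. All function and variable names are hypothetical.

```python
UNCLASSIFIED = 0  # assumed "no class" cell value; the real raster may use a different null value


def assign_free_cells(training, polygon, class_value):
    """Assign Free Cells: only unclassified cells inside the polygon get the class."""
    return [[class_value if inside and cell == UNCLASSIFIED else cell
             for cell, inside in zip(row, mask)]
            for row, mask in zip(training, polygon)]


def assign_all_cells(training, polygon, class_value):
    """Assign All Cells: every cell inside the polygon gets the class."""
    return [[class_value if inside else cell
             for cell, inside in zip(row, mask)]
            for row, mask in zip(training, polygon)]


def release_all_cells(training, polygon):
    """Release All Cells: every cell inside the polygon becomes unclassified."""
    return assign_all_cells(training, polygon, UNCLASSIFIED)


def release_selected_cells(training, polygon, selected):
    """Release Selected Cells: cells of the selected classes become unclassified
    (assumed here to be limited to the drawn polygon)."""
    return [[UNCLASSIFIED if inside and cell in selected else cell
             for cell, inside in zip(row, mask)]
            for row, mask in zip(training, polygon)]


# Hypothetical 3 x 3 training raster and a polygon covering the upper-left 2 x 2 block.
training = [[0, 1, 1],
            [2, 0, 2],
            [2, 2, 0]]
polygon = [[True, True, False],
           [True, True, False],
           [False, False, False]]

free_result = assign_free_cells(training, polygon, 3)
all_result = assign_all_cells(training, polygon, 3)
released = release_all_cells(training, polygon)
released_class_1 = release_selected_cells(training, polygon, {1})
```

In this sketch Assign Free Cells leaves the existing class 1 and class 2 cells inside the polygon untouched, while Assign All Cells overwrites them.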

Note, once you have sample areas added to the ground truth raster, the Select Class icon in the top toolbar of the View can also be used to select a class.

Select Class and Select Area toolbar options.

illustration

Adding samples to the training set raster for the soybean class

(continued from previous exercise)

Exercise 16 - Create a new training set raster

Add class samples to a new training set raster.

    You can start where the previous exercise left off. Or, set up a supervised classification with a class list and empty training set raster (input: stanton_landsat8.rvc, method: Suits Maximum Relative, class list: stanton_training.rvc, class_database object, CLASSINFO table).

  1. add a reference (via Add Layer icon in the View) to help you determine known sample areas (e.g., CDL_2019_stat1.pdf)
  2. zoom in on an area you want to classify (e.g., center of the input image)
  3. in the main process window, select the class associated with the area of interest (e.g., soybeans)
  4. back in the View, click the Select Area icon (right side of top toolbar); then with a series of clicks, draw a polygon within one of the crop pivot areas
  5. when finished drawing, right-click and choose Assign All Cells from the menu that opens
  6. repeat the last two steps to add more samples for the same class
  7. unselect the class in the Classes list
  8. find an area with a different class; then select the corresponding class in the Classes list and add samples for it in the same way

    Notice after adding samples for two classes, the Save Training Data As icon becomes available.

  9. click Save Training Data As and name the training set raster (the temporary training set layer becomes a permanent raster object)
  10. click Run

Tip: the Suits' Maximum Relative method only classifies raster cells that fit within the class criteria, so the resulting class raster has null areas. Increase the Standard Deviation Multiplier to classify more raster cells.

See the Supervised Classification section for information about using a training set raster when running the Classification process.

Unsupervised Methods and their Parameters

See the Unsupervised Classification section for general information about using an unsupervised classification method. The following parameters are used in all unsupervised methods:

Number of Classes - sets an upper limit on the number of output classes. Increasing the output class limit also makes it more likely that similar cover types will be assigned to distinct classes rather than being lumped together in a single class.

Maximum Iterations - Sets an upper limit on the number of iterations performed in the class building phase of the process.

The following parameter is used with the KMeans, Fuzzy C-Means, Minimum Distribution Angle, Self-Organizing Neural Network, and Adaptive Resonance methods:

Initial Minimum Distance - sets the threshold distance in spectral space used to designate an input cell as a new class center instead of assigning it to the closest class. By adjusting this parameter downward, you increase the likelihood that different land cover types that are close together in spectral space will be assigned to distinct classes.

The following parameters are used with the KMeans, Minimum Distribution Angle, and Self-Organizing Neural Network methods:

Maximum Movement for Steadiness - A class center is considered to be steady when its movement with successive iterations falls below this value.

Minimum Steady Cluster Percentage - Sets the percentage of class centers that must become steady in order to accept the current set of classes.

KMeans

The KMeans method analyzes the input raster set to determine the location of initial class centers. In each process iteration, cells are assigned to the nearest class and new class centers are calculated. The new class center is the point that minimizes the sum of the squared distances between points in the class and the class center. With each iteration, class centers shift and the class assignments for some cells change. The process repeats until the shift in class centers falls below a specific value or the maximum number of iterations is reached.
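
The iteration described above, together with the Maximum Iterations, Maximum Movement for Steadiness, and Minimum Steady Cluster Percentage parameters, can be sketched in a few lines of Python. This is an illustrative sketch only: the initial class centers are supplied by hand rather than derived from the input, and the function and argument names are hypothetical, not part of TNTmips.

```python
import math


def nearest(cell, centers):
    """Index of the class center closest to the cell in spectral space."""
    return min(range(len(centers)), key=lambda k: math.dist(cell, centers[k]))


def kmeans(cells, initial_centers, max_iterations=20,
           max_movement=0.5, min_steady_pct=100.0):
    centers = [list(c) for c in initial_centers]
    for iteration in range(1, max_iterations + 1):
        # Assign every cell to its nearest class center.
        labels = [nearest(cell, centers) for cell in cells]
        steady = 0
        for k in range(len(centers)):
            members = [cell for cell, label in zip(cells, labels) if label == k]
            if not members:
                steady += 1  # an empty class cannot move
                continue
            # The mean of the members minimizes the sum of squared distances.
            new_center = [sum(band) / len(members) for band in zip(*members)]
            if math.dist(new_center, centers[k]) < max_movement:
                steady += 1
            centers[k] = new_center
        # Stop once enough class centers have become steady.
        if 100.0 * steady / len(centers) >= min_steady_pct:
            break
    return centers, [nearest(cell, centers) for cell in cells], iteration


# Hypothetical two-band cell values forming two obvious groups.
cells = [(10, 12), (11, 13), (12, 11), (50, 52), (51, 49), (49, 51)]
centers, labels, iterations = kmeans(cells, [(0, 0), (60, 60)])
```

With these sample values the centers move in the first iteration and are steady in the second, so the loop ends well before the iteration limit.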

Fuzzy C-Means

The Fuzzy C-Means method uses rules of fuzzy logic, which recognize that class boundaries may be imprecise or gradational. The method creates an initial set of prototype classes, then determines a membership grade for each class for every cell. The grades are used to adjust the class assignments and calculate new class centers, and the process repeats until the iteration limit is reached.

The Fuzzy C-Means method is slower than other methods (such as KMeans) so for large datasets you may want to set the Sample for Analysis option to use a subset (sampling) of the input.
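
To make the idea of a membership grade concrete, the sketch below uses the textbook fuzzy c-means membership formula, in which a cell's grade for a class falls off with its distance to that class center relative to its distances to all centers. Whether TNTmips uses exactly this formula is an assumption, and the fuzziness exponent and function name are hypothetical.

```python
import math


def membership_grades(cell, centers, fuzziness=2.0):
    """Return one grade per class center; the grades sum to 1."""
    distances = [math.dist(cell, center) for center in centers]
    zeros = distances.count(0.0)
    if zeros:
        # The cell sits exactly on a center: give that class full membership.
        return [1.0 / zeros if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (fuzziness - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
            for d_i in distances]


# A cell three units from the first center and one unit from the second.
grades = membership_grades((3.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

Here the cell receives a grade of about 0.1 for the farther class and 0.9 for the nearer one, rather than an all-or-nothing assignment.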

Minimum Distribution Angle

The Minimum Distribution Angle method uses an iterative approach to compute classes. The algorithm uses the set of values for each input cell to define a vector in feature space. The process analyzes the sample dataset to determine class centers, using the differing angles between sample vectors (distribution angles) as a measure of relatedness. Sample vectors separated by small distribution angles are assumed to be more closely related than those with larger distribution angles. The algorithm re-analyzes the angles using the results of the previous iteration to determine improved cluster centers. The process can create an optional distance raster.
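
The angle between two cell vectors can be computed from their dot product. The short Python sketch below illustrates that measure only, not the full iterative method; it assumes non-zero cell vectors, and the function names are hypothetical.

```python
import math


def distribution_angle(a, b):
    """Angle in degrees between two non-zero cell vectors in feature space."""
    dot = sum(x * y for x, y in zip(a, b))
    cosine = dot / (math.hypot(*a) * math.hypot(*b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def closest_class(cell, centers):
    """Index of the class center separated from the cell by the smallest angle."""
    return min(range(len(centers)),
               key=lambda k: distribution_angle(cell, centers[k]))


dark = (10, 20, 30)
bright = (20, 40, 60)      # same spectral shape as dark, twice as bright
different = (30, 20, 10)   # reversed spectral shape

same_shape_angle = distribution_angle(dark, bright)
different_shape_angle = distribution_angle(dark, different)
```

Because the angle depends on the direction of the vector and not its length, the dark and bright cells in this example are separated by an angle of essentially zero even though their Euclidean distance is large.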

ISODATA

The ISODATA method is similar to the KMeans method but incorporates procedures for splitting, combining, and discarding trial classes in order to obtain an optimal set of output classes. The ISODATA method determines an initial set of trial class centers and assigns cells to the closest class center. In each subsequent iteration the process first evaluates the current set of classes. A large class may be split on the basis of its number of cells, its maximum standard deviation, or the average distance of class samples from the class center. A class that falls below a minimum cell count threshold is discarded, and its cells are assigned to other classes. Pairs of classes are combined if the distance between their class centers falls below a threshold value. After classes have been adjusted, new class centers are calculated and the process repeats. Process iterations continue until there is little change in class center positions or until the iteration limit is reached.

Minimum Cluster Cells - The Minimum Cluster Cells parameter sets the lower limit for the number of cells in a class. Any class with fewer cells is dissolved, and its cells are reassigned to other classes.

Maximum Standard Deviation - The Maximum Standard Deviation parameter provides one criterion for splitting large classes. If the class standard deviation for any input band exceeds this value, the class is split into two classes.

Minimum Distance to Combine - The Minimum Distance to Combine parameter sets the threshold distance used to determine if two nearby classes should be combined.

Minimum Distance for Chaining - The Minimum Distance for Chaining parameter applies to the initial creation of class centers. It sets the lower limit on the distance between two class means.
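
The class adjustment step that distinguishes ISODATA from KMeans can be sketched as follows. This is a simplified illustration, not the TNTmips algorithm: it splits only on the maximum standard deviation criterion, places the two new centers one standard deviation either side of the old center, and merges with a cell-count-weighted mean. Those details and all names are assumptions.

```python
import math
from statistics import fmean, pstdev


def adjust_centers(classes, min_cells, max_std, min_combine_dist):
    """classes: one list of member cells per trial class. Returns new trial centers."""
    trial = []  # (center, weight) pairs
    for members in classes:
        if len(members) < min_cells:
            continue  # discard: these cells are reassigned in the next pass
        center = [fmean(band) for band in zip(*members)]
        stds = [pstdev(band) for band in zip(*members)]
        worst = max(range(len(stds)), key=stds.__getitem__)
        if stds[worst] > max_std:
            # Split along the band with the largest standard deviation.
            low, high = list(center), list(center)
            low[worst] -= stds[worst]
            high[worst] += stds[worst]
            trial += [(low, len(members) / 2), (high, len(members) / 2)]
        else:
            trial.append((center, len(members)))
    merged = []
    for center, weight in trial:
        for i, (other, other_weight) in enumerate(merged):
            if math.dist(center, other) < min_combine_dist:
                total = weight + other_weight
                combined = [(a * weight + b * other_weight) / total
                            for a, b in zip(center, other)]
                merged[i] = (combined, total)
                break
        else:
            merged.append((center, weight))
    return [center for center, _ in merged]


# Hypothetical trial classes: two close neighbors, one elongated class, one tiny class.
classes = [
    [(10, 10), (12, 10), (11, 12), (11, 8)],
    [(13, 11), (14, 10), (12, 12), (13, 11)],
    [(40, 40), (60, 40), (40, 42), (60, 42)],
    [(90, 90)],
]
new_centers = adjust_centers(classes, min_cells=2, max_std=5.0, min_combine_dist=3.0)
```

In this example the first two classes are combined, the elongated third class is split in two, and the single-cell class is discarded, leaving three trial centers for the next iteration.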

Self-Organizing Neural Network

The Self-Organizing Neural Network method is based on neural network computing techniques. Neural network learning is the process of adapting connection weights in response to sets of input values and resulting sets of output values. The Self-Organizing neural network is designed to recognize natural groups of spectral patterns in a sample of the input data, and to produce a consistent neural net output (class identification) in response to input of similar patterns during classification of the entire image.

The neural network used in the Self-Organizing process is a three-layer net in which the middle (hidden) layer consists of nodes arranged in a two-dimensional matrix. The initial values of connection weights between input nodes and hidden layer nodes are set randomly. As sample input data are fed to the neural network during the learning phase, connection weights are modified using a competitive learning strategy.

The set of raster values associated with a single cell in the sample input can be considered to be the coordinates of a position in feature space. These values are fed to each node in the hidden layer, where they are compared to the current set of weights for the node. The node with the closest match to the current position in feature space is determined on the basis of minimum Euclidean distance. The winning node and nodes in a surrounding local neighborhood have their weights updated to reduce the error in matching, while other nodes remain static. With successive iterations of the sample data, different neighborhoods in the hidden layer are trained to recognize specific classes of input pattern. Connections between the hidden and output layers are modified so that the net produces the same output if any of the nodes in a particular neighborhood is activated. The learning phase continues until the conditions established by the user-defined parameters are met. The trained neural net is then used to classify the entire input image. The process can create an optional distance raster.
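
A single competitive learning step of the kind described above might look like the following Python sketch. It covers only the hidden layer update (find the winning node, then pull it and its grid neighbors toward the input). The learning rate, the square neighborhood, the fixed starting weights, and the function name are illustrative assumptions, not TNTmips settings.

```python
import math


def train_step(weights, cell, rate=0.5, radius=1):
    """Update a 2-D grid of node weight vectors in place; return the winner's (row, col)."""
    positions = [(r, c) for r in range(len(weights)) for c in range(len(weights[0]))]
    # The winner is the node whose weights are closest to the input (Euclidean distance).
    winner = min(positions, key=lambda p: math.dist(weights[p[0]][p[1]], cell))
    for r, c in positions:
        # Nodes within the neighborhood of the winner move toward the input.
        if max(abs(r - winner[0]), abs(c - winner[1])) <= radius:
            weights[r][c] = [w + rate * (x - w) for w, x in zip(weights[r][c], cell)]
    return winner


# A 1 x 3 grid of nodes with fixed (not random) weights so the result is reproducible.
grid_a = [[[0.0, 0.0], [10.0, 10.0], [100.0, 100.0]]]
winner_a = train_step(grid_a, (12.0, 12.0), rate=0.5, radius=0)

grid_b = [[[0.0, 0.0], [10.0, 10.0], [100.0, 100.0]]]
winner_b = train_step(grid_b, (12.0, 12.0), rate=0.5, radius=1)
```

With a radius of 0 only the winning node changes, while a radius of 1 also moves its two neighbors, which is how adjacent nodes come to respond to similar input patterns.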

Adaptive Resonance

The Adaptive Resonance method is based on neural network computing techniques and is designed to recognize natural groups of spectral patterns in the input data, and to produce the same neural net output (class identification) in response to input of similar patterns.

Neural network learning is the process of adapting connection weights in response to sets of sample input values and resulting sets of output values. The initial values of connection weights between input nodes and hidden layer nodes are set randomly. Like the Self-Organizing Neural Network method described above, the Adaptive Resonance method uses a competitive learning strategy to update connection weights.

Some competitive learning models do not produce satisfactory classification results with highly variable input. This can occur because training in response to later input patterns can undo the effects of training that occurred earlier in the learning phase. The Adaptive Resonance classification method uses a complex neural net architecture to ensure both stability and plasticity in response to widely varied training input. As in the Self-Organization method, the learning phase consists of multiple iterations of the sample data, which train different neighborhoods in the hidden layer to recognize specific classes of input pattern. The Adaptive Resonance algorithm includes tests to ensure that an existing neighborhood is only modified if the current input pattern is sufficiently similar to the average pattern for that neighborhood (Euclidean distance is used as the measure of similarity). If the current input vector passes this test, the closest matching node in the neighborhood is activated, its weights are updated to reduce the mismatch, and the average pattern for the neighborhood is updated. Otherwise, the closest matching node outside of existing neighborhoods is activated, and is used to form the nucleus of a new neighborhood (thus creating a new output class). The learning phase continues until the conditions set by the user-defined parameters are met. The trained neural net is then used to classify the entire input image. The process can create an optional distance raster.
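
The similarity test at the heart of this method can be illustrated without any neural network machinery. The Python sketch below is a deliberately simplified stand-in, not the Adaptive Resonance architecture itself: each class is represented only by its average pattern, and an input either updates the closest average (if it is similar enough) or starts a new class. The threshold name `vigilance` and the function name are hypothetical.

```python
import math


def resonance_pass(cells, vigilance):
    """One pass over the sample data; returns class averages and a label per cell."""
    averages, counts, labels = [], [], []
    for cell in cells:
        best = min(range(len(averages)),
                   key=lambda k: math.dist(cell, averages[k]),
                   default=None)
        if best is not None and math.dist(cell, averages[best]) <= vigilance:
            # Similar enough: refine the existing class with a running mean.
            counts[best] += 1
            averages[best] = [a + (x - a) / counts[best]
                              for a, x in zip(averages[best], cell)]
        else:
            # Too different from every existing class: start a new one.
            averages.append(list(cell))
            counts.append(1)
            best = len(averages) - 1
        labels.append(best)
    return averages, labels


cells = [(10, 10), (12, 10), (50, 50), (11, 11), (52, 50)]
averages, labels = resonance_pass(cells, vigilance=5.0)
```

Because an existing class is only adjusted by inputs that already resemble it, later and very different inputs cannot undo what was learned earlier; they create new classes instead.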

Supervised Methods and their Parameters

See the Supervised Classification section for general information about using a supervised classification method. The following supervised classification methods are available:

Minimum Distance to Mean

The Minimum Distance to Mean method first analyzes the class areas designated in the training set raster, then calculates a mean value in each input raster for each training class and thus defines the class center in spectral space (mean values vector). The process then assigns each cell in the input raster set to the class with the closest class mean in spectral space.

The Minimum Distance to Mean algorithm is mathematically simple and efficient, but it does not account for differences in class variance, that is, the relative size of each class in feature space. When classes with different variances lie close to each other in feature space, cells near the edge of a large class may be closer to the center of a nearby smaller class than to their own class center, and are then misclassified. For this reason, the Minimum Distance to Mean method works best in applications where spectral classes are dispersed in feature space and have similar variance.

This method has no user-defined parameters. The user has the option to create a distance raster when setting output raster path and name.
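
The two stages of this method, computing class means from the training areas and assigning each cell to the nearest mean, can be sketched in Python as follows. The sample values and function names are hypothetical, and returning the distance alongside the class is only an assumption about what a distance raster would hold.

```python
import math


def class_means(samples):
    """samples: class name -> list of training cells (one value per input raster)."""
    return {name: [sum(band) / len(cells) for band in zip(*cells)]
            for name, cells in samples.items()}


def classify(cell, means):
    """Return the class with the closest mean and the distance to that mean."""
    name = min(means, key=lambda n: math.dist(cell, means[n]))
    return name, math.dist(cell, means[name])


# Hypothetical two-band training samples.
samples = {
    "water": [(10, 5), (12, 7)],
    "forest": [(40, 60), (44, 64)],
}
means = class_means(samples)
assigned_class, distance = classify((15, 10), means)
```

Only the class means are used, so two classes with very different spreads are treated identically, which is the limitation noted above.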

Maximum Likelihood

The Maximum Likelihood method applies probability theory to the classification task and can be thought of as a refinement of the Minimum Distance to Mean method. In addition to the distance from the class mean, it considers the relative size (variance) and shape (covariance) of each class in spectral space. These statistics are used to compute the probability that a given raster cell belongs to a particular training set class. The method computes all of the class probabilities for each raster cell and assigns the cell to the class with the highest probability value (maximum likelihood).

variance - A measure of how far a set of numbers is spread out. Specifically it is the average of the squared differences from the mean. Think of it here as the relative size of a training set class in feature space.

covariance - Indicates how two variables change together. Here a variable is the value of a coordinate along one axis in feature space. The covariance can provide information about the shape of the class in feature space.

mean values vector (mean vector) - Directional line segment in N-dimensional space consisting of the mean values taken from each coordinate axis (variable). Here, the mean values vector can be thought of as the class center for a set of input rasters plotted in spectral space.

The probability that a pixel belongs to a particular class depends on the distance between it and the class center, and also on the variance and covariance of the class. The method interprets the cell values in each training set class as having a Gaussian (normal) distribution, which can be described by the mean vector and the covariance matrix.

Field for Apriori Probability - The probability values calculated by the Maximum Likelihood classifier in its default mode are based solely on spectral characteristics. But in some cases you may know independently that one class should be rare in the scene while another class should be very common. This prior knowledge could come from historical data (for example, records of the proportions of the area planted to different crops), or current information on similar areas.

A probability value based on such information is termed an a priori probability. The values can be percentages or between 0 and 1, but must be tabulated for each class in a single field in a database table attached to the training set raster. The a priori probability values are used as weighting coefficients in calculating class assignment probabilities.

Minimum Likelihood Percentage - this threshold lets you exclude cells that don't fit any of the training classes particularly well. If the highest class probability for a cell is smaller than this threshold, the cell is not classified, and is assigned a value of 0 in the class raster.

The Maximum Likelihood method produces more accurate class assignments than the Minimum Distance to Mean method when classes vary significantly in size and shape in spectral space. However, it is computation-intensive and thus processing time is relatively long. Without a priori values, this method assumes that each class is equally likely to occur in the scene, an assumption that often does not hold in remote sensing applications.

The process can create an optional distance raster.
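
The following Python sketch illustrates how the mean vector, the covariance matrix, and an optional a priori probability combine into a class score. It is a simplified illustration limited to two input bands so the matrix inverse can be written out by hand; it omits the Minimum Likelihood Percentage threshold, and the function names and sample values are hypothetical.

```python
import math


def class_statistics(cells):
    """Mean vector and covariance terms (var_x, cov_xy, var_y) for two-band cells."""
    n = len(cells)
    mean = [sum(c[i] for c in cells) / n for i in (0, 1)]

    def cov(i, j):
        return sum((c[i] - mean[i]) * (c[j] - mean[j]) for c in cells) / (n - 1)

    return mean, (cov(0, 0), cov(0, 1), cov(1, 1))


def log_likelihood(cell, mean, covariance, prior=1.0):
    """Log of prior times Gaussian density, dropping the constant shared by all classes."""
    var_x, cov_xy, var_y = covariance
    det = var_x * var_y - cov_xy ** 2  # must be positive for a valid class
    dx, dy = cell[0] - mean[0], cell[1] - mean[1]
    # Squared distance to the mean, scaled by the inverse covariance matrix.
    scaled_sq = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * scaled_sq


def classify(cell, stats, priors=None):
    """Class with the highest score; equal weighting when no priors are given."""
    def score(name):
        prior = priors[name] if priors else 1.0
        return log_likelihood(cell, *stats[name], prior)
    return max(stats, key=score)


stats = {
    "tight": class_statistics([(9, 10), (11, 10), (10, 9), (10, 11)]),
    "broad": class_statistics([(14, 20), (26, 20), (20, 14), (20, 26)]),
}
far_cell = classify((14, 14), stats)
near_cell = classify((12, 12), stats)
near_cell_with_priors = classify((12, 12), stats, priors={"tight": 0.2, "broad": 0.8})
```

The cell at (14, 14) is closer to the mean of the tight class in straight-line distance, yet it is assigned to the broad class because the tight class has very little variance. The cell at (12, 12) goes to the tight class with equal weighting but switches to the broad class once the a priori values favor it.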

Stepwise Linear

The Stepwise Linear method uses linear discriminant analysis to define a new set of coordinate axes in feature space that most effectively differentiates classes, then projects input cell values into this coordinate system for classification.

discriminant functions - A set of derived variables that are linear combinations of the original variables. Each discriminant function can be visualized as a straight line in feature space.

linear discriminant analysis - A statistical technique that calculates a set of discriminant functions that separates classes.

variable - Refers to an input raster in feature space.

The Stepwise Linear method analyzes the training set and chooses the set of discriminant functions that produces the best possible separation (discrimination) between the classes in the training data. Discriminant functions are chosen using a stepwise procedure that selectively adds and removes input bands to find the minimum number of bands necessary to produce the optimal separation of training classes. In each step, an additional variable is selected for possible inclusion in the discriminant function and must meet certain requirements. Existing variables are also evaluated using removal criteria. Successive discriminant functions are constrained to be mutually perpendicular. This stepwise selection of variables terminates when no more variables meet the entry or removal criteria. Thus input rasters that do not add significantly to the discrimination of classes are eliminated from the classification process.

This method is particularly appropriate when you have a large number of input rasters as it minimizes the number of necessary bands. Also note, the possibility of assigning cells to the wrong class is minimized when variables are normally distributed within each class, and the covariance matrices are equal.

The user has the option to create a distance raster when setting output raster path and name.

Suits Maximum Relative

The Suits' Maximum Relative algorithm computes the composite brightness and the ratio of each band's brightness over the composite brightness. The algorithm calculates the mean and standard deviation of each of these parameters for the training class and uses them to define the class boundaries.

composite brightness - Sum of each individual band's values.

The algorithm calculates the composite brightness and band brightness ratio for each input cell, and assigns the cell to a class by comparison with the class boundaries defined from the training set. Unlike other methods, this method does not assign every non-masked input cell to a class. In contrast, it creates rigid class boundaries and leaves cells located outside the boundaries unclassified.

Standard Deviation Multiplier - This value scales the size of the class assignment partitions. A value greater than or equal to zero with precision to four decimal places can be entered; the default value is 2.0000.

This method has a computational speed advantage over other methods. Cells falling outside the boundaries of any class are left unclassified.

The user has the option to create a distance raster when setting output raster path and name.
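
The description above can be turned into a small Python sketch. Several details here are assumptions rather than documented behavior: class boundaries are taken as the mean plus or minus the Standard Deviation Multiplier times the standard deviation of each parameter, a cell must fall inside every boundary to be assigned, the first matching class wins, and cells are assumed to have a non-zero composite brightness. Function names and sample values are hypothetical.

```python
from statistics import fmean, pstdev


def features(cell):
    """Composite brightness followed by each band's ratio to the composite."""
    total = sum(cell)
    return [total] + [value / total for value in cell]


def class_bounds(training_cells, multiplier=2.0):
    """(low, high) limits per feature: mean -/+ multiplier * standard deviation."""
    columns = zip(*(features(cell) for cell in training_cells))
    return [(fmean(col) - multiplier * pstdev(col),
             fmean(col) + multiplier * pstdev(col))
            for col in map(list, columns)]


def classify(cell, bounds_by_class):
    """First class whose limits contain the cell, or None if it stays unclassified."""
    values = features(cell)
    for name, bounds in bounds_by_class.items():
        if all(low <= v <= high for v, (low, high) in zip(values, bounds)):
            return name
    return None


soybean_cells = [(20, 60), (22, 66), (18, 54), (20, 62)]
narrow = {"soybeans": class_bounds(soybean_cells, multiplier=0.5)}
wide = {"soybeans": class_bounds(soybean_cells, multiplier=2.0)}

with_narrow_bounds = classify((21, 63), narrow)
with_wide_bounds = classify((21, 63), wide)
wrong_ratio = classify((40, 44), wide)
```

The same cell is left unclassified with a multiplier of 0.5 and assigned to the class with a multiplier of 2.0, while a cell with similar brightness but a different band ratio stays unclassified in both cases.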

Back Propagation Neural Network

The Back Propagation Neural Network method is based on neural network computing techniques. Learning in neural network theory is the process of adapting connection weights in response to sets of input values and resulting sets of output values. Back propagation is a specific learning algorithm by which a multilayer neural network can be trained to recognize and classify spectral patterns as it processes the training set data. The goal is to adjust the network parameters so that it correctly classifies patterns from outside the training set as well.

The set of raster values associated with a single cell in a training area can be considered to be mathematical components of an input vector. Each of these raster values is fed to a single node in the input layer of the neural network. The output layer of the network contains one node per class; the set of values produced by the output layer nodes in response to an input vector constitute an output vector. Learning via back propagation involves comparison of the output vector with the desired classification result, or target vector. The target vector has a value of 1 for the node corresponding to the correct training class, and 0 for all other nodes. If the output vector does not match the target vector, connection weights are adjusted to reduce the difference. The network is initialized by setting random weight values, or by using an additional learning algorithm to estimate initial weights. As the set of input vectors from a training set are fed repeatedly to the network, the back propagation algorithm adjusts the weights in each pass to minimize the squared error (difference between output and target values) over the training set. Iterations continue until the error in output values falls below a user-defined limit, or until the process reaches a specified number of iterations. Once the learning phase is complete, the trained neural network is used to process and classify the image.

Maximum Iterations - sets an upper limit on the number of iterations. The allowable range of values is 1 to 10000, with a default value of 10. The number of iterations needed to produce desirable results varies widely depending on the input. Changes made to the class centers may diminish to an insignificant amount after some number of iterations less than the specified maximum. For example, 95 percent of a center's changes may occur in the first twenty iterations and only three percent in the next twenty iterations. Increasing the Maximum Iterations parameter may improve classification results, at a cost of increased processing time.

Maximum Cumulative - The Maximum Cumulative parameter sets the maximum cumulative error allowed by the process and, like the Learning Rate, affects the amount of weight adjustment.

Learning Rate - The Learning Rate parameter is a multiplying term used to control the adjustment of connection weights. The allowable range of values is 0.0000 to 1.0000; the default value is 0.9000. Increasing this value increases the rate of change in connection weights with each iteration, and thus increases the learning rate of the neural network.

Error Threshold - The Error Threshold parameter sets the error limit for terminating the training of the neural network classifier. When the current classification error rate of the neural network falls below the value set by this parameter (or the training process reaches the maximum allowable number of iterations), training is terminated. The allowable range of values is 0.0000 to 1.0000; the default value is 0.0100 (corresponding to 1% misclassification of training data).
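
The learning loop described in this section can be sketched with a very small network in plain Python. This is an illustration under several assumptions, not the TNTmips network: one hidden layer of three nodes, sigmoid activations, weights updated after every training cell, input values scaled to the 0 to 1 range, and a stopping test based on mean squared error per training cell rather than a misclassification rate. All names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(net, inputs):
    """Return hidden and output activations; the trailing 1.0 feeds the bias weight."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, list(inputs) + [1.0])))
              for ws in net["hidden"]]
    output = [sigmoid(sum(w * h for w, h in zip(ws, hidden + [1.0])))
              for ws in net["output"]]
    return hidden, output


def train(samples, n_hidden=3, learning_rate=0.9, max_iterations=2000,
          error_threshold=0.01, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    net = {"hidden": [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                      for _ in range(n_hidden)],
           "output": [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]}
    history = []
    for _ in range(max_iterations):
        squared_error = 0.0
        for inputs, target in samples:
            hidden, output = forward(net, inputs)
            squared_error += sum((t - o) ** 2 for t, o in zip(target, output))
            # Error terms for the output layer, then propagated back to the hidden layer.
            out_delta = [(t - o) * o * (1 - o) for t, o in zip(target, output)]
            hid_delta = [h * (1 - h) * sum(d * net["output"][k][j]
                                           for k, d in enumerate(out_delta))
                         for j, h in enumerate(hidden)]
            for k, d in enumerate(out_delta):
                for j, value in enumerate(hidden + [1.0]):
                    net["output"][k][j] += learning_rate * d * value
            for j, d in enumerate(hid_delta):
                for i, value in enumerate(list(inputs) + [1.0]):
                    net["hidden"][j][i] += learning_rate * d * value
        history.append(squared_error)
        if squared_error / len(samples) < error_threshold:
            break
    return net, history


def predict(net, inputs):
    """Index of the output node with the largest value."""
    output = forward(net, inputs)[1]
    return output.index(max(output))


# Hypothetical training cells: (scaled band values, target vector with 1 for the true class).
samples = [((0.1, 0.2), (1, 0)), ((0.2, 0.1), (1, 0)), ((0.15, 0.15), (1, 0)),
           ((0.8, 0.9), (0, 1)), ((0.9, 0.8), (0, 1)), ((0.85, 0.85), (0, 1))]
net, history = train(samples)
predictions = [predict(net, inputs) for inputs, _ in samples]
```

The `history` list records the summed squared error of each pass, which shrinks as the weights are adjusted. The sketch uses a much larger iteration limit than the default of 10 mentioned above so that this tiny example has room to converge.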

Mahalanobis Distance to Mean

The Mahalanobis Distance to Mean method is a variation of the Minimum Distance to Mean method. Rather than using the straight-line (Euclidean) distance to each class mean, it measures the Mahalanobis distance, which scales the distance by the covariance computed from the training set so that the spread and correlation of the band values are taken into account. The process analyzes the class areas designated in the training set raster to determine the mean vector and covariance statistics, then assigns each cell in the input raster set to the class with the smallest Mahalanobis distance.