Feature Database View: User Interface and Formulas


Data View Interface

The Data View is designed for browsing feature values, downloading subsets of features, visualizing thumbnails of colonies, and linking feature information with image and lineage information.

Each column represents one image feature and can be filtered and sorted by its values. Users can specify a range of colony IDs and/or frames by entering values in the edit boxes (as shown in Figure 1). Features are selected by clicking the "Display Features Selection" button on the left side. After clicking the "Submit" button, the data table is displayed on the right side.

Each row represents either one colony or one hexagonal partition of a colony. The hexagonal features can be downloaded by clicking the "Hex" button and then the "Colony Hex CSV" link.

All colony-level features can be downloaded by clicking the "Colony CSV" button. Downloaded feature subsets are in compressed CSV (comma-separated values) format.


Figure 1: Data View

Feature Extraction Formulas

Image features are extracted over segments (i.e., colonies) or over hexagon tiles that partition each segment. Each hexagon tile has a side length of 31 pixels and an area of approximately 2500 pixels. If a hexagon tile overlaps a colony border such that the overlapping area is less than 80% of the hexagon area (1997 pixels), its features are discarded to avoid bias in the computed feature values.
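The tile size and the 80% rule can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes an ideal regular hexagon (a rasterized tile may contain a slightly different pixel count) and assumes that "overlapping area" means the number of tile pixels that fall inside the colony. The names `hexagon_area` and `keep_tile` are hypothetical.

```python
import math

HEX_SIDE_PX = 31      # side length stated in the text
MIN_COVERAGE = 0.80   # minimum covered fraction stated in the text


def hexagon_area(side: float) -> float:
    """Area of a regular hexagon: (3 * sqrt(3) / 2) * side**2."""
    return 1.5 * math.sqrt(3.0) * side ** 2


def keep_tile(overlap_px: int, side: float = HEX_SIDE_PX) -> bool:
    """Return True when the covered area reaches MIN_COVERAGE of the tile.

    overlap_px is assumed to be the count of tile pixels inside the colony.
    """
    return overlap_px >= MIN_COVERAGE * hexagon_area(side)


area = hexagon_area(HEX_SIDE_PX)
threshold = MIN_COVERAGE * area
print(f"tile area = {area:.1f} px, 80% threshold = {threshold:.1f} px")
```

With a side of 31 pixels the ideal area is about 2497 pixels, and 80% of it is about 1997 pixels, which agrees with the values quoted above.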

General notation

ROI = Region Of Interest in image. It defines the pixels over which features are extracted.
I(x, y) = intensity value at a pixel location (x,y).
N = pixel count for a given ROI. ROI can be either a colony or a hexagon tile in a colony.

Haralick feature notation

The co-occurrence matrix is computed for intensities reduced to eight distinct gray levels (Ng=8). The matrix is symmetric (i.e., a black-white co-occurrence is counted the same as a white-black co-occurrence). The matrix is normalized by the total number of co-occurrences to report probabilities p(i,j). The co-occurrence computation is parameterized by an offset in each dimension of the image (x and y).

p(i,j) = value of the gray-level co-occurrence matrix (GLCM) at (i,j), i.e., the estimated probability of a co-occurrence of intensities I=i and I=j.
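The following is a minimal sketch of how a symmetric, normalized co-occurrence matrix with Ng=8 could be computed for one (x, y) offset. It is not the implementation behind the database. In particular, the linear min/max quantization to eight levels, the optional ROI `mask`, and the function names `quantize` and `glcm` are assumptions made for illustration.

```python
import numpy as np


def quantize(image, levels=8):
    """Map intensities linearly onto integer gray levels 0..levels-1.

    Assumption: scaling uses the min and max of the array passed in.
    """
    img = np.asarray(image, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=int)
    q = np.floor((img - lo) / (hi - lo) * levels).astype(int)
    return np.minimum(q, levels - 1)


def glcm(image, dx, dy, levels=8, mask=None):
    """Symmetric, normalized co-occurrence matrix for the offset (dx, dy)."""
    q = quantize(image, levels)
    h, w = q.shape
    if mask is None:
        mask = np.ones((h, w), dtype=bool)
    counts = np.zeros((levels, levels), dtype=float)
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            # Count a pair only when both pixels lie inside the image and the ROI.
            if 0 <= x2 < w and 0 <= y2 < h and mask[y, x] and mask[y2, x2]:
                counts[q[y, x], q[y2, x2]] += 1
    counts = counts + counts.T  # (i, j) and (j, i) are treated as the same event
    total = counts.sum()
    return counts / total if total > 0 else counts
```

The explicit loops keep the counting rule readable; a vectorized version would be preferable for large images.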

To account for texture variation in multiple directions, we compute 12 different GLCMs, parameterized with the (x,y) offsets listed in the table below. The (x,y) offsets have been mapped to their polar coordinates (angle, distance) to illustrate that a separate GLCM is computed approximately every 15 degrees at a distance of approximately 3 pixels. Based on a small-scale study, we concluded that a distance of 3 pixels was the most appropriate for our biological images of stem cell colonies.

X (pixels)   Y (pixels)   Angle (degrees)   Distance (pixels)
     3            0              0.00              3.00
     3            1             18.43              3.16
     3            2             33.69              3.61
     2            2             45.00              2.83
     2            3             56.31              3.61
     1            3             71.57              3.16
     0            3             90.00              3.00
    -1            3            108.43              3.16
    -2            3            123.69              3.61
    -2            2            135.00              2.83
    -3            2            146.31              3.61
    -3            1            161.57              3.16
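The polar coordinates in the table follow directly from the offsets: the angle is atan2(y, x) expressed in degrees and the distance is the Euclidean length of the offset. A short sketch that recomputes both columns (the name `to_polar` is illustrative):

```python
import math

# The 12 (x, y) offsets listed in the table, in table order.
OFFSETS = [(3, 0), (3, 1), (3, 2), (2, 2), (2, 3), (1, 3),
           (0, 3), (-1, 3), (-2, 3), (-2, 2), (-3, 2), (-3, 1)]


def to_polar(dx, dy):
    """Return (angle in degrees, distance in pixels) for an (x, y) offset."""
    return math.degrees(math.atan2(dy, dx)), math.hypot(dx, dy)


for dx, dy in OFFSETS:
    angle, dist = to_polar(dx, dy)
    print(f"{dx:3d} {dy:3d} {angle:7.2f} {dist:5.2f}")
```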

For more information about the Haralick features, please refer to the original paper [1]. Some of the texture features in this document were inspired by formulas posted on the Murphy Lab web page [2]. They have been modified, however, to improve the consistency of formulas and naming conventions.

Each texture feature computation is associated with one formula. The computation yields four values according to the following aggregation over 12 spatial directions:

  1. Maximum amplitude over the 12 directions.
  2. The direction of the maximum amplitude.
  3. The amplitude of the orthogonal direction to the maximum amplitude direction.
  4. The average amplitude over the 12 directions.
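The four aggregated values can be sketched as follows. This is an illustration under two assumptions that the text does not state explicitly: "amplitude" is taken to be the feature value computed from one GLCM, and the 12 values are ordered as in the offset table, so that the direction orthogonal to index k is index (k + 6) mod 12 (for example, offset (3, 0) pairs with (0, 3)). The function name and the dictionary keys are hypothetical.

```python
import numpy as np


def aggregate_directions(values, angles_deg):
    """Reduce 12 per-direction feature values to the four reported values."""
    values = np.asarray(values, dtype=float)
    if values.size != 12 or len(angles_deg) != 12:
        raise ValueError("expected one value and one angle per direction (12)")
    i_max = int(np.argmax(values))   # first index wins if there is a tie
    i_orth = (i_max + 6) % 12        # about 90 degrees away in the table ordering
    return {
        "max": float(values[i_max]),
        "direction_deg": float(angles_deg[i_max]),
        "orthogonal": float(values[i_orth]),
        "mean": float(values.mean()),
    }
```

Because the GLCM is symmetric, a direction and its opposite (180 degrees apart) give the same matrix, which is why wrapping the index around the 12 directions is sufficient.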

Intensity features

Sample Mean

Mean of the intensity values over the ROI:
\mu = \frac{1}{N} \sum_{(x,y) \in ROI} I(x,y)

Mode

Most frequent intensity value computed from a histogram with 256 bins. The mapping of intensities to 256 bins is performed by scaling all values with respect to min and max image intensities.
Mode formula
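A minimal sketch of the histogram-based mode, assuming 256 equal-width bins spanning the image minimum and maximum, and assuming that the reported value is the center of the most populated bin (the text does not say whether a bin index, bin center, or bin edge is reported). The function names are illustrative.

```python
import numpy as np


def histogram_256(values, vmin, vmax):
    """Count ROI intensities in 256 equal-width bins spanning [vmin, vmax]."""
    data = np.asarray(values, dtype=float).ravel()
    counts, edges = np.histogram(data, bins=256, range=(vmin, vmax))
    return counts, edges


def mode_intensity(values, vmin, vmax):
    """Center of the most populated bin (lowest bin wins on ties)."""
    counts, edges = histogram_256(values, vmin, vmax)
    k = int(np.argmax(counts))
    return float(0.5 * (edges[k] + edges[k + 1]))
```

Here `vmin` and `vmax` stand for the min and max intensities of the whole image, while `values` holds only the ROI pixels.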

Median

The middle intensity value of sorted image intensities from the ROI.

Sample standard deviation

Standard deviation of intensity values in the ROI.
Sample standard deviation formula

Higher Central Moments (3rd to 6th moments respectively)

Skewness formula

Kurtosis formula

Hyperskewness formula

Hyperflatness formula
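Skewness, kurtosis, hyperskewness, and hyperflatness are commonly defined as the 3rd to 6th standardized central moments. The sketch below uses that common definition; it is an assumption that the database uses the same normalization (for example, population rather than sample standard deviation, and kurtosis without subtracting 3). The name `standardized_moment` is illustrative.

```python
import numpy as np


def standardized_moment(values, k):
    """k-th central moment divided by the k-th power of the standard deviation."""
    v = np.asarray(values, dtype=float).ravel()
    centered = v - v.mean()
    sigma = np.sqrt(np.mean(centered ** 2))  # population standard deviation
    if sigma == 0:
        return float("nan")                  # undefined for constant intensities
    return float(np.mean(centered ** k) / sigma ** k)


roi = [1, 2, 3, 4, 5]
skewness = standardized_moment(roi, 3)
kurtosis = standardized_moment(roi, 4)
hyperskewness = standardized_moment(roi, 5)
hyperflatness = standardized_moment(roi, 6)
```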

Entropy

The entropy is computed from values of an intensity histogram with 256 bins after they have been normalized to represent the frequency of occurrence. The mapping of intensities to 256 bins is performed by scaling all values with respect to min and max image intensities.
Entropy formula
where f_i is the frequency of occurrence for the i-th bin.
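A minimal sketch of the histogram entropy, assuming the Shannon form with a base-2 logarithm and assuming that empty bins contribute zero. The logarithm base used by the database is not stated in the text, so treat it as a free choice here. The function name is illustrative.

```python
import numpy as np


def histogram_entropy(values, vmin, vmax, bins=256):
    """Shannon entropy (in bits) of the normalized intensity histogram."""
    data = np.asarray(values, dtype=float).ravel()
    counts, _ = np.histogram(data, bins=bins, range=(vmin, vmax))
    total = counts.sum()
    if total == 0:
        return 0.0
    f = counts / total   # frequency of occurrence per bin
    f = f[f > 0]         # skip empty bins: 0 * log(0) is taken as 0
    return float(-np.sum(f * np.log2(f)))
```

For example, an ROI whose pixels are split evenly between two bins has an entropy of 1 bit, and a constant ROI has an entropy of 0.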

Shape features

The shape features are computed on the binary mask of the image.

Area

Area of a ROI in pixels is the total pixel count of the ROI.

Distance from border

Distance from the border of a ROI. For a hexagon-based partition of a ROI, the border is defined by all pixels contributing to its thickness, and the value reported for each hexagon is 0 if the hexagon touches the border pixels and 1 otherwise.

Perimeter

Perimeter is the number of pixels that have neighbors both in the ROI and in the image background.
Perimeter formula

Circularity

Circularity is the ratio of area to perimeter squared, normalized by 4 pi:
Circularity = \frac{4 \pi \cdot Area}{Perimeter^2}
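Perimeter and circularity can be sketched on a binary mask as follows. The neighborhood used to decide whether a pixel touches the background is not specified in the text; this sketch assumes 4-connectivity and treats everything outside the image as background. The function names are illustrative.

```python
import numpy as np


def perimeter_pixels(mask):
    """Count ROI pixels with at least one 4-connected background neighbor."""
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1, mode="constant", constant_values=False)
    # A pixel is interior when its up, down, left, and right neighbors are all ROI.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    return int(np.count_nonzero(m & ~interior))


def circularity(area, perimeter):
    """4 * pi * area / perimeter**2; perimeter must be non-zero."""
    return 4.0 * np.pi * area / perimeter ** 2
```

Because the perimeter is a pixel count rather than a contour length, small or blocky shapes can yield circularity values above 1 with this definition.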

Centroid X & Y Coordinates

Centroid coordinates correspond to the center of mass of the ROI.
Centroid X formula
Centroid Y formula

Orientation

Orientation formula
where
Mu pq formula
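One widely used way to obtain centroid and orientation from a binary mask relies on the central moments mu_pq. The sketch below uses the textbook relation theta = 0.5 * atan2(2 * mu_11, mu_20 - mu_02); it is an assumption that the database formula has exactly this form and angle convention. Angles are expressed in image coordinates (x to the right, y downward), and the function names are illustrative.

```python
import numpy as np


def centroid(mask):
    """Center of mass (x, y) of the ROI pixels."""
    ys, xs = np.nonzero(np.asarray(mask, dtype=bool))
    return float(xs.mean()), float(ys.mean())


def central_moment(mask, p, q):
    """mu_pq = sum over ROI pixels of (x - xc)**p * (y - yc)**q."""
    ys, xs = np.nonzero(np.asarray(mask, dtype=bool))
    xc, yc = xs.mean(), ys.mean()
    return float(np.sum((xs - xc) ** p * (ys - yc) ** q))


def orientation_deg(mask):
    """Angle of the major axis in degrees, from second-order central moments."""
    mu11 = central_moment(mask, 1, 1)
    mu20 = central_moment(mask, 2, 0)
    mu02 = central_moment(mask, 0, 2)
    return float(np.degrees(0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)))
```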

Eccentricity

Eccentricity formula

Bounding Box

The bounding box of each ROI is computed, and 4 values are reported as follows: X and Y coordinates of the top left corner, width and height of the rectangle.

Aspect Ratio Bounding Box

Aspect Ratio BB formula

Extend Bounding Box

The Extend Bounding Box feature reflects how closely the ROI resembles a "solid" shape or a "Swiss-cheese"-like shape, given the bounding box of the ROI.
Extend Bounding Box formula
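The three bounding-box features can be sketched together. The assumptions here are that the aspect ratio is width divided by height (the inverse convention is equally plausible) and that the Extend Bounding Box value is the ROI area divided by the bounding-box area, which is the usual definition of "extent". The function names are illustrative.

```python
import numpy as np


def bounding_box(mask):
    """Return (x, y, width, height), with (x, y) the top-left corner."""
    ys, xs = np.nonzero(np.asarray(mask, dtype=bool))
    x0, y0 = int(xs.min()), int(ys.min())
    width = int(xs.max()) - x0 + 1
    height = int(ys.max()) - y0 + 1
    return x0, y0, width, height


def aspect_ratio(mask):
    """Width over height of the bounding box (assumed convention)."""
    _, _, width, height = bounding_box(mask)
    return width / height


def extent(mask):
    """Fraction of the bounding box covered by ROI pixels."""
    _, _, width, height = bounding_box(mask)
    area = int(np.count_nonzero(np.asarray(mask, dtype=bool)))
    return area / (width * height)
```

A solid rectangle yields an extent of 1, while holes or ragged borders push the value toward 0.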

Haralick Texture Features

Let us define the following variables used by Haralick texture features:
Mu X formula
Sigma X formula
Px+y formula
Px-y formula
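In the Haralick literature these four variables are usually derived from the normalized GLCM p(i,j) as marginal statistics and as sum and difference distributions. The sketch below follows that usual form with 0-based gray levels (0 to Ng-1); the indexing convention and any deviation from the original formulas are assumptions. The function name is illustrative.

```python
import numpy as np


def haralick_variables(p):
    """Return (mu_x, sigma_x, p_sum, p_diff) for a normalized GLCM p.

    p_sum[k]  = sum of p(i, j) over all pairs with i + j == k
    p_diff[k] = sum of p(i, j) over all pairs with |i - j| == k
    """
    p = np.asarray(p, dtype=float)
    ng = p.shape[0]
    levels = np.arange(ng)
    px = p.sum(axis=1)  # marginal distribution over rows
    mu_x = float(np.sum(levels * px))
    sigma_x = float(np.sqrt(np.sum((levels - mu_x) ** 2 * px)))
    i, j = np.indices(p.shape)
    p_sum = np.bincount((i + j).ravel(), weights=p.ravel(), minlength=2 * ng - 1)
    p_diff = np.bincount(np.abs(i - j).ravel(), weights=p.ravel(), minlength=ng)
    return mu_x, sigma_x, p_sum, p_diff
```

Since the GLCM is symmetric, the column statistics (mu_y, sigma_y) equal the row statistics computed here.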

Texture Contrast

Texture Contrast formula

Four values are computed over 12 spatial directions.
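For orientation, contrast is conventionally the expected squared gray-level difference under p(i,j). A minimal sketch assuming that conventional form (the function name is illustrative, and the database formula may differ in detail):

```python
import numpy as np


def texture_contrast(p):
    """Sum over (i, j) of (i - j)**2 * p(i, j) for a normalized GLCM p."""
    p = np.asarray(p, dtype=float)
    i, j = np.indices(p.shape)
    return float(np.sum((i - j) ** 2 * p))
```

A GLCM concentrated on the diagonal (neighboring pixels share the same gray level) gives a contrast of 0; mass far from the diagonal increases it.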

Texture Correlation

Texture Correlation formula

Four values are computed over 12 spatial directions.

Texture Homogeneity

Texture Homogeneity formula

Four values are computed over 12 spatial directions.

Texture Energy

Texture Energy formula

Four values are computed over 12 spatial directions.
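Energy is often defined as the sum of squared GLCM entries (also called the angular second moment); some references instead report its square root. The sketch below assumes the sum-of-squares form, which may not match the database formula exactly. The function name is illustrative.

```python
import numpy as np


def texture_energy(p):
    """Sum over (i, j) of p(i, j)**2 for a normalized GLCM p."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p ** 2))
```

The value ranges from 1/Ng^2 for a perfectly uniform GLCM up to 1 when a single gray-level pair accounts for all co-occurrences.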

Texture Variance

Texture Variance formula

Four values are computed over 12 spatial directions.

Texture Entropy

Texture Entropy formula

Four values are computed over 12 spatial directions.

Texture Inverse Difference Moment

Texture Inverse Difference Moment formula

Four values are computed over 12 spatial directions.

Texture Sum Average

Texture Sum Average formula

Four values are computed over 12 spatial directions.

Texture Sum Variance

Texture Sum Variance formula

Four values are computed over 12 spatial directions.

Texture Sum Entropy

Texture Sum Entropy formula

Four values are computed over 12 spatial directions.

Texture Difference Average

Texture Difference Average formula

Four values are computed over 12 spatial directions.

Texture Difference Variance

Texture Difference Variance formula

Four values are computed over 12 spatial directions.

Texture Difference Entropy

Texture Difference Entropy formula

Four values are computed over 12 spatial directions.

References

[1] Haralick, R.M.; Shanmugam, K.; Dinstein, Its'Hak, "Textural Features for Image Classification," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-3, no. 6, pp. 610-621, Nov. 1973.
doi: 10.1109/TSMC.1973.4309314
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4309314&isnumber=4309300

[2] Haralick texture features defined on the Murphy Lab web page at https://murphylab.web.cmu.edu/publications/boland/boland_node26.html