This work addresses the problem of automating an image-based ranking process for stem cell colonies. We automated the manual process in a novel way: instead of fitting off-the-shelf image features and colony ranks to prediction models, we defined a new feature set that captures the visual cues and the biological rules that experts use to rank colonies from image data.
Our automation considers several factors: the inconsistency of manually assigned stem cell colony ranks; the type of image segmentation used to detect stem cell colonies (manual or automated); the type of image feature set (off-the-shelf or custom-designed); and the form of the relationship between input colony features and output colony ranks (linear or non-linear). The novelty of our work lies in automating stem cell colony ranking while preserving the connection between the visually perceived quality characteristics of colonies and the image features that feed a computational prediction model. The main contribution of our work is demonstrating the benefits of directly translating biological rules into image features for the automation of stem cell colony ranking. We also outlined a method for establishing relationships between the commonly used Haralick features and our custom-designed features.
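Haralick features are statistics computed from a gray-level co-occurrence matrix (GLCM), which records how often pairs of gray levels occur at a fixed pixel offset. As a minimal sketch, assuming an image that is already quantized to small integer gray levels and a single horizontal offset, the code below builds a symmetric, normalized GLCM and computes three of the classic Haralick statistics: contrast, angular second moment (ASM), and inverse difference moment (IDM). The function names `glcm` and `haralick_subset` are hypothetical; this is not the extractor used in this work, which computed 12 Haralick features.

```python
"""Minimal GLCM / Haralick texture sketch (illustrative, not the paper's extractor)."""

from collections import Counter


def glcm(image, dx=1, dy=0, levels=None):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset.

    image: list of rows of non-negative integer gray levels (assumed pre-quantized).
    The default offset (dx, dy) = (1, 0) pairs each pixel with its right neighbour.
    """
    if levels is None:
        levels = 1 + max(max(row) for row in image)
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # count both directions so the matrix is symmetric
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixel pairs exist at this offset")
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def haralick_subset(p):
    """Three classic Haralick statistics of a normalized GLCM p."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in cells)
    asm = sum(p[i][j] ** 2 for i, j in cells)  # angular second moment
    idm = sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells)  # inverse difference moment
    return {"contrast": contrast, "asm": asm, "idm": idm}


if __name__ == "__main__":
    flat = [[3, 3], [3, 3]]
    checker = [[0, 1, 0, 1], [1, 0, 1, 0]]
    print(haralick_subset(glcm(flat)))     # contrast 0, asm 1.0, idm 1.0
    print(haralick_subset(glcm(checker)))  # contrast 1.0, asm 0.5, idm 0.5
```

A uniform patch has no texture (zero contrast, maximal ASM), while an alternating 0/1 pattern maximizes contrast for two gray levels. Practical implementations usually average these statistics over several offsets and directions.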
Figure 1 compares the manual and automated workflows for assigning ranking labels to stem cell colonies. The automated workflow leverages not only the manually assigned labels but also the biological rules, which guide the creation of image features.
Maintaining stem cell lines currently requires manual selection of colonies for passage based on inspection under the microscope. Even when expert biologists have defined and agreed upon a set of biological rules to rank stem cell quality, the selection process is inconsistent. One way to increase consistency is computer-based automation. Automation is typically achieved by: (1) adapting off-the-shelf image feature software; (2) building and validating a model to predict colony ranks from image features; and (3) predicting ranks from image feature measurements during the actual ranking process.
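Steps (2) and (3) can be illustrated with leave-one-out resampling, the validation scheme this work applied to all of its models: each colony is held out once, a model is trained on the remaining colonies, and the held-out colony's rank is predicted. In the sketch below, a nearest-centroid classifier is a hypothetical stand-in for the logistic LASSO, decision tree, and random forest models, chosen only to keep the example dependency-free; the function names, feature vectors, and ranks are illustrative assumptions.

```python
"""Leave-one-out (LOO) validation sketch with a hypothetical stand-in classifier."""

from collections import defaultdict


def fit_nearest_centroid(features, ranks):
    """Hypothetical stand-in model: the mean feature vector of each rank."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, ranks):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for k, v in enumerate(x):
            sums[y][k] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the rank whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def leave_one_out_accuracy(features, ranks):
    """Train on n-1 colonies, predict the held-out colony, repeat for all n colonies."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = ranks[:i] + ranks[i + 1:]
        model = fit_nearest_centroid(train_x, train_y)
        correct += predict(model, features[i]) == ranks[i]
    return correct / len(features)


if __name__ == "__main__":
    features = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]  # toy feature vectors
    ranks = [1, 1, 3, 3]                                          # toy expert ranks
    print(leave_one_out_accuracy(features, ranks))  # 1.0
```

One consequence of this design: a rank that occurs only once in the data can never be predicted correctly under leave-one-out, because holding out that colony removes its rank from the training set.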
This approach has several drawbacks. It does not specifically include the features that experts look for in high-quality colonies, and there is no visual connection between off-the-shelf image features and stem cell quality. Without a biological link between specific features and specific image characteristics, it is not possible to tell whether the feature set is complete. Additionally, off-the-shelf feature sets may contain extraneous features that end up in prediction models such as decision trees and random forests, where they can affect the model output and the accuracy of automated colony rank prediction. Our objectives are to: (a) incorporate bio-rules and experts’ ranks into a computational prediction model; (b) compare the prediction accuracy of off-the-shelf image features with that of bio-rule-based features, in both linear and non-linear prediction models; and (c) explore relationships between custom-designed and off-the-shelf features.
The input for developing our automated process consists of five biological rules and 481 phase contrast images of stem cell colonies, imaged at 10x magnification and ranked by two experts. The five bio-rules were broken down and mapped into 16 image features, extracted over image segments obtained via automated segmentation. For comparison with more traditional automation methods, a second set of 45 off-the-shelf features was used: 12 Haralick features, 30 wavelet features, and one feature each for area, perimeter, and circularity. Both feature sets served as input to one linear model (logistic LASSO) and two non-linear models (decision tree and random forest). All models were validated by leave-one-out resampling. Examples of stem cell colony images and their rankings are shown in Figure 2.
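The three shape features in the off-the-shelf set can be computed directly from a segmented colony mask. As a minimal sketch, assuming a binary mask given as rows of 0/1 values, the code below counts colony pixels for area, counts pixel edges bordering background for perimeter (a simple "crack" estimate), and applies circularity = 4π·area / perimeter², which equals 1 for a perfect circle in continuous geometry. The function name `shape_features` and the perimeter estimator are assumptions, not the measurements used in this work.

```python
"""Area, perimeter, and circularity of a segmented colony mask (illustrative sketch)."""

import math


def shape_features(mask):
    """Return (area, perimeter, circularity) for a binary mask.

    mask: list of rows of 0/1 values, where 1 marks a colony pixel.
    Perimeter counts pixel edges that separate a colony pixel from background
    or from the image border. On a pixel grid this crack estimate overstates
    curved and diagonal boundaries, so circularity values are best compared
    only between masks measured the same way.
    """
    rows, cols = len(mask), len(mask[0])
    area = 0
    perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            # Check the four edge-adjacent neighbours of this colony pixel.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < rows and 0 <= cc < cols)
                if outside or not mask[rr][cc]:
                    perimeter += 1
    circularity = 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0
    return area, perimeter, circularity


if __name__ == "__main__":
    # A 10 x 10 square colony padded by one background pixel on each side.
    square = [[0] * 12] + [[0] + [1] * 10 + [0] for _ in range(10)] + [[0] * 12]
    print(shape_features(square))  # area 100, perimeter 40, circularity pi/4
```

A square scores π/4 (about 0.785) under this estimator. More careful tools trace the boundary with sub-pixel or diagonal-aware perimeter estimates, which brings digitized disks closer to a circularity of 1.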
We demonstrated the benefits of using biological rules to select image features for the automation of stem cell colony ranking. Our analyses of segmentation and prediction models showed that automation accuracy benefited from the additional information encoded as a set of biological rules. We based this conclusion on a comparison of our new feature set with off-the-shelf features, including Haralick texture features and wavelet features. The analyses also suggested that the relationship between features and ranks is not strictly linear: the improvement appeared only in the non-linear (random forest and decision tree) models. With these two models, using a selected set of features produced better results than using either only off-the-shelf features or a combination of all feature sets. Finally, we attempted to establish correlation-based relationships between the custom-designed and off-the-shelf features, in order to reduce the future labor spent on custom development of image features.
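The correlation-based mapping between feature sets can be sketched as follows: for each custom feature, compute its Pearson correlation with every off-the-shelf feature across the same colonies and keep the partner with the largest absolute correlation. The matching rule (largest |r|), the function names, and the feature names are assumptions made for illustration; the paper's exact procedure may differ.

```python
"""Pair each custom feature with its most correlated off-the-shelf feature (sketch)."""

import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant feature carries no correlation information
    return cov / math.sqrt(va * vb)


def best_matches(custom, off_the_shelf):
    """Map each custom feature name to (best off-the-shelf name, r).

    custom, off_the_shelf: dicts of feature name -> values over the same colonies.
    Matches are ranked by |r|, so strong negative correlations also count.
    """
    result = {}
    for name, values in custom.items():
        best = max(off_the_shelf, key=lambda k: abs(pearson(values, off_the_shelf[k])))
        result[name] = (best, pearson(values, off_the_shelf[best]))
    return result


if __name__ == "__main__":
    custom = {"hypothetical_custom_feature": [1, 2, 3, 4]}
    off_the_shelf = {"contrast": [2, 4, 6, 8], "area": [4, 1, 3, 2]}
    print(best_matches(custom, off_the_shelf))
```

If a custom feature is strongly correlated with an existing off-the-shelf feature, the cheaper off-the-shelf measurement could serve as a proxy, which is the labor-saving motivation stated above.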
Paper: Adele Peskin, Steve Lund, Ya-Shian Li-Baboud, Michael Halter, Anne Plant, and Peter Bajcsy, "Automated Ranking of Stem Cell Colonies by Translating Biological Rules to Computational Models," Proceedings of ACM BCB 2015, September 2015.