We present a scalable hybrid CPU-GPU implementation of image stitching that processes large image sets at near-interactive rates. The implementation scales well with image size and with the number of CPU cores and GPU cards in a machine. It stitches a grid of 42 x 59 tiles into a 17K x 22K pixel image in 43 s (end-to-end execution time) when using one NVIDIA Tesla C2070 card and two Intel Xeon E-5620 quad-core CPUs, and in 29 s when using two Tesla C2070 cards and the same two CPUs. Figure 1 shows a stitched image from the 42 x 59 tiles. The implementation also composes and renders the composite image in 15 s when the image is not saved.
This implementation takes advantage of coarse-grained parallelism. It organizes the computation into a pipeline architecture that spans CPU and GPU resources and overlaps computation with data motion (see Figure 2). The first step of the implementation estimates the translation of each tile. The implementation achieves a nearly 10x performance improvement over our optimized non-pipelined GPU implementation and demonstrates near-linear speedup as the number of CPU threads and the number of GPUs increase.
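The idea of overlapping computation with data motion can be illustrated with a small producer-consumer pipeline. The following is a minimal sketch in Python, not the authors' implementation: the stage functions `load_tile`, `copy_to_device`, and `compute` are hypothetical stand-ins, and plain threads with bounded queues take the place of real CPU and GPU work. Each stage runs in its own thread, so one tile can be loading while another is being processed.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker; real work items must not be None


def stage(work, inbox, outbox):
    """Apply `work` to every item from `inbox` and forward the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def run_pipeline(items, stages, depth=4):
    """Push `items` through `stages`, one thread per stage.

    Bounded queues (size `depth`) provide backpressure, so a fast stage
    cannot run arbitrarily far ahead of a slow one.
    """
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]

    def feed():
        for item in items:
            queues[0].put(item)
        queues[0].put(SENTINEL)

    threads = [threading.Thread(target=feed)]
    for i, work in enumerate(stages):
        threads.append(
            threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        )
    for t in threads:
        t.start()

    # The caller drains the last queue while the stages are still running.
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for the real stages (assumptions for illustration).
def load_tile(index):
    return {"index": index, "pixels": [index] * 4}


def copy_to_device(tile):
    return {**tile, "on_device": True}


def compute(tile):
    return (tile["index"], sum(tile["pixels"]))


results = run_pipeline(range(20), [load_tile, copy_to_device, compute])
```

Because each stage has a single worker thread, results come out in input order. A real system would likely use several workers per stage, which is one way the near-linear scaling with thread count described above could arise.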
The second step assembles the tiles into a mosaic based on the computed translations. However, these translations cannot be used directly because they form an over-constrained system of equations. Our approach resolves the over-constraint by treating the system as a graph and progressively coalescing strongly connected components. We use the cross-correlation values as a measure of connection strength. This optimization minimizes the uncertainty and increases stitching accuracy.
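One way to picture the graph-based resolution is as a maximum spanning tree: tiles are nodes, each pairwise translation is an edge weighted by its cross-correlation, and components are merged starting from the strongest edges. The sketch below assumes this interpretation, which the text above does not confirm as the exact algorithm. The function `place_tiles` and the edge tuple layout are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def place_tiles(num_tiles, edges):
    """Return absolute positions for tiles reachable from tile 0.

    Each edge is (correlation, a, b, dx, dy), meaning the estimated
    position of tile b is the position of tile a plus (dx, dy).
    """
    parent = list(range(num_tiles))
    kept = defaultdict(list)

    # Coalesce components, strongest correlation first (Kruskal-style).
    for corr, a, b, dx, dy in sorted(edges, key=lambda e: e[0], reverse=True):
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a == root_b:
            continue  # redundant, weaker constraint: drop it
        parent[root_a] = root_b
        kept[a].append((b, dx, dy))
        kept[b].append((a, -dx, -dy))

    # Walk the kept edges from tile 0 to accumulate absolute positions.
    positions = {0: (0, 0)}
    pending = deque([0])
    while pending:
        a = pending.popleft()
        ax, ay = positions[a]
        for b, dx, dy in kept[a]:
            if b not in positions:
                positions[b] = (ax + dx, ay + dy)
                pending.append(b)
    return positions


# A 2 x 2 grid (tiles 0 1 / 2 3) with four pairwise estimates, one of them weak.
edges = [
    (0.90, 0, 1, 100, 0),
    (0.80, 0, 2, 0, 100),
    (0.95, 2, 3, 101, 0),
    (0.20, 1, 3, 5, 90),  # low correlation, inconsistent with the others
]
positions = place_tiles(4, edges)
```

In this example the four translations over-constrain the four tile positions. The weakest edge closes a cycle and is discarded, so tile 3 is placed through the two stronger edges via tile 2. Tiles with no path to tile 0 are left out of the result in this sketch.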
An open-source Fiji plugin is available for free download as a JAR file or from a Fiji update site.
T. Blattner et al., "A Hybrid CPU-GPU System for Stitching of Large Scale Optical Microscopy Images", 2014 International Conference on Parallel Processing, 2014.