Project
Documentation
Our task
The goal of the project was to recognize a given key in an image with several keys in various positions.
Overview of the System
The algorithm used for the problem is composed of the following steps:
- segmentation of the input and reference image into objects
- rough alignment of the objects extracted
- fine registration of the roughly aligned obejcts
- comparison of the registered images of objects
All the steps were implemented in Matlab. The components of the algorithm are described below.
Otsu Thresholding in the Saturation Channel
Since we have reference and test images with homogeneous background we do the segmentation in the saturation channel of the color input image after converting it to the HSV color space. For the segmentation we use Otsu thresholding. Otsu considers the image histogram to consist of two Gaussian populations representing the object and the background. A threshold is selected to maximize the between-class separation on the basis of the class variances.
Labelling Connected Components
We assume that the keys in the image are not overlapping or touching. Then, we can use connected components labelling to separate the objects in the binary image from each other. Built-in Matlab function has been used for labeling. The image contains foreground pixels, which describe the keys, and background pixels. Label is assigned to each pixel in the image. These labels indicate which connected region does the pixel belong to. Each connected region corresponds to one of the objects in the image.
Clearing the Image through Statistical Approaches
It is possible to gain some information about labeled regions. It has been used for clearing small parts and noise, where particles (regions) smaller than a certain pixel number have been deleted.
Separating objects
After small components have been cleared, we make labelling once again. We make a
collection of binary images, placing each object to a separate image and cropping each image to the size of an object.
Rough Registration using Principal Component Analysis
A common method from statistics for analysing data is principal component analysis (PCA). In communication theory, it is known as the Karhunen-Loève transform. The aim is to find a set of M orthogonal vectors in data space that account for as much as possible of the data's variance. Projecting the data from their original N-dimensional space onto the M-dimensional subspace spanned by these vectors then performs a dimensionality reduction that often retains most of the intrinsic information in the data.
In our project we use PCA only to determine the main axes of the key, named principal line and secondary line. These lines are derived from the eigenvector of the key corresponding to the maximal eigenvalue. After the principal axis direction has been calculated, we rotate the image so as to make it vertical.
Finding the optimal projective transformation
After the binary mask of reference key and the cropped input key are roughly registered, a more precise registration should be done between them. Since the viewpoints of the reference and the input photo may differ, we should use a projective transformation for the matching. We measure the error of the registration with the number of mismatching pixels (object pixel in the reference mask but background pixel in the input mask, or vice versa). We should find the minimum of the error function while iterating through the parameters of the projective transform.
Fortunately, we have all the tools we need for this task in MATLAB:
- projective transformation of images (using imtrasform and maketform functions),
- xor operation on images (and the sum function for counting non-zero pixels in the resulting image) to measure the registration error,
- an optimization algorithm for searching the minimum value of the registration error (fminsearch function, which implements Nelder-Mead non-linear minimization)
So we start with the identical projective transformation and simply let the optimization algorithm to fine-tune the 8 parameters of the transformation while minimizing the error function. The optimization stops when we reach the maximum number of iterations (now it is set to 500) or when the registration error is stable for several iterations.
Key Recognition using Perceptional Image Hashing and feature points
Finally, after all the transformations have been made, we need to decide, whether the key on the image is the reference key, or not. To do so, we used Perceptional Image Hashing using feature points. Ready-made algorithm was found in internet.
The algorithm we used maps an image to a fixed length binary string. We compare the hash values obtained from two different images for closeness in Hamming distance. The algorithm employs an iterative feature detector to extract significant geometry preserving feature points. Probabilistic quantization is applied on the derived features to further enhance perceptual robustness. The algorithm can detect if two objects are similar even if one of images is distorted or rotated (if the rotation angle is small).
|