The basic outline of our approach is as follows:
STEP 0: Getting the database
We have 1250 .pgm pictures of 20 people
and a point file for every .pgm. Each point file contains 20 coordinates
of important areas of the face, such as the center of the eye, the top of
the nose, etc.
STEP 1: Separation of 20 people
We created a directory for each person, sorted the pictures into
these directories, and chose only 13 pictures of different people.
STEP 2: The Average maker
Read all 13 pictures and their coordinate files
and cut out 64x64 patches of the left eye, the right eye, the nose and
the lips. From these patches we make .pgm files which contain the center
of the eyes, the nose, etc.
After that, we load them and compute an average for each feature. These
averages are the patterns we try to detect in the input images.
We apply a contrast equalization to each average.
This is very important because it increases the distances
between the gray levels, which gives us more information about the
picture and a better matching result.
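The averaging and contrast step can be sketched as follows. This is a minimal Python illustration, not the original implementation: the function names are hypothetical, and a linear contrast stretch is assumed as one possible reading of "contrast equalization" (a histogram equalization would be another).

```python
def average_patches(patches):
    """Pixel-wise mean of equally sized grayscale patches (lists of rows)."""
    n = len(patches)
    rows, cols = len(patches[0]), len(patches[0][0])
    return [[sum(p[r][c] for p in patches) / n for c in range(cols)]
            for r in range(rows)]


def stretch_contrast(patch, lo=0.0, hi=255.0):
    """Linearly map the gray range of the patch onto [lo, hi].

    Assumption: a linear stretch stands in for the contrast
    equalization mentioned in the text.
    """
    flat = [v for row in patch for v in row]
    vmin, vmax = min(flat), max(flat)
    if vmax == vmin:  # flat patch: there is nothing to stretch
        return [[lo for _ in row] for row in patch]
    scale = (hi - lo) / (vmax - vmin)
    return [[lo + (v - vmin) * scale for v in row] for row in patch]


# Two tiny 2x2 patches standing in for the 64x64 crops
patches = [[[100, 110], [120, 130]],
           [[110, 120], [130, 140]]]
template = stretch_contrast(average_patches(patches))
```

In this toy example the averaged gray values only span 105 to 135; after stretching they cover the full range from 0 to 255.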
STEP 3: Extracting key point coordinates from the database
We have 1020 point files for the 20 separated
people and build a database in which each entry holds the
identifier in the first position, followed by 20 positions with the
coordinates.
STEP 4: Normalizing the database
To compensate for slight rotation of the face, we scale all key
points of a face such that the maximal x- and y-coordinates are 1
and the minimal ones are -1. Data in this form can also be learned
more easily by a Neural Network.
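A minimal Python sketch of this scaling, assuming each axis is rescaled independently and a face is given as a list of (x, y) tuples. The function name and the handling of a degenerate axis are illustrative choices, not taken from the original code.

```python
def normalize_keypoints(points):
    """Scale (x, y) key points so that each axis spans exactly [-1, 1]."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def rescale(v, vmin, vmax):
        if vmax == vmin:  # degenerate axis: map everything to 0 (assumption)
            return 0.0
        return 2.0 * (v - vmin) / (vmax - vmin) - 1.0

    return [(rescale(x, min(xs), max(xs)), rescale(y, min(ys), max(ys)))
            for x, y in points]


# Three made-up key points in pixel coordinates
face = [(10, 20), (30, 60), (20, 40)]
normalized = normalize_keypoints(face)
```

After this step the smallest coordinate on each axis is -1 and the largest is 1, independent of where the face sits in the picture and how large it is.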
STEP 5: Matching points in input images
To locate key points in the face, we search for
small 64x64 patterns. Our Matlab implementation is based on normalized
correlation of small image rectangles in image space (this
statistical similarity measure compensates for
illumination changes). The position of maximal correlation between the
input image and the pattern is considered a match.
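The matching idea can be illustrated with a brute-force search. This is a sketch in plain Python rather than the Matlab implementation mentioned above; the names ncc and best_match are hypothetical, and images are assumed to be lists of rows of gray values.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    da = [v - ma for v in fa]
    db = [v - mb for v in fb]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:  # a constant patch carries no structure to correlate
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(image, template):
    """Return ((row, col), score) of the top-left corner with maximal NCC."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(window, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


image = [[10, 10, 10, 10, 10],
         [10, 10, 10, 10, 10],
         [10, 50, 90, 10, 10],
         [10, 90, 20, 10, 10],
         [10, 10, 10, 10, 10]]
# Same pattern as in the image, but with doubled contrast and an offset
template = [[110, 190], [190, 50]]
pos, score = best_match(image, template)
```

Because the mean is subtracted and the result is divided by the patch energies, the brightened template still matches the darker pattern in the image, which is the illumination compensation referred to above.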
Image-space correlation has
been used extensively in stereo vision and is computationally very
expensive. We achieved a speedup from 1.5 minutes to 6 seconds per
feature using a hierarchical matching approach, where correlation is
first computed at lower resolution levels and the results are then
refined at levels of higher resolution.
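A two-level version of such a hierarchical search might look like the sketch below. It is written in Python for illustration and makes several assumptions that are not in the original: a single coarse level obtained by 2x2 block averaging, a refinement radius of 2 pixels, and hypothetical function names.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    da, db = [v - ma for v in fa], [v - mb for v in fb]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if denom == 0.0 else sum(x * y for x, y in zip(da, db)) / denom


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (odd remainder dropped)."""
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, len(img[0]) - 1, 2)]
            for r in range(0, len(img) - 1, 2)]


def search(image, template, candidates):
    """Evaluate NCC only at the given top-left corners and keep the best one."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, None
    for r, c in candidates:
        if 0 <= r <= len(image) - th and 0 <= c <= len(image[0]) - tw:
            window = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(window, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


def coarse_to_fine(image, template, radius=2):
    """Full search at half resolution, then refinement at full resolution."""
    small_img, small_tpl = downsample(image), downsample(template)
    everywhere = [(r, c)
                  for r in range(len(small_img) - len(small_tpl) + 1)
                  for c in range(len(small_img[0]) - len(small_tpl[0]) + 1)]
    (cr, cc), _ = search(small_img, small_tpl, everywhere)
    # Only positions near the up-scaled coarse match are checked again
    nearby = [(2 * cr + dr, 2 * cc + dc)
              for dr in range(-radius, radius + 1)
              for dc in range(-radius, radius + 1)]
    return search(image, template, nearby)


template = [[30, 60, 90, 120],
            [150, 180, 210, 240],
            [45, 75, 105, 135],
            [165, 195, 225, 250]]
image = [[10] * 12 for _ in range(12)]
for r in range(4):
    for c in range(4):
        image[4 + r][6 + c] = template[r][c]  # paste the pattern at (4, 6)

pos, score = coarse_to_fine(image, template)
```

The saving comes from the coarse level having a quarter of the positions and a quarter of the pixels per window, while the fine level only visits a small neighborhood instead of the whole image.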
For an evaluation, see XXX.
For future improvements of our localization
procedure, see YYY.
STEP 6: Classifier Input
We also validated that key points are suitable
features for face recognition. To this end, we extracted the coordinates
of 20 key points for each image and concatenated them into a
40-dimensional vector. The coordinates were further normalized (see STEP
4).
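Building the classifier input amounts to flattening the key points. The short Python sketch below assumes an interleaved ordering (x1, y1, x2, y2, ...); the ordering used in the original database is not stated, and the function name is hypothetical.

```python
def to_feature_vector(points):
    """Flatten [(x1, y1), ..., (xn, yn)] into [x1, y1, ..., xn, yn]."""
    return [coord for point in points for coord in point]


# 20 made-up key points stand in for one face
points = [(i, i + 0.5) for i in range(20)]
vector = to_feature_vector(points)
```

Any fixed ordering works, as long as the same one is used for every face, so that each vector position always refers to the same key point.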
STEP 7: Classification
For classification, it emerged that a simple
1-Nearest-Neighbor approach was sufficient and performed
satisfactorily in experiments (see ZZZ).
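A 1-Nearest-Neighbor classifier over such vectors fits in a few lines. The Python sketch below assumes Euclidean distance, which the text does not specify, and uses made-up labels and short vectors in place of the 40-dimensional ones.

```python
import math


def nearest_neighbor(query, gallery):
    """Return (label, distance) of the gallery vector closest to the query.

    gallery is a list of (label, vector) pairs; Euclidean distance is an
    assumption made for this sketch.
    """
    best_label, best_dist = None, math.inf
    for label, vector in gallery:
        dist = math.dist(query, vector)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


gallery = [("person_a", [-1.0, -1.0, 1.0, 1.0]),
           ("person_b", [-1.0, 1.0, 1.0, -1.0]),
           ("person_a", [-0.9, -1.0, 1.0, 0.9])]
label, dist = nearest_neighbor([-0.8, -1.0, 1.0, 0.9], gallery)
```

The query is assigned the identity of the single closest stored vector, so no training step is needed beyond storing the normalized key points of the known faces.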
We also implemented a Multilayer Perceptron
approach in Matlab, but were not able to test it due to time pressure.
Overall results:
Experimental results for the classification part were successful and
showed that key points are an excellent feature for robust face
recognition.
Problems remain concerning the localization of key points. Work
remains to be done to make the approach more reliable and enhance
performance (see XXX).