Algorithm for detection of image consistent features using surf method
Table of contents: The Kazakh-American Free University Academic Journal №4 - 2012
Authors: Denissova Natalya, East Kazakhstan State Technical University in honor of D. Serikbayev, Kazakhstan
Tlebaldinova Aizhan, East Kazakhstan State Technical University in honor of D. Serikbayev, Kazakhstan
Many branches of engineering related to generation, processing,
storage and transmission of information are to a large extent focused now on
the development of systems in which information is of an image character. An
image that can be considered as a two-dimensional signal, is a much more
capacious storage medium than the usual one-dimensional (pertaining to time)
signal [1].
The number of systems that use computer
vision as their primary source of information increases every day, resulting
in a need for new algorithms for image processing and recognition. The goal of computer vision is to develop useful
conclusions about objects and scenes of the real world based
on the analysis of images
obtained with the help of
sensors [2]. The study of computer vision is
a scientific direction in the
area of artificial intelligence
and related technologies of scanning real objects,
their processing, and use of the received data for an automated
solution of applied problems. The start up of developments related to this direction, refers
to the 1950’s. The first real success
in this field was gained in the Cornell Aeronautical Laboratory in 1958-1960 in connection with the implementation on IBM-740 of a hardware version of the system of recognition of
simple visual images – Mark I Perceptron (designed by Frank Rosenblatt).
Nowadays there are a number of different approaches to the
implementation of computer vision systems from different variations on
comparison of a received image with a master one to building complex
three-dimensional models based on images. As the topic of object recognition is
excessive, this study considers algorithm for detecting image consistent features
using the speeded up robust features method (SURF).
The principle of the algorithm is as follows. For featuring a scene
and a reference object image using the SURF there are critical points and
unique descriptors for them. Comparing the set of descriptors one can
distinguish a reference object against the scene [3].
SURF solves two problems: search of critical image points and the
development of their descriptors that are invariant to scale and rotation.
The first phase, “Search for Critical Points” is carried out by
means of the Hessian matrix. The determinant of the Hessian matrix (Hessian)
reaches an extremum at the points of maximum variance in the brightness
gradient. It is good at detecting spots, angles and line borders. Hessian is
rotationally invariant, but not invariant to scale. Therefore, SURF uses
multi-scale filters to find Hessians.
For each key point the direction of maximum brightness change
(gradient) and scale, taken from the scale factor of the Hessian matrix are
estimated.
The gradient at a point is calculated using the Haar filters.
The second stage is the “Formation of Descriptors.” A descriptor is
a set of 64 (or 128) numbers for each key point. These numbers reflect the
fluctuations of a gradient around a key point. As a key point is the maximum of
the Hessian, this ensures that around a key point should be areas with
different gradients. Thus the dispersion (variance) of descriptors for
different key points is achieved.
The
fluctuations of the gradient of a key point area are estimated in relation to
the direction of the gradient around the point in general (throughout all the
area around a key point). Thus, the descriptor invariance to rotation is
reached. The size of the area the descriptor is estimated on is determined by
the scale of the Hessian, which provides the scale invariance. The gradient
fluctuations are also estimated using the Haar filter. Let us focus on some key
points of the algorithm.
Integral
representation. For efficient calculation of Hesse and Haar filters the
integral representation of images is used.
The
integral representation is a matrix whose dimensionality is equal to the
dimensionality of the original image, and the elements are calculated according
to the formula:
(1)
where I(i,j) – the
brightness of pixels of the image.
With a matrix integral one can very quickly calculate the total
brightness of pixels, of random rectangular areas of an image using the
formula:
, (2)
where ABCD – the rectangle we are interested
in. The calculation of the Hessian
matrix. Detection of singular points in SURF is based on the computation of the
determinant of the Hessian matrix (Hessian). The Hessian matrix of the
two-dimensional function and its determinants is defined as:
(3)
(4)
The Hessian is used to find local minimum or maximum image
brightness. At these points, the Hessian reaches the point of extremum.
Figure
1 - The local extrema
Figure 1 shows that the singular points (circumscribed by colored
circles) are the local extrema of the image brightness. Small sized points are
not recognized as singular because of the threshold scissoring by the Hessian
value.
Figure
2- example key point
Figure 2 shows the ends of the segment, recognized as the key
points, using the Hessian matrix.
SURF uses binarized Laplacian Gaussian approximation:
Figure
3 - Filters for finding the Hessian in SURF
Figure 3 shows the filters used to find the Hessian in SURF. The
white areas correspond to the value +1, the black ones – 2 (in the third filter
– 1), gray – zero. The spatial scale is 9x9 pixels. This filter is more
sustained to rotation, and can be efficiently computed using the integral
matrix.
Thus the Hessian in SURF is calculated as follows:
(5)
where Dxx,
Dyy, Dxy are the convolution products by the filters
shown in the figure above. Coefficient of 0.9 has a theoretical basis, and
adjusts the approximate nature of calculations. So, to find singular points,
SURF traverses image pixels and looks up the Hessian maximum.
Finding a local maximum of the Hessian. To find a local maximum of
the Hessian we use the so-called method of neighboring points 3x3x3. Its
meaning is clarified in the figure below:
Figure
4 - Finding a local maximum
The pixel marked with a cross is considered a local maximum if its
Hessian is greater than any of its neighbors’ in its scale, and greater than
any of its neighbors’ with a bigger or smaller scale (a total of 26 neighbors).
The calculation of the descriptor of a singular point. A descriptor
is an array of 64 (128 in the extended version) numbers that identify a
singular point. Descriptors of one and the same singular point on the sample
and on scene need to be approximately the same. The method of calculating the
descriptor is that it is independent of rotation and scale.
To calculate a descriptor a rectangular area is formed around a
singular point; it has the size of 20s, where s is the scale in which
the singular point was found. The square is oriented along the priority direction
calculated for the singular point. A descriptor is considered a description of
the gradient for 16 quadrants around a singular point.
Figure
5 - Calculating the descriptor
Then the square is divided into 16 smaller quadrants, as shown in
Fig. 5. In each quadrant we take a regular 5x5 grid, and for a grid point a
gradient is sought with the Haar filter.
It should be noted that at the calculation of the Haar filter an
image is not rotated; the filter is calculated in the normal coordinates of the
image. But the received gradient coordinates (dX, dY) are rotated by an angle
corresponding to the orientation of the square.
To calculate the descriptor of a singular point, it is necessary to
calculate 25 Haar filters in every of the 16 quadrants, 400 Haar filters total.
Given that a filter takes 6 operations, it will take a minimum of 2,400
transactions for a descriptor.
After finding the 25 points of quadrant gradients, the four values
which actually are components of the descriptor are computed:
,,, (6)
Two of them are simply the total gradient in the quadrant, and the
two others - the sum of point gradients absolutes. The figure shows the
behavior of these magnitudes for different image areas.
Figure 6 - The behavior of the descriptor components
Fig. 6
shows the behavior of the descriptor
for various images. For 18 equal areas all the values are close to zero. For
recurring vertical stripes
all the values, except the second,
are close to zero. With the increase of
intensity of brightness in the direction
of X axis, the first two components have great values.
The four components of each quadrant and 16
quadrants give 64 descriptor components for the entire area of a singular
point. When recorded into the array, the descriptor values are weighted by a
Gaussian, with the center at a singular point and sigma 3.3s. This is necessary
for the greater stability to the noise in the areas far removed from the
singular point.
In addition to a descriptor, to describe a singular
point a Hessian trace sign is used,
i.e. the magnitude sign (Dxx + Dyy). For
bright points on a dark background,
the trace is negative for dark points on a light background - positive. Thus,
SURF distinguishes light and dark spots.
In this way, by applying
SURF to an image
we obtain a set of descriptors that will uniquely identify the standard (model) in the scene.
In conclusion, it should be noted that there are not
only positive but also negative aspects of this method, which were identified
in the course of its use for the detection of stable features of images. The
positive aspects of the method are:
- Invariance to rotation and scaling;
- Invariance to the difference in the overall
brightness of an image;
- It can detect more than one object in the scene.
The negative aspects of the method are:
- It is quite complex in implementation;
- A relatively slow operation of the algorithm.
REFERENCES
1. Gruzman I.S. Digital image
processing in information systems: Manual / I.S. Gruzman, V.S. Kirichuk et al.-Novosibirsk: Novosibirsk State Technical University in 2002. – 352 s.
2. Shapiro L. Computer vision. /
L. Shapiro, J. Stokman. Translated from English. - M.: BINOM. Knowledge
Laboratory, 2006. -752 s.
3. Bay H. SURF: Speeded Up
Robust Features / H.Bay, T.Tuytelaars, L.Van Gool
Table of contents: The Kazakh-American Free University Academic Journal №4 - 2012
|