Algorithm for detection of image consistent features using surf method

Table of contents: The Kazakh-American Free University Academic Journal №4 - 2012
Authors:
Denissova Natalya, East Kazakhstan State Technical University in honor of D. Serikbayev, Kazakhstan
Tlebaldinova Aizhan, East Kazakhstan State Technical University in honor of D. Serikbayev, Kazakhstan

Many branches of engineering related to generation, processing, storage and transmission of information are to a large extent focused now on the development of systems in which information is of an image character. An image that can be considered as a two-dimensional signal, is a much more capacious storage medium than the usual one-dimensional (pertaining to time) signal [1].

The number of systems that use computer vision as their primary source of information increases every day, resulting in a need for new algorithms for image processing and recognition. The goal of computer vision is to develop useful conclusions about objects and scenes of the real world based on the analysis of images obtained with the help of sensors [2]. The study of computer vision is a scientific direction in the area of artificial intelligence and related technologies of scanning real objects, their processing, and use of the received data for an automated solution of applied problems. The start up of developments related to this direction, refers to the 1950’s. The first real success in this field was gained in the Cornell Aeronautical Laboratory in 1958-1960 in connection with the implementation on IBM-740 of a hardware version of the system of recognition of simple visual images – Mark I Perceptron (designed by Frank Rosenblatt).

Nowadays there are a number of different approaches to the implementation of computer vision systems from different variations on comparison of a received image with a master one to building complex three-dimensional models based on images. As the topic of object recognition is excessive, this study considers algorithm for detecting image consistent features using the speeded up robust features method (SURF).

The principle of the algorithm is as follows. For featuring a scene and a reference object image using the SURF there are critical points and unique descriptors for them. Comparing the set of descriptors one can distinguish a reference object against the scene [3].

SURF solves two problems: search of critical image points and the development of their descriptors that are invariant to scale and rotation.

The first phase, “Search for Critical Points” is carried out by means of the Hessian matrix. The determinant of the Hessian matrix (Hessian) reaches an extremum at the points of maximum variance in the brightness gradient. It is good at detecting spots, angles and line borders. Hessian is rotationally invariant, but not invariant to scale. Therefore, SURF uses multi-scale filters to find Hessians.

For each key point the direction of maximum brightness change (gradient) and scale, taken from the scale factor of the Hessian matrix are estimated.

The gradient at a point is calculated using the Haar filters.

The second stage is the “Formation of Descriptors.” A descriptor is a set of 64 (or 128) numbers for each key point. These numbers reflect the fluctuations of a gradient around a key point. As a key point is the maximum of the Hessian, this ensures that around a key point should be areas with different gradients. Thus the dispersion (variance) of descriptors for different key points is achieved.

The fluctuations of the gradient of a key point area are estimated in relation to the direction of the gradient around the point in general (throughout all the area around a key point). Thus, the descriptor invariance to rotation is reached. The size of the area the descriptor is estimated on is determined by the scale of the Hessian, which provides the scale invariance. The gradient fluctuations are also estimated using the Haar filter. Let us focus on some key points of the algorithm.

Integral representation. For efficient calculation of Hesse and Haar filters the integral representation of images is used.

The integral representation is a matrix whose dimensionality is equal to the dimensionality of the original image, and the elements are calculated according to the formula:

(1)

where I(i,j) – the brightness of pixels of the image.

With a matrix integral one can very quickly calculate the total brightness of pixels, of random rectangular areas of an image using the formula:

, (2)

where ABCD – the rectangle we are interested in. The calculation of the Hessian matrix. Detection of singular points in SURF is based on the computation of the determinant of the Hessian matrix (Hessian). The Hessian matrix of the two-dimensional function and its determinants is defined as:

(3)

(4)

The Hessian is used to find local minimum or maximum image brightness. At these points, the Hessian reaches the point of extremum.

Figure 1 - The local extrema

Figure 1 shows that the singular points (circumscribed by colored circles) are the local extrema of the image brightness. Small sized points are not recognized as singular because of the threshold scissoring by the Hessian value.

Figure 2- example key point

Figure 2 shows the ends of the segment, recognized as the key points, using the Hessian matrix.

SURF uses binarized Laplacian Gaussian approximation:

Figure 3 - Filters for finding the Hessian in SURF

Figure 3 shows the filters used to find the Hessian in SURF. The white areas correspond to the value +1, the black ones – 2 (in the third filter – 1), gray – zero. The spatial scale is 9x9 pixels. This filter is more sustained to rotation, and can be efficiently computed using the integral matrix.

Thus the Hessian in SURF is calculated as follows:

(5)

where D_xx, D_yy, D_xy are the convolution products by the filters shown in the figure above. Coefficient of 0.9 has a theoretical basis, and adjusts the approximate nature of calculations. So, to find singular points, SURF traverses image pixels and looks up the Hessian maximum.

Finding a local maximum of the Hessian. To find a local maximum of the Hessian we use the so-called method of neighboring points 3x3x3. Its meaning is clarified in the figure below:

Figure 4 - Finding a local maximum

The pixel marked with a cross is considered a local maximum if its Hessian is greater than any of its neighbors’ in its scale, and greater than any of its neighbors’ with a bigger or smaller scale (a total of 26 neighbors).

The calculation of the descriptor of a singular point. A descriptor is an array of 64 (128 in the extended version) numbers that identify a singular point. Descriptors of one and the same singular point on the sample and on scene need to be approximately the same. The method of calculating the descriptor is that it is independent of rotation and scale.

To calculate a descriptor a rectangular area is formed around a singular point; it has the size of 20s, where s is the scale in which the singular point was found. The square is oriented along the priority direction calculated for the singular point. A descriptor is considered a description of the gradient for 16 quadrants around a singular point.

Figure 5 - Calculating the descriptor

Then the square is divided into 16 smaller quadrants, as shown in Fig. 5. In each quadrant we take a regular 5x5 grid, and for a grid point a gradient is sought with the Haar filter.

It should be noted that at the calculation of the Haar filter an image is not rotated; the filter is calculated in the normal coordinates of the image. But the received gradient coordinates (dX, dY) are rotated by an angle corresponding to the orientation of the square.

To calculate the descriptor of a singular point, it is necessary to calculate 25 Haar filters in every of the 16 quadrants, 400 Haar filters total. Given that a filter takes 6 operations, it will take a minimum of 2,400 transactions for a descriptor.

After finding the 25 points of quadrant gradients, the four values which actually are components of the descriptor are computed:

,,, (6)

Two of them are simply the total gradient in the quadrant, and the two others - the sum of point gradients absolutes. The figure shows the behavior of these magnitudes for different image areas.

Figure 6 - The behavior of the descriptor components

Fig. 6 shows the behavior of the descriptor for various images. For 18 equal areas all the values are close to zero. For recurring vertical stripes all the values, except the second, are close to zero. With the increase of intensity of brightness in the direction of X axis, the first two components have great values.

The four components of each quadrant and 16 quadrants give 64 descriptor components for the entire area of a singular point. When recorded into the array, the descriptor values are weighted by a Gaussian, with the center at a singular point and sigma 3.3s. This is necessary for the greater stability to the noise in the areas far removed from the singular point.

In addition to a descriptor, to describe a singular point a Hessian trace sign is used, i.e. the magnitude sign (Dxx + Dyy). For bright points on a dark background, the trace is negative for dark points on a light background - positive. Thus, SURF distinguishes light and dark spots.

In this way, by applying SURF to an image we obtain a set of descriptors that will uniquely identify the standard (model) in the scene.

In conclusion, it should be noted that there are not only positive but also negative aspects of this method, which were identified in the course of its use for the detection of stable features of images. The positive aspects of the method are:

- Invariance to rotation and scaling;

- Invariance to the difference in the overall brightness of an image;

- It can detect more than one object in the scene.

The negative aspects of the method are:

- It is quite complex in implementation;

- A relatively slow operation of the algorithm.

REFERENCES

1. Gruzman I.S. Digital image processing in information systems: Manual / I.S. Gruzman, V.S. Kirichuk et al.-Novosibirsk: Novosibirsk State Technical University in 2002. – 352 s.

2. Shapiro L. Computer vision. / L. Shapiro, J. Stokman. Translated from English. - M.: BINOM. Knowledge Laboratory, 2006. -752 s.

3. Bay H. SURF: Speeded Up Robust Features / H.Bay, T.Tuytelaars, L.Van Gool

Table of contents: The Kazakh-American Free University Academic Journal №4 - 2012

Main

About journal

About KAFU

News

FAQ