Evaluation of Feature Points Descriptors Performance for Visual Finger Printing for Localization using Smartphones
Ilyar Asl Sabbaghian Hokmabadi, Adel Moussa, Dr. Naser El-Sheimy, University of Calgary, Canada
Localization and Location Based Services (LBS) on smartphones have become important services that target the general population. A large amount of resources and research has been invested in this area in recent years, and Visual Finger Printing (VFP) is one of many approaches to the localization problem. Briefly, VFP is a method that relies on images and a reference database to identify the closest match and thereby help localize a device, including a smartphone. VFP is computationally demanding, and its processing steps should be chosen carefully so that it does not exceed the hardware limitations of the smartphone. After interest points are detected in the image, they are typically described using one of the many available description algorithms. This description step generates a large amount of information that significantly affects the accuracy, required storage and processing time of the whole VFP process. Therefore, the description algorithm should be carefully selected to match the hardware capability of the device used.
In this article, the performance of thirteen different descriptors is compared and investigated using images captured by smartphones. The accuracy, processing time and storage requirements of the algorithms are measured on a real dataset to evaluate the potential of each description algorithm for a VFP implementation on a smartphone.
One of the goals of this experiment is to evaluate the balance between accuracy, time and memory usage such that it meets the day-to-day localization needs of a smartphone user. The ultimate objective is to customize such an algorithm with respect to the phone’s specifications and realistic assumptions.
Key steps and the significance of this work:
The main steps of the descriptor evaluation are: creating a visual reference database from acquired reference images; detecting the interest points of each reference/test image; describing the detected interest points using the evaluated description algorithms; and finally matching the descriptor vectors with those of the database to find the closest match and localize the device.
The images used for building the reference database and for testing were taken in McEwan Hall at the University of Calgary with wide variations in the point of view and lighting conditions. These variations produce changes in the illumination, scale and orientation of the local features and offer a realistic scenario for descriptor evaluation. The original image size has a significant impact on the computational load; therefore, we reduced the image size before moving on to the later stages of the process. Then, the feature points of the image are detected.
The detection algorithm adopted in this step is the Harris-Laplace detector, which can be implemented locally since it computes the local autocorrelation at each point of the image. The Laplacian is used as the measure to be maximized in the normalized scale space to find the characteristic scale, which yields stable local feature points and enhances the Harris detector by finding points that are also robust in the scale domain. The Harris-Laplace detector uses Harris’s method to find the local structures; Laplacian operators at different scales are then applied to the image, and a point whose maximum response over scale satisfies a certain threshold is labeled as a stable feature point. The Laplacian operator is reported to find a high number of feature points, both in the image and in the scale space. Therefore, the Harris-Laplace detector is adopted in the evaluation process.
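The local autocorrelation measure mentioned above can be illustrated with a minimal sketch of the Harris corner response on a single scale. This is not the detector used in the experiments: the scale-selection step with the normalized Laplacian is omitted, and the function name, the box-shaped window, the central-difference gradients and the constant k = 0.04 are assumptions made for illustration only.

```python
def harris_response(img, k=0.04, radius=1):
    """Harris response R = det(M) - k * trace(M)^2 for a 2D list of floats.

    M is the local autocorrelation (structure tensor) matrix, summed over a
    (2*radius+1)^2 box window. Hypothetical helper, single scale only.
    """
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0

    resp = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            a = b = c = 0.0  # entries of M = [[a, b], [b, c]]
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx = ix[y + dy][x + dx]
                    gy = iy[y + dy][x + dx]
                    a += gx * gx
                    b += gx * gy
                    c += gy * gy
            resp[y][x] = (a * c - b * b) - k * (a + c) ** 2
    return resp


# Synthetic 16x16 image with a bright square covering rows/columns 4..11.
image = [[1.0 if 4 <= y <= 11 and 4 <= x <= 11 else 0.0 for x in range(16)]
         for y in range(16)]
response = harris_response(image)
# The response is positive at the square's corner (4, 4), negative on the
# straight edge at (4, 8), and zero in the flat background at (1, 1).
```

In a Harris-Laplace style detector, candidate points found this way at several scales would additionally be kept only where the scale-normalized Laplacian attains a maximum over scale.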
The thirteen evaluated descriptors belong to three main types. The first group is based on the Scale Invariant Feature Transform (SIFT) algorithm, which is highly robust to geometric and photometric variations. The original SIFT algorithm depends only on the grey-level information in the image. This neglects the rich information provided by a color image.
To study the effect of adding color information, alternatives such as Opponent-SIFT (O-SIFT), Red-Green-Blue-SIFT (RGB-SIFT) and Color-SIFT (C-SIFT) have been tested and their performance compared to that of SIFT. Two other types of descriptors, Color-Moment and Color-Histogram based approaches, are also included in the evaluation. These algorithms are based directly on the intensity of the image pixels and not on the gradient. Therefore, they are expected to be more efficient than SIFT-like algorithms in terms of time and space.
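To make the intensity-based descriptor families more concrete, the following sketch shows a commonly used opponent color transform together with a simple per-channel histogram and a simple color-moment vector for an image patch. The exact descriptor definitions evaluated in this work are not given here, so the bin count, the choice of mean, standard deviation and cube root of the third central moment, and all function names are assumptions.

```python
import math


def rgb_to_opponent(r, g, b):
    """Commonly used opponent color transform (O3 carries the intensity)."""
    o1 = (r - g) / math.sqrt(2)
    o2 = (r + g - 2 * b) / math.sqrt(6)
    o3 = (r + g + b) / math.sqrt(3)
    return o1, o2, o3


def channel_histogram(values, bins=8, lo=0.0, hi=256.0):
    """Normalized histogram of one channel; values are assumed in [lo, hi)."""
    hist = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        hist[idx] += 1
    return [count / len(values) for count in hist]


def rgb_histogram(patch, bins=8):
    """Concatenate the R, G and B histograms of a patch of (r, g, b) tuples."""
    descriptor = []
    for ch in range(3):
        descriptor.extend(channel_histogram([p[ch] for p in patch], bins))
    return descriptor


def color_moments(patch):
    """Mean, standard deviation and skewness-like term for each channel."""
    descriptor = []
    for ch in range(3):
        vals = [p[ch] for p in patch]
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        third = sum((v - mean) ** 3 for v in vals) / n
        # Signed cube root keeps the term in the same units as the mean.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        descriptor.extend([mean, math.sqrt(var), skew])
    return descriptor
```

With these assumed settings a histogram descriptor has 24 values and a moment descriptor has 9, which gives a sense of why such descriptors are expected to need less storage and matching time than gradient-based ones.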
The final step is finding the closest match for a captured image in the database. This step is done using exhaustive search with the Euclidean distance measure. The required storage, time consumed and matching accuracy on the test images are reported for all the evaluated descriptors.
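The matching step can be sketched as follows, assuming the database is a mapping from a reference image identifier to its list of descriptor vectors. The exhaustive search and the Euclidean distance follow the description above; the rule that each query descriptor votes for the image holding its nearest neighbor, and the names used, are assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_image(query_descriptors, database):
    """Return the id of the reference image that best matches the query.

    database: dict mapping image id -> list of descriptor vectors.
    Every query descriptor is compared against every stored descriptor
    (exhaustive search) and votes for the image of its nearest neighbor.
    """
    votes = Counter()
    for q in query_descriptors:
        best_id, best_dist = None, float("inf")
        for image_id, descriptors in database.items():
            for d in descriptors:
                dist = euclidean(q, d)
                if dist < best_dist:
                    best_id, best_dist = image_id, dist
        if best_id is not None:
            votes[best_id] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy example with 2-dimensional descriptors and hypothetical image ids.
reference_db = {
    "hall_a": [[0.0, 0.0], [1.0, 1.0]],
    "hall_b": [[10.0, 10.0], [11.0, 11.0]],
}
query = [[0.9, 1.2], [10.5, 10.0], [0.1, 0.0]]
best = match_image(query, reference_db)  # two of three votes go to "hall_a"
```

The cost of this search grows with the product of the number of query descriptors, the number of stored descriptors and the descriptor length, which is why the descriptor choice affects matching time so strongly.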
Results and Conclusions
In terms of accuracy, SIFT-like algorithms systematically outperform the moment-based and histogram-based algorithms, with moment-based algorithms performing the worst of the three. The computational demand of SIFT-like algorithms is substantially higher than that of the other two. Resizing the original image is often required to stay within the bounds of hardware feasibility.
The superior accuracy of the SIFT-like algorithms shows the robustness provided by these descriptors.
The SIFT-like algorithms achieved accuracies ranging from 80 to 96 percent. This accuracy might be further enhanced if the unused pixel-location information of these feature points is utilized to eliminate wrong matches.
Among the SIFT-like algorithms, HSV-SIFT (Hue-Saturation-Value SIFT) and C-SIFT have a significantly higher computational load than O-SIFT and RGB-SIFT, with SIFT and Hue-SIFT being the least demanding. Interestingly, all six of these algorithms exhibit nearly the same accuracy. O-SIFT in particular outperforms the rest if certain thresholds for the three measures are considered. The best results achieved for this algorithm show accuracy above 90 percent. The results also systematically indicate that HSV-SIFT and C-SIFT, despite their high computational load, do not introduce any improvement. This can be traced back to the literature, where it is shown that HSV-SIFT feature vectors are in fact theoretically less robust to photometric distortions than SIFT. The opponent color space is shown to be a more suitable color space than RGB in certain situations. This can also be seen in the results, where O-SIFT systematically outperforms RGB-SIFT.
While the evaluated descriptors reached acceptable accuracies of more than 90%, with potential for further improvement, they suffer from the large space required to store and/or download the visual feature database in practice. Also, the time consumed by these descriptors is beyond what is practical for casual daily use. These time and space requirements call for the adoption of a more efficient and optimized representation of the visual information throughout the visual finger printing process.
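The scale of these requirements can be illustrated with a back-of-the-envelope calculation. The sketch below is only an arithmetic aid: the number of reference images, the keypoints per image and the 4-byte value size are hypothetical inputs, not measurements from this work, and the lengths of 128 values for SIFT and 384 values for a three-channel SIFT variant are the usual descriptor sizes rather than figures reported here.

```python
def database_size_bytes(n_images, keypoints_per_image, dims, bytes_per_value=4):
    """Raw storage needed for all descriptor vectors of a reference database."""
    return n_images * keypoints_per_image * dims * bytes_per_value


def exhaustive_comparisons(query_keypoints, n_images, keypoints_per_image):
    """Number of descriptor-to-descriptor distances in an exhaustive search."""
    return query_keypoints * n_images * keypoints_per_image


# Hypothetical database: 100 reference images with 500 keypoints each.
grey_sift = database_size_bytes(100, 500, 128)    # 25,600,000 bytes
color_sift = database_size_bytes(100, 500, 384)   # 76,800,000 bytes
distances = exhaustive_comparisons(500, 100, 500)  # 25,000,000 distances
```

Under these assumed inputs a three-channel descriptor triples both the database size and the work per distance computation, which is consistent with the need for a more compact representation.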