Active Object Recognition for 2D and 3D Applications

Abstract

Active object recognition provides a mechanism for selecting informative viewpoints so that recognition tasks can be completed as quickly and accurately as possible. Either the camera or the object of interest can be moved to obtain more useful information. This approach can improve the computational efficiency of recognition by processing only those viewpoints that are selected for the amount of relevant information they contain. Active object recognition methods center on two problems: how to select the next best viewpoint and how to integrate the extracted information. Most active recognition methods do not use local interest points, which have been shown to work well in other recognition tasks, and they are tested on images containing a single object with no occlusion or clutter.

In this thesis we investigate the use of local interest points (SIFT) in probabilistic and non-probabilistic settings for active recognition of single and multiple objects and their viewpoints/poses. The test images contain objects that are occluded and appear in significant clutter, and our dataset also includes visually similar objects. We first introduce a non-probabilistic 3D active object recognition system, which consists of a mechanism for selecting the next best viewpoint and an integration strategy that provides feedback to the system. We present a novel approach, based on a vocabulary tree data structure, for weighting the uniqueness of the extracted features. These weights are then used to determine the next best viewpoint by selecting the viewpoint with the highest number of unique features. A Bayesian framework uses the modified statistics from the vocabulary tree to update the system’s confidence in the identity of the object, and new test images are captured only while the belief in the object hypothesis remains below a predefined threshold. We compare this vocabulary tree method against random selection of the next viewpoint and against a state-of-the-art active object recognition method by Kootstra et al. [1]. Our approach outperforms both, correctly recognizing more objects at a lower computational cost.
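
The non-probabilistic pipeline above (uniqueness weighting, next-best-view selection, and a Bayesian belief update with a stopping threshold) can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes a flat lookup of precomputed visual-word ids in place of a hierarchical vocabulary tree, an IDF-style uniqueness weight log(N / n_w) where n_w is the number of objects containing word w, a hypothetical additive likelihood with smoothing constant `eps`, and image capture simulated by reading the training table. All names (`TRAIN`, `recognise`, `next_best_view`, and so on) are illustrative.

```python
import math

# Hypothetical toy data: visual-word ids (e.g. SIFT descriptors quantised by a
# vocabulary) observed in each (object, viewpoint) training image.
TRAIN = {
    ("mug", 0): [1, 2, 3],
    ("mug", 1): [1, 4, 5],
    ("can", 0): [1, 2, 6],
    ("can", 1): [1, 7, 8],
}


def word_owners(train):
    """Map each visual word to the set of objects whose training images contain it."""
    owners = {}
    for (obj, _view), words in train.items():
        for w in words:
            owners.setdefault(w, set()).add(obj)
    return owners


def uniqueness_weights(owners, n_objects):
    """IDF-style weight log(N / n_w): a word shared by every object scores 0."""
    return {w: math.log(n_objects / len(objs)) for w, objs in owners.items()}


def next_best_view(hypothesis, train, owners, visited):
    """Unvisited view of the hypothesised object with the most words unique to it."""
    candidates = [
        (sum(1 for w in words if owners[w] == {hypothesis}), view)
        for (obj, view), words in train.items()
        if obj == hypothesis and view not in visited
    ]
    return max(candidates)[1] if candidates else None


def bayes_update(belief, observed, owners, weights, eps=0.1):
    """Posterior is proportional to prior x likelihood; the assumed likelihood of an
    object is eps plus the summed weights of observed words that the object contains."""
    post = {}
    for obj, prior in belief.items():
        score = sum(weights.get(w, 0.0) for w in observed if obj in owners.get(w, set()))
        post[obj] = prior * (eps + score)
    total = sum(post.values())
    return {obj: p / total for obj, p in post.items()}


def recognise(true_obj, start_view, train, threshold=0.95):
    owners = word_owners(train)
    objects = sorted({obj for obj, _view in train})
    weights = uniqueness_weights(owners, len(objects))
    belief = {obj: 1.0 / len(objects) for obj in objects}  # uniform prior
    visited, view = [], start_view
    while view is not None:
        visited.append(view)
        observed = train[(true_obj, view)]  # stands in for capturing a new test image
        belief = bayes_update(belief, observed, owners, weights)
        best = max(belief, key=belief.get)
        if belief[best] >= threshold:  # confident enough: stop capturing images
            break
        view = next_best_view(best, train, owners, set(visited))
    return best, visited, belief


obj, views, belief = recognise("mug", 0, TRAIN)
print(obj, views, round(belief["mug"], 3))  # prints: mug [0, 1] 0.992
```

With the toy data, the first view of the mug leaves the belief at about 0.89, so one more view (the one with two mug-only words) is captured before the 0.95 threshold is reached. Raising `threshold` trades extra views for confidence.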

We then extend the vocabulary tree method to a probabilistic setting to improve object recognition accuracy. We introduce Bayesian approaches both for object recognition alone and for combined object and pose recognition, together with three likelihood models that differ in the parameters they incorporate and in their complexity. The occlusion model, which includes geometric information as well as variables that account for the background distribution and for occlusion, correctly recognizes all objects in our challenging dataset. We further extend this probabilistic approach to recognize multiple objects and their poses in a test image, and we show experimentally that the model can recognize multiple objects that appear in close proximity to distractor objects. We also extend our viewpoint selection strategy to the multiple-object setting, where it performs well compared with random viewpoint selection, the activation model [1], and mutual information. In addition, we study the impact of active vision on shape recognition. Our shape recognition system takes Fourier descriptors as input and uses mutual information as its active vision component. We build multinomial and Gaussian distributions from these descriptors, and the resulting system correctly recognizes a sequence of objects.
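
The shape-recognition component can be sketched in the same spirit. The abstract does not give the descriptor normalisation or the observation model, so this Python sketch assumes a common construction of Fourier descriptors (treat the ordered contour as the complex signal x + iy, take its DFT, drop F[0] for translation, divide by |F[1]| for scale, and keep magnitudes for rotation and starting point) and, for viewpoint selection, a hypothetical multinomial table P(z | object, viewpoint) over quantised descriptor bins, scored by the mutual information I(O; Z) between object identity and the next observation. The contour, view names, and probabilities are made-up toy values.

```python
import cmath
import math


def fourier_descriptors(contour, k=3):
    """Invariant Fourier descriptors of a closed contour given as ordered (x, y) points.
    Assumes |F[1]| > 0 and k < len(contour)."""
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    # Plain O(n^2) DFT of the complex boundary signal z(t) = x(t) + i*y(t).
    coeffs = [sum(z[t] * cmath.exp(-2j * math.pi * u * t / n) for t in range(n)) / n
              for u in range(n)]
    scale = abs(coeffs[1])
    # Skip F[0] (translation), divide by |F[1]| (scale), keep magnitudes
    # (rotation and the choice of starting point only change the phase).
    return [abs(c) / scale for c in coeffs[1:k + 1]]


def mutual_information(belief, likelihood):
    """I(O; Z) in nats, for belief P(o) and a table likelihood[o][z] = P(z | o)."""
    n_z = len(next(iter(likelihood.values())))
    p_z = [sum(belief[o] * likelihood[o][z] for o in belief) for z in range(n_z)]
    mi = 0.0
    for o, p_o in belief.items():
        for z, p_zo in enumerate(likelihood[o]):
            if p_o > 0 and p_zo > 0:
                mi += p_o * p_zo * math.log(p_zo / p_z[z])
    return mi


def select_view(belief, view_models):
    """Choose the viewpoint whose observation model is most informative about the object."""
    return max(view_models, key=lambda v: mutual_information(belief, view_models[v]))


# Invariance check on a hypothetical L-shaped contour: rotate 30 deg, scale x2, translate.
shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 3), (0, 3)]
theta = math.radians(30)
moved = [(2 * (x * math.cos(theta) - y * math.sin(theta)) + 5,
          2 * (x * math.sin(theta) + y * math.cos(theta)) - 3) for x, y in shape]
d_shape, d_moved = fourier_descriptors(shape), fourier_descriptors(moved)
print(max(abs(a - b) for a, b in zip(d_shape, d_moved)) < 1e-9)  # prints: True

# Hypothetical multinomial models over two quantised descriptor bins.
belief = {"cup": 0.5, "bowl": 0.5}
views = {"top": {"cup": [0.5, 0.5], "bowl": [0.5, 0.5]},
         "side": {"cup": [0.9, 0.1], "bowl": [0.1, 0.9]}}
print(select_view(belief, views))  # prints: side
```

A view where every object produces the same observation distribution (here `top`) has zero mutual information, so the selector prefers `side`. A Gaussian observation model would replace the table lookup with densities; the selection rule (maximise I(O; Z)) stays the same, although the expectation then has to be integrated or sampled rather than summed.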

We demonstrate the effectiveness of active vision in object recognition systems, showing that across different recognition applications using different low-level inputs, incorporating active vision improves overall accuracy and reduces the computational expense of object recognition.