A multi-sensor approach to a Botswana Sign Language dataset with a view to addressing occlusion

Abstract:

Automatic Sign Language Recognition (ASLR) converts sign gestures into spoken or written language, thereby enabling communication between hearing and deaf people. There is abundant research on ASLR for British Sign Language and American Sign Language. However, Botswana Sign Language has received less attention, at least in terms of computational representation for automatic sign language recognition, which can be attributed to the lack of a Botswana Sign Language dataset. Work done on other languages is not always directly applicable to Botswana Sign Language, because sign languages differ significantly from country to country. A dataset plays a pivotal role in a sign language recognition pipeline. However, one of the major challenges researchers encounter is accurately extracting the hands and fingers of a signer when they are not in the camera's field of view (occlusion). Researchers have argued that using multiple sensors addresses occlusion better than using a single sensor. This study proposes an approach to developing a Botswana Sign Language dataset based on tracking data from Microsoft's Kinect sensor and the Leap Motion controller. The feature sets from both devices are combined in order to improve recognition performance, especially under occlusion. Recognition is performed by Support Vector Machines (SVM) and K-Nearest Neighbors (KNN). The resulting dataset consists of 5,433 Botswana Sign Language gestures comprising five different sign words. The experimental results show that recognition performance improves compared with using a single device to capture sign gestures. Overall recognition accuracies of 99.90% and 99.40% were recorded using SVM and KNN, respectively.
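The fusion-and-classify step described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the feature dimensions, synthetic data, and classifier settings are assumptions chosen only to show feature-level fusion of Kinect and Leap Motion vectors followed by SVM and KNN classification.

```python
# Hypothetical sketch: concatenate per-sample Kinect and Leap Motion feature
# vectors (feature-level fusion), then classify with SVM and KNN.
# All data here is synthetic; dimensions and labels are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_classes = 500, 5                 # five sign words, as in the abstract

# Simulated feature vectors (dimensions are assumptions)
kinect = rng.normal(size=(n_samples, 20))     # e.g. skeletal joint features
leap = rng.normal(size=(n_samples, 12))       # e.g. finger/palm features
labels = rng.integers(0, n_classes, size=n_samples)
# Shift one dimension per sensor by the class label so the synthetic
# classes are separable and the sketch trains meaningfully
kinect[:, 0] += labels
leap[:, 0] += labels

fused = np.hstack([kinect, leap])             # feature-level fusion
X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, random_state=0)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print(f"SVM accuracy: {svm.score(X_te, y_te):.2f}")
print(f"KNN accuracy: {knn.score(X_te, y_te):.2f}")
```

On real tracking data, the fused vector would carry complementary information (Kinect covers the body and arms; the Leap Motion covers fine finger articulation), which is what lets the combined feature set compensate when one sensor's view of the hands is occluded.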