Medical Application Based Arabic Sign Language Recognition Employing Leap Motion and Kinect Sensors

Abstract This paper presents a novel Sign Language Recognition (SLR) framework that exploits the best features extracted from both the Microsoft Kinect and Leap Motion devices for robust gesture recognition. The hand shape with its internal details is segmented using both the RGB and depth modalities provided by the Kinect sensor. A spatio-temporal descriptor based on 3D gradients (HOG3D) and PCA are utilized for feature extraction from the segmented hand shape. CCA is employed for classification of the Kinect-based features, where the best signs, those with the highest scores, are selected. DTW-KNN is then applied to the Leap Motion based descriptors corresponding to the best signs to obtain the final decision. The framework components are validated by comparison with state-of-the-art solutions.

Accuracies of 85.17% and 92.02% are reported for the Kinect and Leap Motion sensors, respectively, while the accuracy is boosted to 93.90% when both devices are exploited.

Keywords Arabic Sign Language · HOG3D · CCA · Leap Motion Sensor · Kinect Sensor

1 Introduction

One of the major trends in computer vision research is hand gesture recognition. Hand gestures can be general gestures, used for applications such as human-machine interaction, or sign language gestures. Sign Language Recognition (SLR)

Marwa Elpeltagy
ECE Department, Egypt-Japan University of Science and Technology, New Borg El-Arab City, Alexandria 21934, Egypt
E-mail: marwa.


[email protected]

Moataz Abdelwahab
ECE Department, Egypt-Japan University of Science and Technology, New Borg El-Arab City, Alexandria 21934, Egypt

systems reduce the communication barrier between hearing-impaired and hearing people. There is a strong demand for systems that can interpret what hearing-impaired people wish to convey to society, easing communication between them. Hence, several research efforts have been made to facilitate automatic sign language recognition. With the emergence of 3D depth sensors, sign language recognition systems have shifted from 2D camera-based techniques to sensor-based 3D environments.

The introduction of low-cost depth sensors such as the Leap Motion and Kinect has allowed the mass market and researchers to exploit the 3D information of the body-part movements that occur within the field of view of the sensors to recognize the performed gesture. These sensors are efficient in capturing a 3D representation of the gestures in real time without restrictions on the user or on the environment conditions, such as lighting variations and cluttered backgrounds. The Kinect sensor captures an RGB image and a depth image in addition to a full-body skeleton, whereas the Leap Motion captures precise finger and hand movements within a smaller field of view, limited to approximately one cubic meter, which allows it to recognize the small details associated with the pose of the fingers, such as positions and orientations.

Signs can be classified as static or dynamic gestures. Static gestures consist of a fixed hand shape without motion, while dynamic gestures involve meaningful motion in addition to the hand shapes. The majority of research work is directed at static gesture recognition, while less of it concentrates on dynamic gesture recognition.

We focus our work on dynamic SLR using both the Kinect and Leap Motion sensors. Dynamic hand gesture recognition is considered a problem of sequential modeling and classification. In the literature, there are several sequence representation and classification algorithms, such as COV3DJ [1], HMM [2] and DTW [3].
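To make the DTW component concrete, the following minimal sketch computes the classic dynamic-programming DTW distance between two variable-length 1D feature sequences. The function name and toy sequences are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Identical sequences align at zero cost; a time-warped copy also aligns
# at zero cost, which is exactly why DTW suits gestures of varying speed.
print(dtw_distance([1, 2, 3], [1, 2, 3]))        # 0.0
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3]))  # 0.0
```

In a DTW-KNN classifier, this distance replaces the Euclidean distance in the nearest-neighbor vote, so sequences of different lengths can be compared directly.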

The authors in [1] proved that COV3DJ outperforms HMM for human action recognition. The authors in [3] experimentally proved the success of DTW in 3D gesture recognition. Most research work employs a single sensor, either the Leap Motion or the Kinect, which is inefficient, especially if one finger/hand occludes another finger/hand, which degrades the recognition rate. Since the Leap Motion and Kinect provide different modalities and different features, it seems more efficient to exploit both devices together to boost performance. SLR systems combining the two sensors were proposed in [4,5,6,7]. The authors in [4,5] used fingertip position and direction features extracted from the depth modality of the Kinect sensor using the Candescent NUI library [8]. One drawback of this library is that it fails to detect the fingertip positions and directions when all fingers are bent onto the palm region. Therefore, all gestures that do not include extended fingers cannot be recognized using these features.

Another hybrid SLR system was proposed in [6,7], where the feature vector includes the fingertip distances from the hand center, the fingertip elevations and the adjacent fingertip distances. The disadvantage of these features is that they need to be normalized by dividing the values by the distance between the hand center and the middle fingertip, and this value needs to be recomputed each time a new signer uses the system, as people differ in hand size. In this paper, we propose to use both sensors and exploit the best corresponding features and classifiers to enhance the recognition performance compared to single-sensor methods.

A set of feature descriptors that avoids the previous drawbacks is introduced for both the Leap Motion and Kinect sensors. The proposed Leap Motion based feature vector consists of bone directions, hand orientation and the angles between fingers. We exploit both the RGB and depth modalities provided by the Kinect sensor to extract the complete hand shape features with their internal articulations. First, the hand region is segmented from the depth image using a threshold and used as a mask applied to the RGB image. The spatio-temporal descriptor based on 3D gradients (HOG3D) implemented in [9] and the PCA algorithm are applied to the segmented hand shape for feature extraction, and the CCA classifier is used to select the best signs. DTW-KNN is then applied to the Leap Motion based descriptors corresponding to the best signs to obtain the final decision. Arabic sign language has not received much attention. One of the most necessary applications is the medical one, where the situation becomes difficult when doctors in hospitals cannot understand what hearing-impaired patients suffer from.
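The depth-thresholded segmentation step described above can be sketched as follows. The fixed millimeter band and the toy image sizes are illustrative assumptions; the actual threshold in the pipeline is scene-dependent:

```python
import numpy as np

def segment_hand(rgb, depth, near_mm=400, far_mm=800):
    """Keep only RGB pixels whose depth falls inside the hand's depth band.

    rgb   : (H, W, 3) uint8 color image
    depth : (H, W) depth map in millimeters
    The [near_mm, far_mm] band is a hypothetical fixed range; a real
    system would derive it from the tracked hand position per frame.
    """
    mask = (depth >= near_mm) & (depth <= far_mm)   # binary hand mask
    segmented = rgb * mask[..., None]               # zero out the background
    return segmented, mask

# Toy example: a 2x2 image where only the top-left pixel lies in range.
rgb = np.full((2, 2, 3), 200, dtype=np.uint8)
depth = np.array([[500, 900], [300, 1000]])
seg, mask = segment_hand(rgb, depth)
print(mask[0, 0], mask[0, 1])  # True False
```

The resulting masked RGB patch is what the HOG3D descriptor would then be computed over across frames.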

Therefore, a medical application based Arabic sign language dataset was recorded simultaneously from both the Leap Motion and Kinect sensors. The recorded dataset consists of 33 signs and contains all the modalities from the Kinect sensor in addition to the Leap Motion data. The experimental results show that the Leap Motion based descriptor achieves higher accuracy compared to the state-of-the-art descriptors [5].

Furthermore, the DTW-KNN solution component is compared to the state-of-the-art component in [1] and is found to outperform the COV3DJ based algorithm in recognition performance on the Leap Motion features. We experimentally demonstrate that fusing the RGBD based hand shape features extracted from the Kinect with the features extracted from the Leap Motion sensor is more robust than any single sensor-based feature, achieving an accuracy of 93.90% while avoiding the drawbacks of previous work. The paper is organized as follows: Section 2 presents the related work. Section 3 explains the dataset acquisition. The proposed technique is introduced in Section 4. Section 5 presents the experimental results.
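The two-stage fusion idea (a Kinect-based classifier shortlists candidate signs, and a Leap Motion based matcher decides among them) can be illustrated schematically. The score values, sign names and the simple Euclidean nearest-neighbor stand-in below are our assumptions, not the paper's actual CCA and DTW-KNN implementations:

```python
import numpy as np

def fuse_decision(kinect_scores, leap_query, leap_templates, k=3):
    """Two-stage fusion: shortlist the top-k signs by Kinect score,
    then pick the closest Leap Motion template within that shortlist.

    kinect_scores  : {sign: score} from the first-stage classifier
    leap_query     : 1D Leap Motion feature vector of the test gesture
    leap_templates : {sign: 1D template vector} (one per sign, for brevity)
    """
    shortlist = sorted(kinect_scores, key=kinect_scores.get, reverse=True)[:k]
    # Euclidean 1-NN stands in for DTW-KNN over full feature sequences.
    return min(shortlist,
               key=lambda s: np.linalg.norm(leap_query - leap_templates[s]))

# Hypothetical scores and templates for four candidate signs.
kinect_scores = {"doctor": 0.9, "pain": 0.8, "medicine": 0.7, "rays": 0.1}
leap_templates = {"doctor": np.array([1.0, 0.0]), "pain": np.array([0.0, 1.0]),
                  "medicine": np.array([1.0, 1.0]), "rays": np.array([0.0, 0.0])}
print(fuse_decision(kinect_scores, np.array([0.1, 0.9]), leap_templates))
# "pain": it survives the top-3 shortlist and is closest to the query.
```

The point of the shortlist is that the second stage never has to discriminate between signs the first stage has already ruled out, which is what boosts the fused accuracy above either sensor alone.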

Section 6 concludes the work.

2 Related Work

The Kinect sensor has been used for the development of gesture recognition systems. For example, SIFT (Scale Invariant Feature Transform), SURF (Speeded-Up Robust Features) and VFH (Viewpoint Feature Histogram) were used in [10] with a nearest neighbor classifier for the recognition of 140 static Indian gestures performed by 18 signers. Accuracies of 49.07%, 55.52% and 57.19% were recorded by the authors using SIFT, SURF and VFH, respectively. Gabor, Local Binary Pattern (LBP) and HOG based features were used in [11] to provide useful information about the hand configurations beside the skeletal features. These features are classified by multiple Extreme Learning Machines at the frame level. The classifier outputs are then modeled at the sequence level and fused together to provide the final decisions. The framework was evaluated on the ChaLearn 2013 dataset, where an accuracy of 85% was achieved on the validation set. In [12], multiple features were extracted from different information modalities, including the depth image sequence, body skeleton joints, facial landmark points, hand shapes and facial expressions for ASL. In particular, Depth Motion Maps with Histograms of Oriented Gradients (DMM-HOG) are used for feature extraction from the color and depth images, and a histogram is used to represent the Binary Facial Expression (BFE) features and the Binary Hand Shape (BHS) features.

Then, a linear SVM is used for classification, achieving an average recognition rate of 36.07% over 27 lexical items.

In contrast to the Kinect sensor, the Leap Motion device has also been used by researchers to develop various gesture recognition systems. The Leap Motion sensor was employed in [13] for Australian Sign Language (AuSL) symbol recognition. The sensor accurately tracks hand and finger movements, and an Artificial Neural Network (ANN) was used for recognition. However, the system failed to recognize gestures when the hand's position obstructed the sensor's view.

A Leap Motion based gesture recognition system for Arabic Sign Language (ArSL) is proposed in [14]. The finger positions and the distances between the fingers in each frame are the features, which are fed directly into a Multi-Layer Perceptron (MLP) neural network for recognition. The system achieves a recognition rate of 88% on 50 dynamic sign gestures. In [7], a Hidden Conditional Neural Field (HCNF) classifier was employed to recognize dynamic hand gestures. Two datasets were recorded using the LMC, namely the LeapMotion-Gesture3D dataset and the Handicraft-Gesture dataset. Two kinds of features were used, namely single-finger features (fingertip distances, angles and elevations) and double-finger features (adjacent fingertip distances and adjacent fingertip angles).
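For reference, single-finger features of the kind used in [7] (fingertip distance, in-plane angle and elevation relative to the palm) can be sketched as below. The function name, the vector layout and the toy coordinates are illustrative assumptions, and the in-plane angle assumes a palm-aligned coordinate frame:

```python
import numpy as np

def single_finger_features(fingertip, palm_center, palm_normal):
    """Distance, in-plane angle and elevation of one fingertip relative
    to the palm, in the spirit of the single-finger features in [7]."""
    v = fingertip - palm_center
    dist = np.linalg.norm(v)              # fingertip distance from the palm
    elev = np.dot(v, palm_normal)         # signed height above the palm plane
    proj = v - elev * palm_normal         # projection onto the palm plane
    angle = np.arctan2(proj[1], proj[0])  # in-plane angle, radians
    return dist, angle, elev

# Toy frame: palm at the origin, normal along +z, fingertip above the plane.
d, a, e = single_finger_features(np.array([3.0, 4.0, 1.0]),
                                 np.zeros(3), np.array([0.0, 0.0, 1.0]))
print(round(d, 3), round(e, 3))  # 5.099 1.0
```

Note that `dist` is exactly the quantity that must be rescaled per signer in [6,7], which motivates the scale-free bone-direction and angle features proposed in this paper.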

The recognition rate is 89.5% for the LeapMotion-Gesture3D dataset, which consists of 12 gestures, and 95.0% for the Handicraft-Gesture dataset, which consists of 10 gestures. The Leap Motion sensor has also been employed in [15] for American static sign language recognition of the 26 letters of the English alphabet. New features called average distance, average spread and average tri-spread were derived from the sensory data. A recognition rate of 79.83% is achieved using a support vector machine classifier. Recently, some researchers have proposed hybrid systems, developing an efficient SLR system by combining the input data from more than one sensor.

A joint approach to gesture recognition was proposed in [6] by combining the Leap Motion and Kinect sensors.

The authors computed fingertip angle, elevation and distance based features from the Leap Motion data, while the Kinect features are based on the curvature and the correlation of the hand region. A multi-class SVM was used for recognition, achieving an accuracy of 91.28% on 10 ASL static gestures. Another system is introduced in [16], where the Leap Motion data is used beside the Kinect data to aid feature extraction. The Kinect features rely on the convex hull and contour of the hand shape and the distances of the hand samples from the centroid. The proposed features are fed to two different classifiers, one based on multi-class SVMs and one based on Random Forests, achieving an accuracy of 96.5% on 10 signs.

In [4], the authors combined features extracted using the Kinect and Leap Motion sensors to describe gestures representing various words of Indian Sign Language (ISL). These features depend on the fingertip and palm positions and directions. A dataset consisting of 50 dynamic signs was recorded, where 28 signs were performed with a single hand and 22 of the signs were performed using both hands.

A Hidden Markov Model (HMM) and a Bidirectional Long Short-Term Memory Neural Network (BLSTM-NN) are combined to boost the accuracy, where accuracies of 97.85% and 94.55% have been recorded for single-handed and double-handed signs, respectively. A CHMM was used in [5] to fuse the features extracted from both sensors to improve the performance, achieving a recognition rate of 90.80% over 25 single-handed dynamic signs.

3 Data set acquisition

It is a great challenge to build a recognition system able to recognize the whole ArSL dictionary with high performance, so it is convenient to collect datasets for a specific application. One of the most important and necessary applications is the medical one. Therefore, we have collected our sign language dataset based on the words used in the medical application.

The dictionary in [17] classifies Arabic sign language words according to the application in which they are used. We recorded 33 dynamic sign words from the medical application words in this dictionary. These signs are depicted in Fig. 1, where the movement directions are represented by arrows.

Some movements are represented by yellow arrows, while others are complex and represented by both yellow and red arrows, where the yellow arrows represent the first movement and the red arrows the second movement. The movements in the words "cancer" and "rays" are toward the front and are difficult to draw. Some gestures are explained by more than one frame. 18 of the words use a single hand and 15 words use both hands. All sign gestures are dynamic signs performed by 10 different signers, three of whom belong to a school for the hearing impaired and seven of whom are hearing signers.

Each sign word is repeated three times by each signer. The capture setup is shown in Fig. 2(a). The signer sits on a chair, and the Leap Motion is placed below the signer's hand to capture the horizontal hand information. All gestures are performed above the Leap Motion sensor and the desk to ease the hand shape segmentation process.

The Kinect sensor is placed in front of the signer to properly acquire the depth and skeleton information. This position is also useful for acquiring

