A multimodal biometric database and case study for face recognition based deep learning

ABSTRACT


INTRODUCTION
The last couple of decades have seen major advances in biometric identification techniques. A variety of biometric traits have been used for identification and verification, including the face, iris, fingerprint, palm print, and others [1]-[4]. Biometric recognition systems can now attain extremely high accuracy when tested against readily available biometric datasets. The performance of each biometric system is nonetheless constrained by the intrinsic characteristics of the biometric trait and by the limits of the sensing technology. Multimodal biometric fusion has therefore recently attracted the interest of several researchers [5], [6]. Combining two or more biometric traits is an efficient way to overcome some of the limitations of a single-trait biometric system. This can improve overall matching accuracy and strengthen the security of biometric systems. There are several ways to study biometric fusion. One of them uses heterogeneous datasets [7], [8], which combine biometric traits from different sources (such as a fingerprint from one database and a signature from another). In such experiments, biometric traits from different people are combined to produce a "chimeric user." Although this approach is frequently used in multimodal research, Poh and Bengio [9] found that the performance measured in trials with chimeric users may not accurately reflect the performance obtained with genuine multimodal users.

ISSN: 2302-9285, Bulletin of Electr Eng & Inf, Vol. 13, No. 1, February 2024: 677-685

The most effective way to study biometric fusion is to use homologous multimodal databases, in which the different biometric traits are collected from the same person. In this article, a new homologous multimodal database containing hand, face, and iris traits is presented. In addition, one of the collected traits is used in a case study on face recognition with deep convolutional neural networks (CNNs). It is advisable to avoid heterogeneous databases in multimodal biometrics, as correlating the data can be problematic. However, creating homologous multimodal databases poses significant challenges. Acquiring the necessary data usually takes longer, and subjects may respond negatively to long acquisition sessions. Additionally, the database size and the cost of acquiring the data are considerably higher. Such a project is typically much more challenging to manage. Fortunately, recent research has focused on developing real multimodal databases with a wide variety of biometric traits and many users. These databases are now accessible, but they have several drawbacks, such as missing important traits or a lack of variety in sensors and traits. Moreover, they remain limited because collecting multimodal biometric databases involves technical, ethical, privacy, and logistical challenges. Overcoming these challenges requires careful planning, collaboration, and adherence to legal and ethical frameworks to ensure the integrity and usability of the collected data.

Nevertheless, a few multimodal biometric databases are available. For example, the XM2VTS database [10] combines face and speaker modalities, providing synchronized video and speech recordings. This valuable resource allows researchers to investigate the fusion of facial and speech cues. However, one limitation of this database is its size and diversity, which could be expanded to cover a broader range of subjects, lighting conditions, and other variations. The BANCA [11] multimodal biometric database has been widely used in biometrics research and development. However, it has been criticized for certain limitations. One criticism is the relatively small sample size, which may limit the generalizability of research findings. Additionally, the database focuses on only a few modalities, such as face and voice, and so does not support the evaluation of other important biometric modalities. Despite these criticisms, the BANCA database still offers valuable data for studying multimodal biometric systems and their performance in access control scenarios. In comparison to other databases, the MYCT database [12] is relatively simple and focuses predominantly on fingerprints and signatures as biometric modalities. This limits its usefulness, since other types of biometric data are excluded.
In contrast, the DMCsv1 [13] multimodal biometric database, which contains 3D face and hand scans, provides researchers and developers in the biometrics field with a valuable resource. However, it is important to acknowledge certain limitations. One possible criticism is the database's relatively small size, which can affect the generalizability and statistical robustness of research findings. Additionally, 3D face and hand scans are more complex to handle than the data in other types of biometric databases, which should be taken into consideration when using the DMCsv1 database for biometric research. There are other databases with more than two biometric traits, such as BIOMET [14], which includes a person's hand, voice, fingerprint, and signature, and BioSec [15], which includes a person's face and eye movements. There are also multimodal biometric databases covering voice, iris, face, and fingerprints, as shown below.

Multimodal biometric databases, however, come with a number of difficulties. One of the major issues is the need to create algorithms that can successfully combine data from several modalities. This can be challenging because different modalities may have different error rates and require different processing strategies. Since not all users may be able to supply data for all modalities, it is also necessary to design ways of coping with missing or incomplete data. Despite these difficulties, multimodal biometric databases remain an important field of study. Several of these databases are shown in Table 1 along with their properties. Our multimodal biometric database (MULBv1) includes the face, hand, and iris traits of the same person; these traits are not included together in the other databases.

Ref. | Year | Database | Subjects | Sessions | No. of traits | Traits
… | … | … | … | … | … | Voice, 2D face
[11] | 2003 | BANCA | 202 | 12 | 2 | 2D face, voice
[12] | 2003 | MYCT | 330 | 1 | 2 | Fingerprint, signature
[13] | 2005 | MyIDEA | 104 | 3 | 6 | Voice, face, signature, fingerprints, hand geometry, handwriting
[14] | 2006 | M3 | 32 | 3 | 3 | Voice, 2D face, fingerprint
[15] | 2007 | BioSec | 250 | 4 | 4 | Voice, 2D face, fingerprint, iris
[16] | 2008 | IV2 | 300 | 1 | 3 | Iris, 2D and 3D face
[17] | 2011 | SDUMLA-HMT | 106 | - | 5 | Gait, iris, finger vein, 2D face
[18] | 2012 | BIOMENT | 91 | 3 | 2 | 2D face, fingerprint
[19] | 2015 | DMCSv1 | 35 | 2 | 2 | Hand, 3D face
[20] | 2017 | … | … | … | … | …

The remainder of this paper is structured as follows: the properties of the MULBv1 database are explained in section 2. The case study on face recognition with a deep convolutional neural network, which uses the face sub-database of MULBv1, is described in section 3. Section 4 presents the results of the experiment. Section 5 gives the conclusion.

MULBV1 MULTIMODAL DATABASE
The MULBv1 database was put together at Al-Furat Al-Awsat Technical University in Kufa, Iraq, during the winter of 2023. A total of 174 people, comprising 116 men and 58 women between the ages of 17 and 54, took part in the data collection. The face, hand, and iris traits of each participant were collected, resulting in three sub-databases in MULBv1. Note that, across the sub-databases, each person ID corresponds to biometric traits that were all obtained from the same person. The following subsections give further details on each of the three sub-databases.

Database of face
Face recognition is a highly developed biometric technique, and many studies focus on it [21]. MULBv1 includes a face database created specifically for in-person face recognition. The faces were photographed in a variety of situations, including different poses, expressions, and accessories such as hats and spectacles. Environmental factors such as lighting and background were left unrestricted to reflect realistic conditions. For each person in the face database, there are 20 jpg image files of varying file size. The total size of the face database is 7.97 GB.

Database of hand
The human hand has enough anatomical characteristics to allow for personal identification. The hand database in MULBv1 consists of 20 right-hand images per person, taken from various perspectives, some of which show the hand wearing a ring. The hand database is made up of jpg image files of different sizes. The overall size of the hand database is 10.3 GB. Sample images from the hand database are shown in Figure 1.

Database of iris
Iris recognition research has increased significantly during the past few years. The statistical analysis performed in [22] found that the iris possesses the most reliable and stable features of all biometric traits. Consequently, several recent studies employ the iris trait [23], [24]. We therefore provide an iris database in MULBv1. The iris database includes 20 right-iris images for each person, taken under different lighting conditions. The iPhone 14 Pro Max's micro camera was used to photograph the iris while keeping a distance of 2 to 5 cm between the device and the subject's eye. The images were saved in jpg format and vary in size. The overall size of the iris database is 1.30 GB. Sample images from the iris database are shown in Figure 2.

The CNN model used in the study consists of eleven layers: three convolutional layers, three max-pooling layers, one flatten layer, two fully connected (FC) layers, one dropout layer, and finally one output layer. The layers of the CNN that we created are explained below:
− The first layer is a convolution layer, and each convolution layer is followed by a rectified linear unit (ReLU) activation function. The image size remains 200×200 pixels. The max-pooling layer that follows scales the feature map down to 100×100 pixels.
− The third layer is likewise a convolution layer, and its output has the same size as its input (100×100 pixels). A max-pooling layer is added after it to give an output of size 50×50.
− The fifth layer is again a convolution layer and generates a feature map of 50×50 pixels. The output of the max-pooling layer that follows is 25×25.
− The seventh layer is the flatten layer, which converts the feature map into a vector.
− The eighth layer is an FC layer, whose number of units depends on the preceding layer as well as on the required number of categories.
− The ninth layer is the dropout layer, which is used to lessen network complexity and minimize overfitting.

iii. Potential misuse and surveillance concerns: face recognition software may be used dishonestly or maliciously, for example for mass monitoring, tracking people without their permission, or restricting personal liberties. Without sufficient protections and controls, the widespread use of facial recognition technologies may have negative societal effects.
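As a rough check on the feature-map sizes quoted for the CNN (200×200 input, reduced to 100×100, 50×50, and 25×25 by the three max-pooling layers), the following minimal sketch traces the shapes through the convolution and pooling stack. It is not the authors' implementation. It assumes stride-1 convolutions with "same" padding (consistent with the statement that the image size is unchanged by convolution), non-overlapping 2×2 pooling, and a 3-channel input; the kernel counts 64, 32, and 32 are those reported in the results. The function names are hypothetical.

```python
# Shape trace for the conv/pool stack described in the text.
# Assumptions (not stated in the paper): stride-1 convolutions with "same"
# padding, non-overlapping 2x2 max-pooling, and a 3-channel (RGB) input.


def conv_same(shape, kernels):
    """Stride-1 convolution with 'same' padding: spatial size is unchanged."""
    height, width, _ = shape
    return (height, width, kernels)


def max_pool(shape, window=2):
    """Non-overlapping pooling divides each spatial dimension by the window."""
    height, width, channels = shape
    return (height // window, width // window, channels)


def trace_shapes(input_shape=(200, 200, 3), kernels=(64, 32, 32)):
    """Return a list of (layer_name, output_shape) up to the flatten layer."""
    shapes = []
    shape = input_shape
    for count in kernels:
        shape = conv_same(shape, count)
        shapes.append(("conv", shape))
        shape = max_pool(shape)
        shapes.append(("max_pool", shape))
    flat_length = shape[0] * shape[1] * shape[2]
    shapes.append(("flatten", (flat_length,)))
    return shapes


if __name__ == "__main__":
    for index, (name, shape) in enumerate(trace_shapes(), start=1):
        print(f"layer {index}: {name:8s} -> {shape}")
```

Under these assumptions the flatten layer receives a 25×25×32 feature map, so the first FC layer would see a vector of 20,000 values.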

Evaluation metrics
To evaluate the model, this study uses several metrics: accuracy, loss function, F1 score, precision, and recall.

a. Accuracy: one of the most frequently used evaluation metrics for identification and classification problems. It gives the proportion of all predictions that are correct, as defined in (1) [27]:

$$\mathrm{Accuracy} = \frac{TPos + TNeg}{TPos + TNeg + FPos + FNeg} \quad (1)$$

b. Loss function: used in machine learning to measure the difference between a model's predicted output and the actual output. Reducing the loss function, which raises the predictive accuracy of the model, is the goal of model training. For classification problems, the cross-entropy loss function is a popular choice. It measures the discrepancy between the predicted probability distribution and the actual probability distribution of the labels. In Keras, cross-entropy is selected as the loss function by specifying "categorical_crossentropy" when creating the model [28].

c. Precision: the proportion of correct positive predictions among all positive predictions, as defined in (2) [27]:

$$\mathrm{Precision} = \frac{TPos}{TPos + FPos} \quad (2)$$

d. Recall: the proportion of correct positive predictions among all actual positive instances, as defined in (3) [27]:

$$\mathrm{Recall} = \frac{TPos}{TPos + FNeg} \quad (3)$$

e. F1 score: the harmonic mean of recall and precision, calculated with (4) [29]:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (4)$$

Where:
− True positive (TPos): the model correctly assigns a positive label to an instance that belongs to the positive class.
− True negative (TNeg): the model correctly assigns a negative label to an instance that belongs to the negative class.
− False positive (FPos): the model incorrectly assigns a positive label to an instance that actually belongs to the negative class.
− False negative (FNeg): the model incorrectly assigns a negative label to an instance that actually belongs to the positive class.

The filter size for each convolutional layer was 3×3, and the max-pooling window size was 2×2. The convolution layers contained 64, 32, and 32 kernels, in that order. The final CNN design was found after multiple attempts, and the recommended network was selected because of its high accuracy. The model was improved iteratively by modifying the max pooling, the kernel count, and the number of convolution layers until the best-performing model on the "MULBv1" dataset was obtained. Sample attempts are presented in Table 3. Additionally, Figure 4 shows the relationship between the recommended model's accuracy and the number of training epochs. Table 3 shows that high accuracy and the best performance are obtained with 64 filters of size 3×3 in conv_1, 32 filters of size 3×3 in conv_2, and 32 filters of size 3×3 in conv_3, with max-pooling of size 2×2, a hidden layer of 512 units, and the ELU activation function. Figure 4 shows a sharp rise in accuracy at the beginning, then a steady rise from epoch 10 onward, stabilizing around epoch 40.
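The evaluation metrics defined in this section can be computed directly from the four counts TPos, TNeg, FPos, and FNeg. The sketch below is illustrative only: the function names and the example counts are hypothetical and are not results from the paper. It also includes a plain-Python version of categorical cross-entropy for a single one-hot label. For a multi-class task such as face identification, precision, recall, and F1 would typically be computed per class and then averaged, which is an assumption here rather than something the paper states.

```python
import math


def accuracy(tpos, tneg, fpos, fneg):
    """Proportion of all predictions that are correct."""
    return (tpos + tneg) / (tpos + tneg + fpos + fneg)


def precision(tpos, fpos):
    """Correct positive predictions over all positive predictions."""
    return tpos / (tpos + fpos) if (tpos + fpos) else 0.0


def recall(tpos, fneg):
    """Correct positive predictions over all actual positives."""
    return tpos / (tpos + fneg) if (tpos + fneg) else 0.0


def f1_score(prec, rec):
    """Harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formulas.
    tpos, tneg, fpos, fneg = 90, 85, 10, 15
    prec = precision(tpos, fpos)
    rec = recall(tpos, fneg)
    print("accuracy :", accuracy(tpos, tneg, fpos, fneg))
    print("precision:", prec)
    print("recall   :", rec)
    print("F1       :", f1_score(prec, rec))
    print("loss     :", categorical_cross_entropy([0, 1, 0], [0.1, 0.8, 0.1]))
```

The zero-denominator guards return 0.0 by convention, so a class with no positive predictions does not raise an error.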

CONCLUSION
Multimodal biometric security solutions greatly lessen the issues associated with unimodal systems and are more accurate than relying on a single biometric trait. There is always a possibility that attackers will steal the data held in unimodal biometric databases. However, the lack of significant public multimodal datasets created under real working conditions is one of the primary obstacles to creating, testing, and evaluating biometric recognition systems. This paper presented the first version of a multimodal biometric database based on homologous traits, named MULBv1, which includes three distinct biometric traits (iris, hand, and face) for 174 individuals. A case study on face recognition using deep CNNs was also reported, using the face sub-database to assess one of the collected traits. The results of the case study demonstrate the effectiveness of one of the collected biometric traits. Research on the fusion of diverse biometrics, an important area of biometric recognition, will benefit greatly from the proposed database. As future work, an updated database will be created, and the database will soon be available on Kaggle, primarily for research purposes.

Figure 1. Sample images from hand database with and without accessory

4.2. Assessment of the suggested model by measuring its accuracy, loss function, and F1 score
A set of 2437 face images from the "MULBv1" multimodal database was used for training, while a different set of 1043 images was used for testing. The network was trained with the Keras deep learning framework, using Softmax as the classification function. After a number of training epochs, the highest testing accuracy was found to be 97.41%, and the loss was 0.2799.

Bulletin of Electr Eng & Inf, ISSN: 2302-9285. A multimodal biometric database and case study for face recognition based deep … (Ola Najah Kadhim)

Figure 5 also shows how the proposed model's loss function (categorical cross-entropy) changes with the number of epochs completed during the training phase.
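The image counts reported for the face experiment can be cross-checked with simple arithmetic. The short sketch below assumes that the train/test split covers the whole face sub-database (174 people with 20 images each); the variable names are hypothetical, and the number of correctly classified test images is only implied by the reported accuracy, not stated in the paper.

```python
# Consistency check on the reported face-experiment figures.
SUBJECTS = 174             # participants in MULBv1
IMAGES_PER_SUBJECT = 20    # face images per person
TRAIN_IMAGES = 2437        # reported training set size
TEST_IMAGES = 1043         # reported test set size
TEST_ACCURACY = 0.9741     # reported best test accuracy

total_images = SUBJECTS * IMAGES_PER_SUBJECT
train_fraction = TRAIN_IMAGES / total_images
test_fraction = TEST_IMAGES / total_images

# Implied (not reported) number of correctly classified test images.
implied_correct = round(TEST_ACCURACY * TEST_IMAGES)

if __name__ == "__main__":
    print("total face images:", total_images)
    print("train fraction   :", round(train_fraction, 3))
    print("test fraction    :", round(test_fraction, 3))
    print("implied correct  :", implied_correct, "of", TEST_IMAGES)
```

Under this assumption the two sets together account for every face image in the sub-database, which corresponds to a split of roughly 70% for training and 30% for testing.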

Figure 4. The accuracy of the suggested model
Figure 5. The suggested model's loss function


Table 1. Multimodal biometric database

− Pros:
i. Enhanced security: face recognition is a powerful security tool that can be used to reliably identify people. It may be used in a variety of settings, including unlocking cellphones, entering restricted locations, and confirming identities at border crossings or airports, possibly reducing fraud and unauthorized access.
ii. Convenience and efficiency: by doing away with the need for physical identity cards, passwords, or PINs, face recognition offers convenience. Processes such as access control, identity verification, and attendance monitoring may be streamlined, saving time and easing administrative work.
iii. Surveillance and public safety: by detecting those involved in criminal activity or helping to find missing people, facial recognition technology can support efforts to improve public safety. It can be combined with surveillance cameras to improve security in public areas and to support law enforcement investigations.

Such recognition mistakes can result in inconvenience, denial of access to authorized users, or security vulnerabilities if exploited.
− The tenth layer is the Softmax layer, which uses the Softmax classifier, commonly employed for multi-class classification tasks, to identify individuals. This layer expands as new classes are added. It is placed at the top of the network, where its non-linear classification ability is most useful. The recommended CNN architecture is listed in Table 2.

Figure 3. CNN architecture
Table 2. A brief summary of the proposed CNN architecture

− Cons:
i. Privacy concerns: because extremely sensitive and personal biometric data is collected and stored, the use of face recognition raises privacy issues. This data may be misused or handled improperly, which might result in privacy violations.
ii. False positives and negatives: face recognition software is not always accurate; it occasionally produces false positives (wrongly matching an individual) or false negatives (failing to recognize a known individual).

Table 3. Samples of attempts to build the model until the best-performing model was achieved on "MULBv1"