Mathematics for 2D face recognition from real time image data set using deep learning techniques

ABSTRACT


INTRODUCTION
The major idea is to propose a method that recognizes and detects human faces in given pictures or videos. Facial recognition is the ability to automatically detect a face. It can identify different parts of the face depending on the presence of facial characteristics. It is easy for a person to find a face in a collection of images and differentiate images correctly. However, a computer must be properly trained so that, when a real-time dataset is provided, the system can identify the face of a person as well as additional characteristics such as eyes, nose, mouth, cheeks, lips, forehead, and chin.
The key concept here is to build a system that can distinguish faces from non-facial elements [1]. Deep learning adapts to new images, assuming they are similar to the data it was trained on. Computer vision is an area of artificial intelligence that enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs [2]. The dataset is taken as a set of real-time images. But what is an image? An image is a collection of arrays representing different numerical values in terms of red, green, and blue (RGB), which is the representation commonly used in computer vision. The values for these colors range from 0 to 255, where a higher value represents more intensity or brightness. Images are stored in multidimensional arrays. There are two main types of images: raster images and vector images. For our research, we consider raster images.
There are different file formats for images, such as JPG, GIF, PNG, TIFF, RAW [3], and PSD. The structures present in an image are processed, and the final output is the detected picture with a selection of the face [4]. The competitive aspect of this research article is to develop a system that can identify faces under various lighting conditions. Using convolutional neural networks (CNNs), we predict the faces from different images in the dataset. Predicting information in an image is challenging, and we first need to train a CNN to correctly classify the images.
Many researchers have explored various methods for analyzing the emotions in images, and the outcomes of machine learning techniques are significant. Among the machine learning techniques given in Table 1, methods based on deep learning produce the best efficiency for face recognition from photographs. For example, Table 1 lists Sparks [9], "The brainstem control of saccadic eye movements", using a CNN on 200 images with 93.7% accuracy.

MATERIALS AND METHOD
A CNN model [10], [11] is trained to predict whether an image contains a triangle or other information. How would you tell a computer about the shape present in the image if the image is shifted? How would a neural network be able to predict this? CNNs have many filters, which are used to scan an image and obtain a feature map. The output is obtained by implementing the different operations of a CNN, as shown in Figure 1. A CNN consists of various operations: i) convolutions, ii) feature detectors, iii) padding, iv) stride, v) activation layer, vi) pooling, vii) fully connected layer, and viii) softmax function [12]. A minimal sketch of such a pipeline follows.
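As an illustration, the following Keras sketch assembles these operations into a small network. The layer sizes, 64x64 RGB input, and 10-class output are illustrative assumptions, not the exact model used in this work.

```python
# Minimal Keras sketch of the CNN operations listed above; layer sizes and
# the 10-class output are illustrative, not the paper's exact model.
from tensorflow.keras import layers, models

model = models.Sequential([
    # i) convolution with iii) zero padding and iv) stride of 1
    layers.Conv2D(32, (3, 3), strides=1, padding="same",
                  activation="relu",              # v) activation layer (ReLU)
                  input_shape=(64, 64, 3)),       # 64x64 RGB input (assumed)
    layers.MaxPooling2D(pool_size=(2, 2)),        # vi) pooling (downsampling)
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),                             # 3D feature maps -> 1D vector
    layers.Dense(128, activation="relu"),         # vii) fully connected layer
    layers.Dense(10, activation="softmax"),       # viii) softmax over 10 classes
])
model.summary()
```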

Convolution operation and feature detector
Convolution is the mathematical term for combining two functions to obtain a third; applying a convolution to an image produces a feature map, as depicted in Figure 2. The convolution filter, also known as a kernel, is a matrix applied to an input image [13], [14].
Image * Kernel = Feature Map

The convolution operation is used to detect features in images. The kernels are also known as feature detectors, and each detects a different feature in the image, as shown in Figure 3. A combination of these features is used to classify the image correctly. A worked example follows.
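The following NumPy sketch makes the Image * Kernel = Feature Map relation concrete for a valid (unpadded, stride-1) convolution; the 5x5 image and 3x3 vertical-edge kernel are illustrative values.

```python
# Sketch of "Image * Kernel = Feature Map" with plain NumPy
# (valid convolution, stride 1, no padding).
import numpy as np

image = np.array([[10, 10, 10, 0, 0],
                  [10, 10, 10, 0, 0],
                  [10, 10, 10, 0, 0],
                  [10, 10, 10, 0, 0],
                  [10, 10, 10, 0, 0]], dtype=float)

kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)        # vertical-edge detector

h = image.shape[0] - kernel.shape[0] + 1            # output height
w = image.shape[1] - kernel.shape[1] + 1            # output width
feature_map = np.zeros((h, w))
for i in range(h):
    for j in range(w):
        patch = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(patch * kernel)  # element-wise multiply, sum

print(feature_map)   # strong responses where the vertical edge lies
```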

Padding
Padding allows us to control the feature map size. Convolution filters produce an output smaller than the input; in the example, we have taken a 5x5 image and padded it with 0s on all sides: left, right, top, and bottom. Padding also helps the convolution filters pass over border pixels multiple times, as shown in Figure 4.

Stride
Stride defines how many steps the convolution window moves across the input image; examples are illustrated for a stride of 1 and a stride of 2. A larger stride produces a smaller feature map output and has less overlap between windows, so stride is used to control the size of the feature map. We can calculate the feature map size using (2):

$$\text{feature map size} = \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1 \tag{2}$$

where $n \times n$ is the input image size, $f \times f$ is the filter size, $s$ denotes the stride value, and $p$ denotes the padding.
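A small helper implementing (2), checked against the 5x5 image from the padding example; the filter size, stride, and padding values are chosen for illustration.

```python
# Helper implementing (2): output side length for an n x n input,
# f x f filter, stride s, and padding p.
def feature_map_size(n, f, s=1, p=0):
    return (n + 2 * p - f) // s + 1

print(feature_map_size(5, 3, s=1, p=0))  # 3  (no padding shrinks the map)
print(feature_map_size(5, 3, s=1, p=1))  # 5  ("same" padding preserves size)
print(feature_map_size(5, 3, s=2, p=0))  # 2  (larger stride, smaller map)
```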

Activation layer
The purpose of the activation function is to enable the learning of complex patterns in our data by introducing nonlinearity into the network, allowing a nonlinear decision boundary via nonlinear combinations of the weights and inputs. There are several activation functions we can use in a CNN; rectified linear units (ReLU) have become the activation function of choice for CNNs. ReLU helps to train a CNN: its computation is simple (fast to train) and it does not saturate. The ReLU operation changes all negative values to 0 and leaves all positive values alone, as below.
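As a minimal sketch, ReLU is a one-line NumPy function:

```python
# ReLU: negatives become 0, positives pass through unchanged.
import numpy as np

def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]
```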

Pooling layer
Pooling is the process of reducing the dimensionality of a feature map, allowing us to decrease the number of parameters in our network while retaining essential features. Pooling is also known as subsampling or downsampling. In the operation below, a 2x2 kernel is used: in the first 2x2 block, the maximum value of 123 is selected, and the subsequent values in the output are 253, 187, and 165.
Pooling makes our CNN model more invariant to minor transformations and distortions in our images; it helps to maximize translation invariance. Pooling reduces the output size by subsampling the filter response with little loss of information [15], although a very large stride in the pooling layer leads to high information loss. A stride of 2 with a kernel size of 2x2 for the pooling layer has been effective in practice, as sketched below.
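The following sketch reproduces the 2x2, stride-2 max-pooling example; the input array is constructed so that its block maxima match the quoted values (123, 253, 187, and 165), with the remaining entries chosen arbitrarily.

```python
# 2x2 max pooling with stride 2 on a 4x4 feature map.
import numpy as np

fmap = np.array([[123,  20, 253,  30],
                 [ 10,   5,   8,   2],
                 [187,   4, 165,   9],
                 [  7,   6,   3,   1]])

pooled = np.zeros((2, 2), dtype=fmap.dtype)
for i in range(2):
    for j in range(2):
        # take the maximum of each non-overlapping 2x2 block
        pooled[i, j] = fmap[2 * i:2 * i + 2, 2 * j:2 * j + 2].max()

print(pooled)   # [[123 253]
                #  [187 165]]
```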

Fully connected layer
In a fully connected layer, all the nodes in one layer are connected to the outputs of the previous layer. It takes the 3D output of the previous layer and flattens it into a single vector used as input to the next layer; it is also known as a dense layer. A fully connected layer compiles the features extracted by previous layers to produce the outcome, making it easy to learn nonlinear combinations of these features.

SYSTEM DESIGN

Why convolutional neural networks work well on images
Standard neural networks don't have convolution filters as inputs; for images, every pixel would be its own input, so even a small 28 x 28 image would have 784 input nodes in the first layer. The first step here is how to train a CNN. Convolution filters act as feature detectors: the typical early layers of a CNN learn low-level features (like edges or lines), as specified in Figure 5, mid-layers learn simple patterns, and high-level layers learn more structured, complex patterns [16]-[19]. During the training process, the following steps are followed (a minimal training-loop sketch is given after the list):
− Initialize random values for our trainable parameters (weights).
− Forward propagate an image or batch of images through the network.
− Calculate the total error (the output obtained with the random values).
− Use back propagation to update the weights via gradient descent.
− Propagate more images (or batches) and update the weights until all images in the dataset have been propagated (one epoch).
− Repeat for a few more epochs (i.e., passing all image batches through the network) until the loss reaches satisfactory values.
With random initial weights, one or more input images produce outputs with arbitrary values, as shown in Figure 6. Since the generated values will not all be correct, we need a way to correct our results using the CNN; we create a loss function and use the back propagation method to do this.
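A minimal sketch of this loop using TensorFlow's GradientTape; it assumes `model` is a Keras CNN like the one sketched earlier and `dataset` yields (images, labels) batches. Keras initializes the trainable weights randomly when the model is built.

```python
# Minimal training loop covering the listed steps: forward propagation,
# error calculation, back propagation, and weight updates over epochs.
import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def train(model, dataset, epochs=5):
    for epoch in range(epochs):                     # repeat for several epochs
        for images, labels in dataset:              # one full pass = one epoch
            with tf.GradientTape() as tape:
                predictions = model(images, training=True)   # forward propagate
                loss = loss_fn(labels, predictions)          # total error
            grads = tape.gradient(loss, model.trainable_variables)  # backprop
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
        print(f"epoch {epoch + 1}: loss {loss.numpy():.4f}")
```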

Loss function
The loss function quantifies how bad our predicted probabilities are; we need to quantify the degree of error in our prediction. Cross-entropy loss is used for this purpose; it compares two distributions:

$$L(y, \hat{y}) = -\, y \cdot \log(\hat{y})$$

where $y$ is the ground truth vector, $\hat{y}$ is the predicted distribution, and "$\cdot$" is the inner product. Other loss functions also exist; loss functions are also called cost functions. For binary classification problems, we use binary cross-entropy loss; for regression, we often use the mean square error (MSE).
Other loss functions include L1, L2, hinge loss, and mean absolute error (MAE). If there are errors in the predictions, we update our weights using the back propagation technique to minimize the loss. A minimal example of the cross-entropy computation follows.
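A minimal NumPy check of the cross-entropy formula above; the three-class vectors are illustrative.

```python
# Cross-entropy loss L(y, y_hat) = -(y . log(y_hat)) computed directly.
import numpy as np

y     = np.array([0.0, 1.0, 0.0])        # ground truth (one-hot)
y_hat = np.array([0.1, 0.7, 0.2])        # predicted distribution

loss = -np.dot(y, np.log(y_hat))         # inner product with log-probabilities
print(loss)                              # ~0.357: low, since y_hat favors class 2
```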

Back propagation
Back propagation is very important in training neural networks; this process is used to determine how much to change/update the weights to reduce the overall loss, as depicted in Figure 7, which shows the operation of back propagation and the formulas used [20]. Using the loss value, back propagation tells us, for the next iteration, how much to increase or decrease each weight to reduce the overall loss in the network. By forward propagating input data and then back propagating, we can lower the weights and the loss; but this tunes the weights only for that particular input or batch of inputs. We improve generalization (the ability to make good predictions on unseen data) by using all the data in our training dataset.
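For a single weight $w$ feeding an activation $a$ that produces the prediction $\hat{y}$, the backpropagated gradient and the resulting update follow the chain rule; this is a generic form of the Figure 7 formulas, with $\eta$ denoting the learning rate:

$$\frac{\partial E}{\partial w} = \frac{\partial E}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial a} \cdot \frac{\partial a}{\partial w}, \qquad w \leftarrow w - \eta \, \frac{\partial E}{\partial w}$$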

Gradient descent
In the back propagation process, we update the individual weights of the model $wx + b$. The main goal is to find the values of the weights for which the loss is lowest. Gradient descent is the method of achieving this goal (i.e., updating all weights to lower the total loss); it finds the optimal weights such that the loss is near its minimum. Gradients are the derivative of a function; they tell us the rate of change of one variable with respect to another.

$$\text{Gradient} = \frac{dE}{dw}$$

where $E$ is the error or loss and $w$ is the weights.
A positive gradient means the loss increases if the weight increases, and a negative gradient means the loss decreases. At point A, moving right increases our weight and decreases our loss (negative gradient); at point B, moving right increases our weight and increases our loss (positive gradient). Therefore, the negative of the gradient tells us the direction in which to move. A point where the gradient is zero means that small changes to the left or right do not change the loss. In training neural networks, this is both good and bad: at point C, minor changes to the left or right do not change the loss, and the network can get stuck during training. This is called getting stuck in a local minimum. We will use the mini-batch gradient descent method, which combines the batch and stochastic approaches: it takes a batch of data points (images), forward propagates them all, and then updates the gradients. This leads to faster training and convergence toward the global minimum, as sketched below.
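A minimal sketch of mini-batch gradient descent on the single-weight model $wx + b$ mentioned above, fitting illustrative synthetic data with MSE loss.

```python
# Mini-batch gradient descent on y = w*x + b; the data are generated with
# w=2, b=1 plus small noise, so the fit should recover those values.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2 * x + 1 + rng.normal(0, 0.05, 100)   # ground truth w=2, b=1

w, b, lr = 0.0, 0.0, 0.1                   # initial weights and learning rate
for epoch in range(50):
    for i in range(0, 100, 4):             # take one mini-batch of 4 at a time
        xb, yb = x[i:i + 4], y[i:i + 4]
        err = (w * xb + b) - yb            # forward-pass residuals
        dw = 2 * np.mean(err * xb)         # dE/dw for MSE
        db = 2 * np.mean(err)              # dE/db for MSE
        w -= lr * dw                       # step against the gradient
        b -= lr * db

print(round(w, 2), round(b, 2))            # close to 2.0 and 1.0
```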

Optimization
More advanced variants of gradient descent allow us to find the optimal weights, and a few advanced optimization techniques are used. What problems do we need to deal with in standard stochastic gradient descent? These include choosing an appropriate learning rate (LR), deciding on learning rate schedules, and using the same learning rate for all parameter updates (a problem in the case of sparse data). Stochastic gradient descent [21] is also susceptible to getting trapped in local minima or saddle points (where one dimension slopes up and the other slopes down).
Several other algorithms have been developed to solve these problems, including extensions to stochastic gradient descent such as momentum and Nesterov's acceleration. Several optimizers, including Adagrad, Adadelta, Adam, RMSprop, AdaMax, and Nadam, have been introduced. We have used the Adam optimizer [22]; Adam (adaptive moment estimation) computes adaptive learning rates for each parameter and stores an exponentially decaying average of past gradients, similar to momentum. Adam is quite effective.
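In Keras, selecting Adam is a one-line choice at compile time; the learning rate shown is Adam's common default, and `model` refers to a CNN like the one sketched earlier.

```python
# Compiling a Keras model with the Adam optimizer.
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",   # suits one-hot face-class labels
    metrics=["accuracy"],
)
```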

IMPLEMENTATION

Dataset
The dataset considered for implementation is a real-time dataset; the images are collected from social media, different restaurants, and several showrooms [23]. The mathematical model for face recognition is presented below. We create a collection of facial images in the database, labeled $y_1, y_2, y_3, \ldots, y_n$. $N$ classes are created from these sets, and each class corresponds to a registered person. For each image, we define a feature vector comprising $K$ values.
Let $T$ represent the transpose operator. For each image, we define a distance function, denoted $d(\mu_l, \mu_s)$, between the feature vector $\mu$ of the input and that of each registered class; the input is assigned to the class $X_L$ whose feature vector is nearest.
In the context of the provided distance function, an input is rejected as unknown when its distance to class $X_L$ exceeds a precomputed threshold value, i.e., $d(\mu_l, \mu_s) > \tau_c$. The face recognition algorithm takes an image as input and produces a sequence of face frame coordinates as output; there may be zero, one, or several face frames in this sequence [24].
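A minimal sketch of this nearest-class assignment with threshold rejection; the class names, three-dimensional feature vectors, and the value of tau_c are illustrative assumptions.

```python
# Distance-based face recognition: assign to the nearest registered class,
# or reject as unknown when the distance exceeds the threshold tau_c.
import numpy as np

class_means = {                      # one K-dim feature vector per person
    "y1": np.array([0.9, 0.1, 0.0]),
    "y2": np.array([0.1, 0.8, 0.1]),
}
tau_c = 0.5                          # precomputed rejection threshold

def recognize(mu):
    label, dist = min(((name, np.linalg.norm(mu - mean))
                       for name, mean in class_means.items()),
                      key=lambda pair: pair[1])
    return "unknown" if dist > tau_c else label

print(recognize(np.array([0.85, 0.15, 0.0])))  # y1
print(recognize(np.array([0.3, 0.3, 0.4])))    # unknown (too far from all)
```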
The mathematical model for the integral image determines sums of pixel values over regions of the face. Let $II(y, z)$ represent the value of the integral image at coordinates $(y, z)$, and let $I(y', z')$ represent the brightness of the pixel of the image under consideration at coordinates $(y', z')$. The integral image is computed as (7):

$$II(y, z) = \sum_{y' \le y,\; z' \le z} I(y', z') \tag{7}$$

This equation is employed for computing the total brightness of the pixels within a region, such as the black areas of a Haar feature. With a threshold value of $\theta_i$, the weak classifier's expression is as (9):

$$h_i(x) = \begin{cases} 1, & f_i(x) < \theta_i \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

A set of weights, denoted $w_i$ for $1 \le i \le n$, corresponds to each sample. The best strong classifier is computed using a fixed number of weak classifiers, and the equation for the strong classifier is as (10), (11):

$$H(x) = \begin{cases} 1, & \sum_{c=1}^{C} \alpha_c h_c(x) \ge \frac{1}{2} \sum_{c=1}^{C} \alpha_c \\ 0, & \text{otherwise} \end{cases} \tag{10}$$

$$\alpha_c = \log \frac{1}{\beta_c} \tag{11}$$

where $h_c$ represents the weak classifier, $\alpha_c$ and $\beta_c$ are the weight coefficients associated with it, $c$ stands for the current number, and $c = 1, \ldots, C$ indexes the set of weak classifiers. The goal of this iterative technique is to build a reliable classifier. The images that show the object both before and after illumination are described by arrays $I_{j,b}(y, z)$, where $j$ is the current index of the sequence, $I_{j,b}(y, z)$ is the brightness value of the pixel at coordinates $(y, z)$, and $b \in \{0, 1\}$ is the identifier of the pixel array (before or after illumination). The study then analyzes the dispersion of the image after applying the brightness change. The dispersion value is determined by calculating the sum of squared differences between the pixel values in the modified image and the mean pixel value across the width $W$ and height $H$ of the array:

$$D_j = \frac{1}{WH} \sum_{y=1}^{W} \sum_{z=1}^{H} \left( I_j(y, z) - \bar{I}_j \right)^2 \tag{13}$$

where $I_j(y, z)$ denotes the pixel value at coordinates $(y, z)$ for image parameter $j$ and $\bar{I}_j$ is the mean pixel value for that image. From the dispersion, the normalized random variable $u$ is calculated by (14):

$$u(y, z) = \frac{I_j(y, z) - \bar{I}_j}{\sqrt{D_j}} \tag{14}$$
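The integral image of (7) can be sketched with two cumulative sums, after which any rectangle sum (e.g., the black area of a Haar feature) costs only four lookups; the 3x3 pixel values are illustrative.

```python
# Integral image via cumulative sums: each entry holds the sum of all
# pixels above and to the left of it (inclusive).
import numpy as np

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=np.int64)

ii = image.cumsum(axis=0).cumsum(axis=1)   # II(y, z) per equation (7)
print(ii)

# Rectangle sum over rows 1..2, cols 1..2 (the 2x2 block [[5, 6], [8, 9]]):
total = ii[2, 2] - ii[0, 2] - ii[2, 0] + ii[0, 0]
print(total)   # 28 = 5 + 6 + 8 + 9
```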

RESULTS AND DISCUSSION
Colored RGB images are captured from a camera. The images depicted in Figure 8 are real-time images of the author's friends. These images serve as input for face recognition, which is implemented using the TensorFlow and Keras libraries. To extract facial features, we employed Haar cascade features [25], including: i) edge characteristics, ii) linear characteristics, iii) center characteristics, and iv) diagonal characteristics. The face recognition model and the recognized images are subsequently employed for emotion detection of individuals using deep learning techniques, as illustrated in Figures 9 and 10. A minimal detection front end is sketched below.
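An OpenCV sketch of the Haar-cascade face extraction step, as a front end to the recognition model; the file names are placeholders and the recognizer itself (the Keras model) is omitted.

```python
# Haar-cascade face detection: each detected frame is cropped and can be
# passed to the face recognition model.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("friends.jpg")                      # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)         # cascades run on grayscale
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                           # one frame per detected face
    crop = img[y:y + h, x:x + w]                     # region fed to the recognizer
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.jpg", img)
```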

Measuring the accuracy of face recognition for an image
Several images, in groups or individually, are uploaded into the system to check its accuracy; each person appears in the photos multiple times. When all the images are tested in the proposed system, the results are tabulated in the confusion matrix shown in Table 2, from which the system's efficiency is calculated. The accuracy of the proposed system is then computed from Table 2 as follows.
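From the confusion matrix entries (true/false positives and negatives), accuracy is computed in the standard way:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$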

CONCLUSION
In this research article, face detection and face recognition in videos are achieved through deep learning techniques. The complete process of the face detection system begins with data training using the CNN approach, followed by face recognition, which has been elaborated upon. This article employs TensorFlow and Keras to test the model on various RGB images. Additionally, the system's performance is assessed for both sets of images and single images captured via a webcam, as well as for video inputs; in video processing, the system effectively extracts the faces of individuals. The proposed system exhibits a high level of accuracy, achieving a recognition rate of 94% after training with a substantial number of face images. However, several factors can impact the model's accuracy: in scenarios with insufficient light intensity or other factors affecting image clarity, accuracy tends to decrease compared to situations with higher light intensity. Furthermore, the classifier plays a pivotal role in the recognition process, and superior results are obtained when the model is trained with a large dataset of images. The detected faces can be utilized as inputs in various applications such as emotion recognition, theft identification, defense applications, and more. Deep learning techniques consistently outperform OpenCV functions in terms of accuracy.

Figure 1. Block diagram of CNN

Figure 2. An example of a convolution operation

Figure 3. Different features obtained from a convolution operation

Figure 5. Training process of CNN

Figure 6. Result of random values during the training process in CNN for six images

Figure 7. An operation of the back propagation technique

Table 1. Literature review

Table 2. Confusion matrix for face recognition