Bidirectional recommendation in HR analytics through text summarization

ABSTRACT


INTRODUCTION
Over the last decade, online recruitment platforms such as Naukri.com, Boss, Indeed Hiring, Glassdoor, Monster, LinkedIn, and Zhaopin have emerged to revolutionize the job recruitment process. These platforms have streamlined hiring and significantly transformed the job market landscape [1]. Many researchers treat job recommendation as a domain of its own and introduce methods for online platforms utilizing detailed relational learning [2]. This area scales readily to millions of elements, including candidate resumes and job positions, with additional information in the form of candidate interactions [3]. Because technology develops rapidly, new job opportunities are continually created in industries that demand advanced skill sets to cope with these wide-ranging changes [4]. Individuals who wish to change careers or who are looking for better opportunities face the problem of skill discrepancy because of these continuous changes. This paper presents an innovative approach to job recommendation that caters to the needs of both job seekers and recruiters [5]. The proposed system aims to benefit both parties by ensuring optimal job matches. Recruiters can easily identify the most suitable candidates for each job position, while job seekers receive relevant job opportunities that align with their CVs [6]. To achieve this, the process begins with the extraction of job offers from sa.indeed.com. Subsequently, the extracted job offers undergo preprocessing, training, and matching [7]. The outcome is a streamlined and efficient job-matching process in which both recruiters and job seekers actively participate in recommending and supporting candidates for the right job offers [8]. This paper also briefly emphasizes the benefits of creating job opportunities for people with disabilities. Because this group faces disproportionate levels of uncomfortable job situations, unemployment, and underemployment compared to individuals without disabilities [9], it is essential to
focus on generating employment prospects for this marginalized group. The main step is for companies to adopt a change of perspective in which the responsibility for job involvement does not fall mainly on people with autism, but on the adjustments employers can make to allow them to succeed in the organizational context. Much of the recent research on recruiting has focused heavily on the authenticity and validity of a company's selection activities and methods [10]. Recruitment is important and is concerned with finding a suitable candidate pool, attracting candidates, and getting them to register for open postings [11]. Despite the focus on employer responses to hiring, it is equally, if not more, important to consider job seekers' reactions to the recruitment process [12]. Every day, industry-scale recommendation systems suggest thousands of candidates to clients and, conversely, openings to candidates. The job recommendation system is built on a heterogeneous collection of input information: candidate resumes, vacancy texts, and structured data [13].
Traditional HR analytics systems rely largely on unidirectional suggestions, in which recruiters give suggestions to employees. However, this method fails to capture the full scope of the employer-employee relationship [14]. There is therefore a need for a more comprehensive, bidirectional recommendation system that can apply text summarization methods to extract key data from high volumes of HR-related text information [15]. Shao et al. [16] implemented a wide-ranging investigation of external and internal interactions for modeling multivariate attributes in InEXIT. The method first encodes the key and value of every attribute, along with its source, into a shared linguistic space. The paper presented a model for the internal interaction between multivariate attributes inside the job post and the resume, and for the external interaction between the resume and the job post. Finally, a matching layer is introduced to compute a matching degree. The reduction in economic cost and manpower, along with the support InEXIT provides, contributes to the enhancement of online recruitment. Nonetheless, the implementation has limitations, as certain components of InEXIT, such as the pre-trained language models and fusion strategies, may have more reliable alternatives. Ong and Lim [17] introduced an information-driven approach to job recommendation for professional guidance. The skill recommendation recognizes and gathers the skill set essential for a job based on the requirements released by companies recruiting for these positions. Information gathering, processing, and skill recognition use word-embedding techniques for job title representation, followed by a feed-forward neural network for job skill recommendation based on the job title representation. The method combines job title representation using bidirectional encoder representations from transformers (BERT) with job skill suggestion using a direct feed-forward neural network, performing efficiently in terms of accuracy and
F1-score. Its limitation is that it may not capture important work trends or anticipate future skill needs. Yang et al. [18] presented a job recommendation system merging content-based and filtering approaches through cost-sensitive statistical relational learning. Statistical relational learning (SRL) can represent the possible dependencies between the attributes of similar objects, providing a principled way to merge the two approaches. The paper introduced a way to adapt state-of-the-art SRL methods to hybrid recommendation systems, though notably no investigation applied them to current big-data-scale systems. The method permits modulating the trade-off between the recall and precision of the system in a controllable way. Its drawback was that obtaining high-quality data, such as brief job descriptions or exact candidate profiles, was difficult or costly. Yildirim et al. [19] introduced a reciprocal recommendation machine based on multi-objective deep factorization for online recruitment. The paper focuses on solving the problem of a shortage of information containing corresponding preferences in a network. It consolidated the multi-objective learning method into multiple state-of-the-art approaches, whose success had been established on similar prediction problems, and obtained promising outcomes. The method outperformed traditional methods and doubled the speed of the process. The drawback was data sparsity: for less-popular users, the machine may struggle to create exact, authentic recommendations.
Alsaif et al. [20] presented an NLP-based bidirectional recommendation system for suggesting jobs to job seekers and resumes to recruiters. The article introduces a system beneficial to both recruiters and job seekers: recruiters can select the best candidates for each position in their job postings, and job seekers likewise obtain jobs that better match their CVs. The process first extracts job offers through sa.indeed.com, and after extraction the pre-processing, training, and matching are done. Serving recruiters and job seekers alike, the system helps by suggesting and supporting candidates for the right job offers. The drawback was a lack of personalization; the NLP-based suggestion model was usually based on common patterns, which could result in generic recommendations that do not align with particular preferences. Jain et al. [21] implemented an ATS for the Hindi language by utilizing a real-coded genetic algorithm (RCGA). Rigorous experimentation on various feature groups was carried out; distinguishing features such as named sentence similarity and named entity features were merged with others for computing the evaluation metrics. The RCGA method chooses the better chromosome, which includes real-valued weights of the produced features and evaluates the distance among sentence values. The limitations of the implemented method were cost efficiency and the acceptance of ATS for HR solutions. Sethi et al.
[22] introduced a transformer method for generating a better summary. The method compares the BART and T5 methods to determine which produces the better average summary. After passing 1,000 articles through both, the BART method was found to perform better than the T5 method in every aspect. The method was much more efficient, keeping only related content and leaving out unwanted and irrelevant content; its drawbacks were the limited dataset and cost efficiency. The methods mentioned above for bidirectional recommendation in HR analytics through text summarization share several drawbacks: data sparsity, especially for less-popular users, which may affect accuracy; lack of personalization in NLP-based recommendation models; lack of research applied to real big-data-scale systems; and potential failure to capture necessary work trends or discover future skill needs. The main contributions of this work are as follows:
a. A decoder attention with pointer network (DA-PN) model is proposed, applying the attention mechanism at both the encoder and decoder ends by combining the attention distribution of the present time step; the pointer network (PN) implemented at the decoder resolves the issue of unregistered words.
b. The proposed DA-PN+Cover method depends on a coverage mechanism combining coverage vectors. The attention at the encoder and decoder in prior time steps is accumulated and used to update a loss function that helps avoid word repetition when generating the text summary.
c. A DA-PN+Cover+MLO method is proposed by combining a mixed learning objective (MLO) and implementing a self-critical gradient technique. It incorporates a global reward and the evaluation indicator during method iteration to prevent the growth of accumulating faults in generated text summaries.
The rest of the paper is organized as follows: section 2 gives information about the proposed method, section 3 describes the process of the DA-PN+Cover+MLO method, section 4 describes the results and discussion, and section 5 presents the
conclusion of the paper.

PROPOSED METHOD
The proposed methodology presents a bidirectional system that gives the best possible recommendations to both employees and recruiters. The methodology involves cleaning the data and named entity recognition. A score is assigned to each resume by calculating the similarity between the resume and the job description, which helps match resumes to job descriptions. NLP techniques are utilized for pre-processing the data, and cosine similarity is utilized to measure the similarity between the resume and the job description. For the summarization of resumes and text, the DA-PN+Cover+MLO method is proposed. Figure 1 represents the proposed model of this research.

Data collection and integration
The resume dataset used for this research work is gathered from Kaggle and includes 1,735 resumes [23]. Every profile includes fields such as resume title, location, role description, technical skills, education, certification, and additional data. Four JDs were acquired from LinkedIn for the roles of machine learning data scientist, full stack developer, Java developer, and Python developer. Each JD profile includes the job title, company name, city, description, education, and preferred skills.

Pre-processing
Unstructured data is converted to structured data using several pre-processing steps; pre-processing is the preliminary stage of preparing data. The job description is compared against a pool of resumes previously loaded into the system; similarly, a candidate's resume is compared against the different job descriptions previously loaded into the system. NLP tools are used to extract the required information by cleaning the data: a given resume or job description may contain symbols and integers that are not needed by the program, so unwanted characters such as numbers, stop words, and punctuation are removed, and all strings are converted to lowercase to make the resume or job description ready for the next stage.
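The cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's exact pipeline; the stop-word list here is a tiny placeholder (a real system would use a full list such as NLTK's), and the sample input is invented.

```python
import re

# Illustrative stop-word list only; a production pipeline would load a
# full list (e.g. from NLTK), which is an assumption beyond the text above.
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "for", "with"}

def preprocess(text: str) -> list[str]:
    """Clean a resume/job description as described above:
    lowercase, strip digits and punctuation, drop stop words."""
    text = text.lower()                    # convert all strings to lowercase
    text = re.sub(r"[^a-z\s]", " ", text)  # remove numbers and punctuation
    tokens = text.split()                  # tokenize on whitespace
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("3+ years of Python, SQL & NLP experience!"))
# ['years', 'python', 'sql', 'nlp', 'experience']
```

The cleaned token list is what the later vectorization stages consume.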

Named entity recognition
Many times, resumes are populated with excess information, often irrelevant to what the recruiter is looking for, which makes evaluating resumes in bulk tedious and hectic. Through the NER model, resumes can be evaluated at a glance, thereby reducing the effort required in shortlisting candidates among a pile of resumes. NER automatically generates summaries of resumes by extracting only primary entities such as name, skills, and educational background.
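To make the idea concrete, the sketch below extracts a name and skills from raw resume text. It is a deliberately crude dictionary-lookup stand-in: the `SKILL_SET` and the first-line-is-name heuristic are assumptions for illustration only, whereas the paper's NER component would be a trained statistical tagger.

```python
# Hypothetical skill dictionary; a trained NER model replaces this lookup.
SKILL_SET = {"python", "java", "sql", "machine learning", "nlp"}

def extract_entities(resume_text: str) -> dict:
    """Pull primary entities (name, skills) out of a resume, as NER
    does above -- here approximated with simple rules."""
    lowered = resume_text.lower()
    skills = sorted(s for s in SKILL_SET if s in lowered)
    # Crude heuristic: treat the first non-empty line as the name.
    name = next(line.strip() for line in resume_text.splitlines() if line.strip())
    return {"name": name, "skills": skills}

resume = """Jane Doe
Data scientist with Python and SQL, focus on NLP."""
print(extract_entities(resume))
# {'name': 'Jane Doe', 'skills': ['nlp', 'python', 'sql']}
```

The extracted entities form the short resume summary a recruiter scans at a glance.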

Generating vectors
NLP models work on numbers rather than textual data, so the text needs to be vectorized. Moreover, the word counts in each document are needed to compute the cosine similarity. The collection of text documents is converted to a matrix of token counts, called bag-of-words (BoW). The output is a sparse matrix in which each document is a row and each distinct token is a column; each cell value is the count of that word in the particular document. To weigh the importance of words in the corpus, term frequency-inverse document frequency (TF-IDF) is used. TF-IDF is based on the logic that words that are either too common or too rare in a corpus are not statistically significant for identifying a pattern; the logarithmic factor in the TF-IDF equation serves the mathematical purpose of assigning low values to such words. TF-IDF scores a word by multiplying the word's term frequency with its inverse document frequency, as in (2):

tf-idf(t, d) = tf(t, d) x log(N / df(t)) (2)

where tf(t, d) is the frequency of term t in document d, N is the number of documents in the corpus, and df(t) is the number of documents containing t. The highest TF-IDF scores signify the greatest significance of words in the corpus, while lower scores show lower significance. Words with low significance for the analysis can be removed, making model building less complex by decreasing the input dimensions. Accurate semantic representation methods are important in text mining applications. Although competitive results for automatic text classification may be attained with a traditional bag of words, such a representation cannot give satisfactory classification results in hard settings where richer text representations are needed. Both the count vectorizer and TF-IDF fail to capture linguistic similarity between words and hence are only used for customizing stop-word lists. To train a method on the underlying linguistic relationships of the words, Doc2Vec is used, because Doc2Vec reads documents as a whole rather than working on every single word and produces n-dimensional vectors. This transformation has two significant properties: dimensionality reduction for efficient representation and contextual similarity for expressive representations.
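The TF-IDF weighting in (2) can be computed directly. The sketch below uses the basic tf x log(N/df) form from the text; note that library vectorizers (e.g. scikit-learn's TfidfVectorizer) add smoothing and normalization on top of this, and the two-document corpus is invented for illustration.

```python
import math

def tf_idf(docs: list[list[str]]) -> list[dict]:
    """Basic TF-IDF from (2): score(t, d) = tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df: dict[str, int] = {}
    for doc in docs:                      # document frequency of each term
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weighted = []
    for doc in docs:
        scores = {term: doc.count(term) * math.log(n / df[term])
                  for term in set(doc)}
        weighted.append(scores)
    return weighted

docs = [["python", "developer"], ["java", "developer"]]
print(tf_idf(docs))
# "developer" appears in every document, so log(2/2) = 0 kills its
# weight, while the distinctive terms keep a positive score.
```

This is exactly the behavior the text describes: terms common to the whole corpus carry no discriminative weight.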

Cosine similarity
Measuring similarity or distance first requires treating the objects of study as samples, whose measured attributes are the features. For the proposed analysis, the candidate's resume and the job description are the two data samples. Two features can be represented as two dimensions, and samples can be visualized in the Cartesian coordinate system; naturally, samples can also be visualized in three dimensions if there are three features. However, the distance calculation remains the same no matter how many features or dimensions there are.
The technique used to measure distance depends on the particular working environment. In some applications the Euclidean distance is ideal and useful for computing distances; other areas require a more sophisticated approach for calculating distances between observations or points, such as the cosine distance. Cosine similarity is a metric utilized to calculate the similarity between documents and to rank them; it is independent of the size of the documents. Cosine similarity is about the orientation of two points in space rather than their exact distance from one another, which means the cosine distance is less affected by magnitude, i.e., by how large the given numbers are [6]. Assume x and y are two sample vectors to be compared. The cosine similarity is calculated as:

cos(x, y) = (x . y) / (||x|| ||y||)

where x . y is the dot product of the two vectors, ||x|| represents the Euclidean norm of the vector x, and likewise ||y|| represents the Euclidean norm of the vector y. The cosine measures the angle between the vectors x and y: a value of 0 indicates the two vectors are 90 degrees apart and there is no match, while a value closer to 1 indicates the two vectors are separated by a small angle and there is a match between the two vectors.
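As a sketch, the cosine relation above translates directly into code; the vectors here are illustrative, not actual resume/JD vectors.

```python
import math

def cosine_similarity(x: list[float], y: list[float]) -> float:
    """cos(x, y) = (x . y) / (||x|| * ||y||), as in the equation above."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Same orientation -> similarity near 1; orthogonal vectors -> 0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))   # ~1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
```

Scaling a vector (e.g. a longer resume with the same term proportions) leaves the score unchanged, which is why cosine similarity is independent of document size.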

DA-PN+COVER+MLO BASED SUMMARIZATION
To manage out-of-vocabulary (OOV) words, a PN combined with decoder attention (DA) merges the decoder context vector with the context vector produced from the source text, regulating the choice between copying from the source text and generating from the vocabulary. To address word repetition, a multiple-attention coverage mechanism is employed, which continuously uses coverage vectors from both the encoder and decoder to influence the attention weights. The attention data of the decoder is fed into an input mapping layer so the model can attend to prior data and minimize the recurrence of words. For readability of generated text summaries, a mixed learning objective (MLO) function incorporates a global reward, so that an otherwise non-differentiable evaluation score can contribute to the gradient and improve the readability of the generated text summary.

Decoder attention-based pointer network method
To resolve the issue of unregistered words, the DA-PN model is developed by applying DA on top of the PN. At every step, the DA-PN model decides whether to copy an unregistered word from the source text or to generate a word that is already in the vocabulary. The former action is governed by a softmax with probability normalization, while the latter relates to words selected from the input information via the DA mechanism. When both are merged, unregistered words can be taken from the source text even though they are not part of the vocabulary.
The DA-PN method contains two input layers for copying words from the source text or from the vocabulary; every dimension represents one word. A switching mechanism is utilized to produce a probability for managing the input. The attention of the decoder compensates for the information weakening caused by long sequences, which enables the method to locate the main data more accurately. The attention distribution, i.e., the probability distribution over every word in the source text, can be calculated as (6):

e_i^t = v^T tanh(W_h h_i + W_s s_t + b), a^t = softmax(e^t) (6)

where W_h and W_s represent weight parameters, h_i is the encoder hidden state for source position i, and s_t represents the decoder's hidden-layer state at time step t. The weighted sum of the attention distribution obtained through (6), together with the decoder states before the present time step, gives the context vector of the final decoder as (7). The switch probability p_gen depends on the decoder/encoder context vectors and the current hidden-layer state, as in (8) and (9), and the probability of the final prediction for a word w is given as (10):

P(w) = p_gen P_vocab(w) + (1 - p_gen) sum_{i: w_i = w} a_i^t (10)

where p_gen represents the probability of generating from the vocabulary, P_vocab(w) represents the word's probability in the output vocabulary distribution, and the summation accumulates the attention placed on every source position i whose word w_i equals w. The proposed DA-PN method easily identifies matchable words to copy from the source text; for this purpose, a related weight is assigned to every word. Words can also be chosen from a vocabulary that is extended by copying words unregistered in the original vocabulary from the source text. The proposed DA-PN utilizes a limited vocabulary, resulting in savings in storage and computation power and faster model training. Continuous attention is given to the encoder and decoder, producing text summaries that more accurately suit the source text. Figure 2 represents the overall process of the DA-based PN method.
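The mixing step in (10) can be illustrated with a toy numeric example. All probabilities below are made up, and the dictionary-based representation is a simplification of the tensor computation an actual model performs; the point is only how copy mass lets an OOV source word ("kubernetes", a hypothetical example) receive nonzero probability.

```python
def final_distribution(p_gen, p_vocab, attention, source_tokens):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on
    source positions where w occurs, as in (10)."""
    combined = {w: p_gen * p for w, p in p_vocab.items()}
    for a, tok in zip(attention, source_tokens):
        # Copy probability flows to source words, including OOV words
        # absent from the generator vocabulary.
        combined[tok] = combined.get(tok, 0.0) + (1 - p_gen) * a
    return combined

p_vocab = {"engineer": 0.7, "developer": 0.3}   # generator distribution
attention = [0.6, 0.4]                          # attention over source
dist = final_distribution(0.5, p_vocab, attention,
                          ["kubernetes", "engineer"])
print(dist)
# "engineer" gets mass from both paths; the OOV word "kubernetes"
# is still producible through the copy path.
```

With p_gen near 1 the model behaves as a pure generator; near 0 it behaves as a pure copier.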

DA-PN+Cover method
The proposed DA-PN+Cover method combines the DA-PN method with a coverage mechanism combining multiple attentions. Two coverage vectors are maintained, built from the attention distributions of the encoder and the decoder at every prior time step. The two coverage vectors are described as follows: c^dec represents the target-sequence attention (attention over words produced in the decoding process), and c^enc represents the attention over words in the source text (attention produced in the encoding process). The coverage degree for a particular word is the sum of all the attention it has received up to a specific moment. In the initial phase no coverage has accumulated, so the two coverage vectors are initialized to 0.
The inclusion of an attention mechanism in the decoder facilitates a more effective focus on key features within the generated text summary, which is particularly important for addressing the problem of word recurrence. To overcome this challenge and ensure attention is spread over the entire sequence, a global vector is employed for maintenance. The decoder's hidden layer uses the past-time context vector when computing the present input, so data acquired in the past can be utilized during the calculation, strengthening the relationship between the previous and present time steps. To make use of the source text and the already generated summary at the present time, the two coverage vectors are combined into the attention mechanism as per (11):

e_i^t = v^T tanh(W_h h_i + W_s s_t + W_e c_i^enc + W_d c_t^dec + b) (11)

where v and W_h represent the weight matrix of the repetitive nonlinear activation function and of the hidden state used in training respectively, h_i represents a hidden state, W_s represents the weight matrix of the present state sequence used in training, s_t represents the present state sequence, W_e and W_d describe the weight matrices applied to the encoder and decoder coverage at the present step respectively, c^enc and c^dec describe the coverage vectors of the encoder and decoder respectively, and b represents a bias learned during training. The outcome of the present time step is conditioned on the past source text and the produced summary, so the model avoids attending to the same data and thereby avoids recurrence. The final loss function combines the original loss and the coverage loss, mathematically represented in (12):

loss_t = -log P(w_t^*) + lambda sum_i min(a_i^t, c_i^t) (12)

where P(w_t^*) represents the predicted probability of the target word in the final vocabulary distribution, the summation term represents the coverage loss function, and lambda represents the coverage mechanism's weight in the whole loss. By utilizing the attention distributions at the encoder and decoder ends and the additional loss term, the method efficiently suppresses probable repeated items and improves the automatic production of text summaries. Figure 3 represents the overall process of the DA-PN+Cover method with two
coverage vectors.

The method that results from further combining the MLO is known as DA-PN+Cover+MLO. ROUGE is the evaluation indicator utilized for method iteration and also supplies the global reward. However, ROUGE cannot be used directly for calculating gradients because it is not differentiable in and of itself. At every decoder time step, a sequence y^s is sampled from the probability distribution P(y_t^s | y_1^s, ..., y_{t-1}^s, x), and a baseline output y^ is obtained through greedy search, i.e., by maximizing the output probability distribution at each step. Defining r(y) as the reward function scoring an output sequence y against the reference sequence y*, the loss is given as (13):

L_MLO = (r(y^) - r(y^s)) sum_t log P(y_t^s | y_1^s, ..., y_{t-1}^s, x) (13)

where x represents the input vector, y^ represents the baseline output acquired by maximizing the output probability distribution, and r(y) represents the reward function of an output sequence. Minimizing L_MLO increases the probability of the sampled sequence y^s whenever y^s earns a higher reward than the baseline, so the method is driven toward higher expected return. After this development, the global reward is utilized as represented in Figure 4. In summary: the proposed DA-PN method applies attention at both the encoder and decoder ends by combining the attention distribution of the present time step, and the PN implemented at the decoder resolves the issue of unregistered words; the proposed DA-PN+Cover method depends on a coverage mechanism combining coverage vectors, accumulated from the attention at the encoder and decoder in prior time steps and used to update a loss function that avoids word repetition in generated text summaries; and the proposed DA-PN+Cover+MLO method combines the MLO and implements a self-critical gradient technique, taking the global reward and evaluation indicator into the method's iteration to prevent the growth of accumulating faults in the generated text summary.
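The coverage penalty can be made concrete with a small sketch: the coverage vector accumulates past attention, and at each step the penalty sum_i min(a_i^t, c_i^t) fires only where the model re-attends to an already-covered position. The attention values below are invented for illustration.

```python
def coverage_loss(attention_steps):
    """Accumulate the coverage penalty sum_i min(a_i^t, c_i^t) over
    decoder steps, where c^t is the sum of all earlier attention."""
    length = len(attention_steps[0])
    coverage = [0.0] * length       # no coverage before the first step
    total = 0.0
    for att in attention_steps:
        total += sum(min(a, c) for a, c in zip(att, coverage))
        coverage = [c + a for c, a in zip(coverage, att)]
    return total

# Step 2 repeats step 1's attention exactly, so the overlap with the
# accumulated coverage is fully penalized (0.9 + 0.1 = 1.0).
print(coverage_loss([[0.9, 0.1], [0.9, 0.1]]))
# Attending to fresh positions each step incurs no penalty.
print(coverage_loss([[1.0, 0.0], [0.0, 1.0]]))  # 0.0
```

During training this penalty is added to the negative log-likelihood with weight lambda, as in (12), discouraging the repeated phrases that plague plain attention decoders.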

RESULTS AND DISCUSSION
This section provides the results and discussion of the proposed method, with a detailed explanation of the experimental setup, quantitative and qualitative analysis, comparative analysis, and overall discussion. Section 4.1 provides the experimental setup utilized for the research, section 4.2 provides the qualitative and quantitative analysis of the proposed method, section 4.3 provides the comparison between the proposed and existing methods, and section 4.4 gives the overall discussion.

Experimental setup
In this research, the proposed model is simulated utilizing a (2018) environment with the following system requirements; RAM: 16 GB, processor: Intel Core i7, and operating system: Windows 10 (64-bit). The performance of the proposed method is estimated utilizing the evaluation indicator ROUGE, a software package that evaluates the automatic summarization process. Figures 5 and 6 show the top matching resumes for a recruiter and the top matching jobs for a candidate with the respective percentages.

Qualitative and quantitative analysis
The calculated values are stored in a dictionary keyed by the job/resume name, with the similarity score as the value. The dictionary is then sorted in descending order, and the jobs/resumes with the maximum matching or similarity values are suggested. For a recruiter, a top-matching resume summary is displayed (Figure 5); for a candidate, the top matching jobs are displayed with the respective matching percentages (Figure 6). For a candidate, additional skills to be enhanced are also recommended by comparison with a dictionary of skills organized by area.
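The ranking step above amounts to a descending sort over a score dictionary. The scores and file names below are invented for illustration; percentage formatting mirrors the match-percentage display described in the text.

```python
# Hypothetical similarity scores keyed by job/resume name.
scores = {"java_developer.pdf": 0.62,
          "data_scientist.pdf": 0.87,
          "full_stack.pdf": 0.45}

# Sort in descending order of similarity and report the top matches.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked[:2]:          # top-2 matches
    print(f"{name}: {score:.0%}")
# data_scientist.pdf: 87%
# java_developer.pdf: 62%
```

The same structure serves both directions: resumes ranked for a job posting, or job postings ranked for a resume.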
Table 2 and Figure 7 represent the performance of the proposed method on the LCSTS dataset [24]. The evaluation indicators ROUGE-1, ROUGE-2, ROUGE-L, and the average of the ROUGE scores are used, with two baseline methods and the DA-PN and DA-PN+Cover methods taken for comparison. The proposed DA-PN+Cover+MLO method attained an average of 26.73, which is better than the baseline methods' 21.53; from Table 2, the proposed method shows higher performance than the other methods. Table 3 and Figure 8 represent the performance of the proposed method on the TTNews corpus dataset [25], evaluated with the same indicators, baselines, and comparison methods. The proposed DA-PN+Cover+MLO attained an average of 27.09, which is better than the baseline methods' 22.53; from Table 3, the proposed method gives higher performance than the other methods. Table 4 and Figure 9 represent the performance of the proposed method on the Kaggle dataset, again with the same baselines and comparison methods. The proposed DA-PN+Cover+MLO attained an average of 27.47, which is better than the baseline methods' 21.01; from Table 4, the proposed method gives higher performance than the other methods. From the results above, it is clear that the proposed DA-PN+Cover+MLO method outperforms the existing methods and prevents the growth of accumulating faults in the generated text summary.

Comparative analysis
The comparative analysis of the proposed methodology is described in Table 5. The proposed method is compared with existing methods such as ATS [21] and TF [26] on the Kaggle dataset. The evaluation indicators ROUGE-1 and ROUGE-2 are utilized for the comparison; the proposed method reaches 1.83 in ROUGE-1 and 0.74 in ROUGE-2, which is comparatively higher than the existing methods.

Table 5. Comparative analysis
Methods            Dataset        ROUGE-1    ROUGE-2
Jain et al. [21]   Kaggle [23]    0.79       0.66
Sethi et al. [22]                 1.5        -
Proposed method                   1.83       0.74

Discussion
The proposed bidirectional system presents the best possible recommendations to both employees and employers by utilizing the existing unstructured data. Instead of searching the complete dataset to find potential candidates, the proposed method reduces the time complexity of finding the applicant with the highest score. The DA-PN method is utilized to address the issue of unregistered words, and attention is given equally at the encoder and decoder ends, leading to highly exact text summaries. The DA-PN+Cover method provides positive attentional effects, so the current time step chooses more precise words by eliminating repeated terms. The DA-PN+Cover model is extended with an MLO to end the accumulation of collective mistakes in the created summaries.

CONCLUSION
Due to the growing demand for online recruitment, conventional hiring methods are insufficient: validating resumes manually is inflexible and vulnerable to manual errors. The bidirectional method comprises NER for extracting the needed information from resumes, and cosine similarity shows the match percentage of the job requirements with the resume and vice versa. The DA-PN method is put forward to address the issue of unregistered words, and the DA-PN+Cover method extended with the MLO (DA-PN+Cover+MLO) is utilized to prevent the growth of accumulating faults in the generated text summary. From the performance analysis, it is concluded that the proposed method gives better performance than the other methods. Future work will concentrate more on abstractive summarization techniques to enhance the quality of hire.

Figure 2 .
Figure 2. Pictorial representation of developed DA-PN method

Figure 5 .
Figure 5. Top matching resumes for recruiter
Figure 6. Top matching jobs for candidate

Figure 7 .
Figure 7. Performance of proposed method in LCSTS dataset

Figure 9 .
Figure 9. Performance of proposed method in kaggle dataset

Table 1 .
Dataset description

Table 2 .
Performance of proposed method in LCSTS data


Table 3 .
Performance of proposed method in TTNews corpus dataset

Table 4 .
Performance of the proposed method in kaggle dataset