Analyzing the relationship between information technology jobs advertised on-line and skills requirements using association rules

Received May 4, 2020; Revised Apr 29, 2021; Accepted Jul 14, 2021

Online job vacancy sites have become an important source of information about the characteristics of labor market demand. They have become an avenue for job matching by both employers and job seekers, and for studying and analyzing the labor market. This study proposes a methodology for identifying and analyzing skill-job relationships using the frequency with which skill words occur as job requirements. It employs association rule mining, which aims to discover frequent patterns and relationships among sets of items in a database. Published job vacancy data on IT jobs and their skill requirements were collected from various job portal websites. Analyzing the skill requirements of specific IT jobs published online with the FP-growth association rule algorithm provides a new dimension in labor market research. The study revealed that particular skill words are strongly associated with particular jobs. The results could provide insights into the gap between school-acquired skills and actual IT industry skill needs, and could serve as a basis for curriculum enhancement and policy-making interventions by the Philippine government in its educational system.


INTRODUCTION
With widespread access to and use of the Internet and increasing digital literacy, posting and searching for job vacancies online has replaced traditional methods of job searching. Online job portals are websites for announcing job positions that put job vacancies at the job seeker's fingertips. They aggregate job vacancies from companies and the resumes of applicants [1]. They serve as a means of posting jobs, searching for them, and selecting applicants for the advertised jobs [1], [2]. "Online job vacancy portals contain job offers for almost all occupations and skill levels. These platforms are a rich source of information about the skills and other job qualifications which are difficult to gather via traditional methods" [3]. They are a potential data source for analyzing labor market demand, that is, for identifying, analyzing, and tracking skill requirements in the labor market [4]-[8]. Data published on online job advertisement websites have become an increasingly significant area of research. Online job portals provide a platform on which demand and supply meet, which could inform policy makers and enable cross-country comparisons [5].
The need for graduates with a current skill set is of constant concern [9]. The growth and rapid expansion of the IT sector and the introduction of new technologies have resulted in an abundance of job titles that call for up-to-date skills [10], [11]. Hence, IT professionals need to keep their skills updated, with a focus on skills that are in demand [12]. With changing job trends, IT professionals enjoy better employment packages and job opportunities because of the high demand for their knowledge and skills [13]. The technological job skill needs of business and industry are continually evolving, which presents a challenge to educators and students attempting to focus on the right skills to meet these changing needs [14]-[16]. Furthermore, according to [17], information technology (IT) is a rapidly changing field, and while studies done some time ago are useful and informative, it is necessary to continually gather information about education and employment. Recruiters and hiring managers look for prospective candidates who can immediately perform the work that their job position requires. However, the competencies that graduates acquire in the academe do not match the requirements that these companies set. The demand for computer professionals is very high, and employers set high qualifications for ICT jobs. Hence, the IT competency model was formulated. The IT competency model [18] identifies the knowledge, skills, and abilities needed for workers to perform successfully in the field of IT. The model is depicted as a pyramid made up of several tiers, and the pyramid shape represents the increasing specialization and specificity of the competencies covered, as needed by the industries.
Its tiers are divided into blocks that represent competency areas (i.e., groups of knowledge, skills, and abilities), which are defined using critical work functions and technical content areas. Tiers 1 through 3 represent the "soft skills" and work readiness skills that most employers demand. Tier 2, Academic Competencies, covers competencies learned in a school setting, such as cognitive functions and thinking styles, which are likely to apply to all industries and occupations. Tier 3, Workplace Competencies, represents motives and traits, as well as interpersonal and self-management styles; these are generally applicable to a large number of occupations and industries. Tiers 4 and 5 are industry-specific competencies needed to create career lattices within an industry. The Employment and Training Administration's IT model does not include Tier 5 competencies. Included in this category are occupation-specific skill requirements and management competencies. This study adopts the model, which lists IT knowledge, skills, and abilities. It intends to analyze and discover the relationships between in-demand skills and jobs advertised online through association rule analysis. Association rule mining, as cited in [19], is used to find associations between items or itemsets. Some published research has dealt with extracting information on skills demand, identifying in-demand skills, and discovering and analyzing job requirements by applying data mining techniques such as association rules to publicly available data sites. Furthermore, textual analysis can be carried out by combining association rules with ontology mining [20].
The study in [21] used web mining to extract online job advertisements through a search engine to build professional profiles, which were compared with those present in official classification systems. The association rules were then classified by kind of job and by kind of qualification. The authors of [22] collected data from ICT job vacancy portals and applied association rules to determine the relationship between ICT skills and careers. The paper in [23] introduced a methodology for identifying skills demand from publicly accessible job vacancies using web and text mining tools; the authors were able to extract valuable facts about the competences and abilities sought by employers. The paper in [24] used weighted association rules to analyze job requirements in the IT field and obtained the relationship between job requirements and computer skills. Other machine learning techniques were applied in [25], which demonstrated data mining techniques such as k-NN classification and information extraction from a textual dataset to uncover knowledge from publicly accessible job vacancies. That research proposed an approach for identifying occupations and labor market demands within a given job vacancy dataset. The work in [26] analyzed job qualifications from a large dataset to support the choice of career and professional goals. A survey was undertaken to collect and prepare data about employment, and the Apriori algorithm was then employed to discover the frequent itemsets and the association rules. Another study [27] applied the Apriori association rule algorithm and used recommendation techniques based on the output of the skill associations to determine the most sought-after IT skills in the industry. The proposed method also finds skill combinations that are prominent in job advertisements.
The research published in [28] analyzed current labour market demands for organizational and end-user information systems professionals based on job advertisements from online job portals. The authors analyzed and categorized demands using the job responsibilities and the knowledge and skill requirements specified in these advertisements. Nesterenko [29] analyzed the Russian IT job market in terms of requirements extracted from job advertisements and skills extracted from the profiles of potential employees. Association rules were employed to extract frequent combinations of skills that characterize job profiles. The study revealed two large groups of functional roles. Hierarchical clustering and association rules made it possible to form nine clusters, which are closer to the professional fields. The results are considered interesting because they allow flexible, data-grounded job roles and skill patterns to be discovered. Thus, the work improves the skill matching approach by allowing comparisons that take into account skills at different levels of the generalization hierarchy, and by comparing these results with the structure based on the skills in job advertisements. Hossain, Arafin, and Mohammad [30] proposed an intelligent system to recommend appropriate jobs to freelancers from different online job sites. They used machine learning techniques to classify jobs and Apriori rule mining to derive the frequent skill sets used in the completed jobs of freelancers. A list of possible jobs is created for the freelancers by matching these frequent skill sets with the skills required in the posted jobs.
On the other hand, the research conducted in [31] analyzed and suggested a method to evaluate students' programming skills using data mining techniques such as association, classification, and clustering. It also specifies a means of identifying their skills and helping them improve their knowledge by predicting training programs. The study in [32] employed a visual exploratory discovery and analysis approach to determine the demand for jobs, the skills specific to a domain or industry sector, and the additional non-domain skills required to fill a job role. Meanwhile, the work in [33] explored the significance of key skills to employers, their perceptions of the availability of these skills in the labor force, and employers' knowledge and use of the key skills required for the job. It employed a combination of quantitative and qualitative data. The quantitative data came mainly from surveys in which employers were interviewed about key skills. According to the survey, around half of the respondents knew about core skills; however, 41% of those aware of core skills were unable to name any specific skill. Employers were most likely to name skills related to basic skills, indicating a confusion between basic skills and key skills that was also reflected in the interviews with employers. Only those employers who had direct links with education and training were aware of the key skills. Both the quantitative and the qualitative data illustrate the importance of key skills to employers. Zieglerl [34] concluded that skill requirements listed in online job ads can offer important insights on skill demand and skill wage differentials. In light of the above, it is worthwhile to complement traditional skills research with a more flexible and dynamic approach to determining skill gaps. This study proposes a new methodology for retrieving and analyzing the content of online job skill advertisements.
This paper analyzes words and word patterns of IT jobs published online in relation to the skill requirements as perceived by the industry. The study helps to determine the actual and future needs and trends of IT jobs in the market. Furthermore, it can serve as a basis for curriculum enrichment and for laying out an intervention program to address the gap between the skills acquired in school and the actual skill needs of the IT industry. It seeks to attain the following objectives:
- To determine the word/word-pattern skill needs of the IT industry in the labor market based on online advertisements.
- To analyze the word/word-pattern skill needs of the IT industry in the labor market based on online advertisements.
The paper is organized as follows. Section 2 outlines the steps followed in extracting information from online job vacancy sites and the machine learning algorithms employed in the study, specifically association rules. Section 3 discusses the results of finding patterns and associations between job skills and jobs posted online. Finally, section 4 concludes the paper.

RESEARCH METHOD
This study is descriptive in nature. Data ingestion was used to gather published job skills for IT professionals as stated in the CHED memo and the ACM IT curricula. The procedure for collecting published job vacancy data on IT job skill requirements involves several steps, as shown in Figure 1.

Job information searching published job skills
The procedure starts with selecting the information sources: a Google search using the keyword "information technology jobs Philippines", and job-hunting sites in the Philippines such as JobStreet, Kalibrr, and others.

Data ingestion
The identified published jobs then enter the data ingestion phase, which involves identifying the job vacancies available in each source and downloading their content. All information on the published job vacancies was transferred to an Excel file.

Information extraction and data cleaning
The text retrieved from job-hunting sites contains HTML tags, unnecessary and non-textual characters, and web code embedded in the original sites, including syntactic features and HTML entities such as <>. It was necessary to remove this content from the data because it might affect the results of the subsequent analysis and is not useful for the machine learning step. Hence, a PHP application module was designed and developed to clean the text retrieved from job-hunting sites by automatically stripping out this content. The next step is the information extraction phase, in which the relevant content of the identified job skills is organized and classified.
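The cleaning step described above can be illustrated with a minimal sketch. The study used a PHP module whose code is not given in the paper; the Python version below is only an assumed equivalent built on the standard library, and the names `clean_posting` and `_TextExtractor`, the sample posting, and the choice of characters to keep are all hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script>/<style> content (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean_posting(raw_html):
    """Return the plain text of a job posting with tags and stray symbols removed."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.parts)
    # Assumption: keep word characters plus a few symbols common in skill names (C++, C#, .NET)
    text = re.sub(r"[^\w\s+#./-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


sample = "<div><p>Knowledge in SQL&nbsp;database &amp; C++</p><script>var x = 1;</script></div>"
cleaned = clean_posting(sample)
```

Keeping symbols such as `+` and `#` is a design choice of this sketch, made so that skill names like C++ survive cleaning; the original module may have handled them differently.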

Skill and job classification
A job classification was built that organizes the records into mutually exclusive job groups: IT staff, network administrator, system analyst, computer programmer, and database administrator. These groups are based on the primary job roles of BS IT graduates as stated in Commission on Higher Education (CHED) memo 25, series of 2015, and in the ACM IT curricula 2017. The Excel dataset was extended with an additional attribute, "Job", and a job title was manually assigned to each record as implied by the job skills in the published posting. This work is necessary to provide the algorithm with information about the skills needed for each job.

Skills pattern recognition

Pattern recognition process
The term frequency-inverse document frequency (TF-IDF) scheme was used to reflect the frequency of the important words. It is based on counting the number of occurrences of job skill words in publicly available job websites. Skill terms that occur often in an online web page are given greater weight, and this weighting is used to discover the dominant skill words and skill patterns. The TF-IDF score increases with the number of times a (skill) word appears in an online job posting, but is offset by the word's frequency across the dataset, which helps to account for the fact that some words are more prevalent than others [35].
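As an illustration of the weighting described above, the following is a minimal sketch of one common TF-IDF variant. The paper does not state which variant was used, so the formulas here (term count divided by document length, multiplied by the natural log of the number of documents over the document frequency), the function name `tf_idf`, and the toy postings are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many postings each term appears at least once
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy postings (hypothetical), already cleaned and tokenized
postings = [
    ["sql", "database", "experience"],
    ["python", "sql", "experience"],
    ["network", "cisco", "experience"],
]
scores = tf_idf(postings)
```

In this toy data the word "experience" appears in every posting and therefore scores zero, while "database", which appears in only one posting, outweighs "sql", which appears in two. This is the offsetting effect described in the text.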

Association mining rules
This stage presents a machine learning model that analyzes skill words and skill word patterns from a collection of published jobs and their skill requirements by automatically extracting the frequent words in each website. The association rules are defined as follows. Let $T = \{w_1, w_2, \ldots, w_m\}$ be the set of items (skill keywords), and let $S_c = \{s_1, s_2, \ldots, s_n\}$ be the collection of skill records, where each record $s_i$ is a set of keywords such that $s_i \subseteq T$. Let $W_i$ and $W_j$ be sets of keywords. The rule $W_i \Rightarrow W_j$ holds in the collection $S_c$ with confidence $c$ if, among the records that contain $W_i$, $c\%$ also contain $W_j$. The confidence is calculated as:

$$\mathrm{confidence}(W_i \Rightarrow W_j) = \frac{\mathrm{support}(W_i \cup W_j)}{\mathrm{support}(W_i)}$$

Support, which is the ratio of the number of transactions in which $W_i$ and $W_j$ appear together to the total number of transactions, is used to quantify frequent itemsets, whereas confidence is the proportion of the transactions containing $W_i$ that also contain $W_j$ [35].

Correlation of job skills requirements
To determine the correlation between the words, the lift ratio is used in this study. For a rule $W_i \Rightarrow W_j$, lift is the confidence of the rule divided by the support of its consequent:

$$\mathrm{lift}(W_i \Rightarrow W_j) = \frac{\mathrm{confidence}(W_i \Rightarrow W_j)}{\mathrm{support}(W_j)}$$

If the lift of a rule is greater than one, the correlation is positive: the pair of skill words appears together more often than expected. If the lift is less than one, the correlation is negative: the pair of skill words appears together less often than expected. If the lift is equal to one, the skill words are independent: the pair appears together about as often as expected [35].
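The three measures used in this section can be computed directly from a list of transactions. The sketch below uses the standard definitions of support, confidence, and lift; the function names and the toy postings are hypothetical and are not taken from the study's dataset.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence of the rule relative to the baseline support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Toy job postings (hypothetical), each reduced to a set of skill words
postings = [
    {"sql", "database", "monitoring"},
    {"sql", "database"},
    {"sql", "python"},
    {"python", "javascript", "html"},
    {"sql", "database", "python"},
]

rule_support = support(postings, {"sql", "database"})
rule_confidence = confidence(postings, {"sql"}, {"database"})
rule_lift = lift(postings, {"sql"}, {"database"})
```

In this toy data the rule sql ⇒ database has a lift above one, so the two words co-occur more often than independence would predict, whereas sql ⇒ python has a lift below one.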

The frequent pattern growth (FP-growth) approach
FP-growth is one of the most widely used association rule mining algorithms. It finds frequent patterns and associations of job words in the dataset without generating candidate itemsets [35], [36].
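To make the idea of mining without candidate generation concrete, the following is a compact, illustrative FP-growth sketch in pure Python. It is not the implementation used in the study, which the paper does not describe; the names `Node`, `build_tree`, and `fp_growth`, the minimum count, and the toy postings are assumptions.

```python
from collections import defaultdict


class Node:
    """One node of the FP-tree: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree; return (header table: item -> nodes, item frequencies)."""
    freq = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            freq[item] += weight
    freq = {item: c for item, c in freq.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep only frequent items, most frequent first (ties broken by name)
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, freq


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every frequent itemset, recursing on conditional trees."""
    header, freq = build_tree(weighted_transactions, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, freq[item]
        # Conditional pattern base: the prefix path above each occurrence of `item`
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_count, itemset)


# Toy job postings (hypothetical); each posting has weight 1
postings = [
    ["sql", "database", "monitoring"],
    ["sql", "database"],
    ["sql", "python"],
    ["python", "javascript", "html"],
    ["sql", "database", "python"],
]
frequent = {tuple(sorted(s)): c
            for s, c in fp_growth([(p, 1) for p in postings], min_count=2)}
```

Each recursive call works only on the prefix paths of one item, so frequent itemsets grow from the compressed tree rather than from an enumerated list of candidates. Association rules can then be derived from `frequent` using the confidence and lift measures defined earlier.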

RESULTS AND DISCUSSION
The aim of this study is to find and analyze the relationship between job skill word patterns and jobs posted online. The results reveal the skills required for particular IT job positions. They are based on skill (term) frequency and co-occurrence within each job posting, and they present the domain skill words related to each particular job.

Skills required for database administrator
The association rule results in Table 1 reveal the relationships among the job skill requirements for a database administrator. These are monitoring of databases, applications of a database, knowledge of SQL databases, managing database technologies, and knowledge of business. These skill rules are the basic skill requirements for a database administrator based on the published IT jobs and job skill requirements, and they also correspond to the industry-wide technical competencies under the IT competencies model [18].

Table 2 reveals that years of experience in design, analysis of a system, and experience in SQL are the identified skill rules that a system analyst must possess. In addition, relevant knowledge in the areas of design, analysis, problem solving, and SQL is also desired by employers for this IT job. This implies that experience in system analysis and design is the main skill that a system analyst should possess. The lift values indicate that relevant problem-solving knowledge is the most needed skill for a system analyst, based on the occurrences of the skill words in the job postings for system analysts.

Table 3 shows that IT staff should be graduates of an information and computer technology course. Graduates of these courses have acquired the set of foundational and employability skills, knowledge, and abilities that are required of all information workers. These are the universal skills: problem solving and applying technical knowledge and tools effectively. This indicates that IT staff need to be graduates of a computer or IT course because such graduates have the basic IT knowledge and skills needed for the role.

Table 4 reveals that a computer programmer must have skill and knowledge in SQL databases and be knowledgeable in Python, JavaScript, and office applications as programming and productivity tools. In addition, skills in CSS, SK, and HTML are basic skill requirements for a computer programmer based on the published IT jobs and skill requirements. Furthermore, software design and software engineering were included as additional skills required of a computer programmer. The analysis in [16] also indicates that job ads for IT jobs request HTML, Java, JavaScript, and MySQL skills. This indicates that, aside from programming skills, programmers should have experience in software design and knowledge of software engineering.

The association rule results for the network administrator can be explained by the skills/competencies required of a network administrator under the IT competencies model [18], [37]: experience and knowledge in networking, network engineering, and CISCO, and the ability to troubleshoot networks. These are network-related technical skills that a network administrator should possess.

This study analyzes job skill requirements based on IT job vacancies posted online, which provide information about both the vacancies and their skill requirements. The findings show that the skill word association results enhance information extraction from job descriptions posted online. By gathering publicly available online data with web and text mining tools, the study was able to extract important facts about word patterns in job skill requirements and about the IT skills wanted by employers. The findings of this study offset the limitations of similar studies [18], [37] by providing the skills information needed for a particular job. Furthermore, the results provide current job skill requirements, which are important in the revision and enhancement of the BSIT curriculum.

CONCLUSION
This study proposes a methodology for identifying and analyzing published job skills and IT jobs using the frequency with which skill words occur as job requirements. The proposed methodology is innovative in identifying the skills required for a particular IT job that is posted online. By applying automated techniques, the proposed method can retrieve and process large amounts of data posted online and analyze information about the skill qualifications for a particular job. In addition, it provides direct, actionable information about skills demand that can be useful in planning and developing program curricula. Thus, the results of the study also help educational institutions to understand the relationship between posted jobs and the required skills and knowledge that need to be incorporated into their curricula. It is therefore advisable to identify demands in greater detail and to bridge the gap between skill needs and supply with a more flexible program curriculum. Furthermore, job and skill demand analysis is pertinent to the modern, data- and technology-dependent world, where skills and capabilities in a variety of industry sectors must be updated to cope with this new, invaluable source of knowledge. Future directions of this research are to explore other text mining and visualization tools. Many available tools and applications can be tested for their information retrieval capabilities, specifically in the recognition of skill words and skill word patterns, and other potentially useful sources of data can be searched, including web-based repositories such as online forums, blogs, and bulletin boards.