A highly scalable CF recommendation system using ontology and SVD-based incremental approach

ABSTRACT


INTRODUCTION
A recommendation system is an intelligent system that captures user behavior on internet portals in order to forecast user interest in future online product purchases, movie viewing, or music listening. Because of the growing popularity of the internet market, many websites now offer their customers a very large number of choices. Finding the item a customer is interested in among thousands of items in a short amount of time has become quite difficult. Recommendation systems were introduced to meet this challenge. A recommendation system may generate suggestions based on a user's previous purchase history, search patterns, and online behavior. When a new item or user enters the online website, the recommender system searches the existing data and recommends the best item based on the customer's preferences [1].
Based on how they operate, recommendation systems are categorized into three types: content-based, collaborative, and hybrid. Approaches such as collaborative filtering (CF) and content-based filtering (CBF) have mainly been developed to acquire insight into user preferences [2], [3]. The suggestions produced by the CBF technique depend on content similarity to items previously scored by the user, while the CF technique takes advantage of the similarity of users' tastes [4], [5]. The CF technique is classified into two types: user-based and item-based. The similarity between users in the user-based CF method is estimated according to co-rated items. In contrast, item-based CF assesses similarity among items instead of users, on the premise that items similar to those a user has previously liked will attract that user's attention. The hybrid strategy combines the two filtering procedures and has superior prediction accuracy compared with either technique alone [6].
Bulletin of Electr Eng & Inf ISSN: 2302-9285, A highly scalable CF recommendation system using ontology and SVD-based … (Sajida Mhammedi)
Even though CF has drawn attention owing to its effectiveness and simplicity [4], it still faces the following challenges: data sparsity [7], computation time, accuracy of recommendations, scalability, and data volume. To address these issues, the hybrid recommendation strategy employs several information filtering techniques. The hybrid filtering approach is designed to produce more efficient and accurate recommendations than a single technique, and it overcomes the disadvantages of a single system by combining several techniques. In this research, we propose a recommendation approach that uses both ontological semantic filtering and an incremental algorithm to provide high scalability when dealing with large increases in the size of the user-item matrix as well as sparsity issues, with the goal of achieving accurate predictions with a reduced running time. The rest of the paper is organized as follows: section 2 reviews the work related to this research, section 3 describes the proposed approach, section 4 discusses the practical implementation and evaluation of the proposed approach, and section 5 concludes the paper.

RELATED WORK
Several strategies for recommendation systems have been established in prior research. Nowadays, recommender systems are essential to speed up internet users' searches for relevant content. In the area of recommender systems, using ontologies as a knowledge base is becoming increasingly popular for modeling tasks, inferring new knowledge [8], [9], or computing similarity. Adopting ontologies in information systems aims to model information at the semantic level by structuring and organizing a set of hierarchical terms or concepts within a domain and modeling the relationships between them using a relational descriptor [10], [11]. Recommender systems based on knowledge represented by ontologies explicitly solicit user requirements for these elements and rely on an in-depth understanding of the underlying domain for similarity measures and prediction computation. Among the significant number of published studies, the authors of [11] enhance user profile representation by implementing an ontology-based recommendation system. By introducing domain ontologies into the system, the suggested technique is able to uncover relationships between users and their preferred items. The authors ran several offline experiments and compared the new recommendation approach to collaborative approaches. To improve the quality of the recommendations, Hassan et al. [12] used item semantic knowledge, creating a hybrid semantic enhanced recommendation strategy that combines the inferential ontology-based semantic similarity (IOBSS) with the classic item-based CF method. Kermany and Alizadeh [13] proposed a multi-criteria recommender system using an adaptive neuro-fuzzy inference system (ANFIS) that relies on ontological item-based and user demographic information. Their method was tested on a Yahoo Movies platform dataset. According to their results, the accuracy of a multi-criteria recommendation system can be increased by incorporating semantic information.
Dimensionality reduction techniques have been widely used in the recommendation systems literature. Among the most successful are singular value decomposition (SVD) and its variants, and principal component analysis (PCA) [14], [15]. These techniques reduce the dimensionality of the data, which helps in handling sparsity and improving the efficiency and accuracy of the recommendation process. The literature shows that these techniques have improved recommendation performance on various datasets, especially after the challenge launched by Netflix. Indeed, many works analyzing the results of the challenge, such as [16], demonstrated that approaches applying dimensionality reduction are more accurate than plain CF algorithms. Recent research related to our work has also used SVD in CF for recommendation systems. For instance, Wang et al. [17] proposed a CF algorithm that incorporates trust between users to improve recommendation accuracy. The algorithm combines the traditional SVD method with a trust factor matrix, and the results show that it outperforms other state-of-the-art CF methods in terms of recommendation accuracy. Nilashi et al. [18] combine CF with ontology-based techniques and dimensionality reduction to improve the accuracy and coverage of CF. Their system combines semantic similarity and matrix factorization to handle the sparsity problem and provide more personalized recommendations. Incremental SVD, in turn, has been proposed as a way to improve scalability and performance compared to non-incremental SVD [19]. Brand [20] uses an incremental SVD approach with incomplete data to handle uncertain new data with missing values and/or contaminated by correlated noise. In comparison to the standard SVD technique, incremental SVD updates the factorization model using only new information instead of recomputing the entire model from scratch, which can be computationally expensive and time-consuming for large datasets. As a result, incremental SVD can reduce the training time and improve the efficiency of the recommendation system without sacrificing accuracy. Overall, our contribution lies in presenting a comprehensive recommendation method that combines dimensionality reduction, ontology-based techniques, and incremental SVD to address key challenges in recommendation systems. By leveraging these techniques, we aim to improve recommendation accuracy, scalability, and efficiency, ultimately enhancing user experience and satisfaction.

HYBRID RECOMMENDER SYSTEM PROPOSITION
Figure 1 shows how the proposed recommendation system works. The suggested recommendation system aims to provide efficient, scalable, and accurate recommendations. The process consists of two phases. In the first phase, several tasks are performed during the construction of the recommendation model, such as clustering of items and users based on ratings, dimensionality reduction using the SVD algorithm, and constructing item and user similarity matrices. First, the system is supplied with a user-item matrix that specifies the rating each user has given to each item. Item clusters are then constructed using fuzzy c-means clustering: the pairwise similarity between items is computed, and items are grouped based on this similarity. The overall similarity is obtained by averaging the item-based and ontology-based similarities.

Figure 1. Proposed system framework using ontology and incremental SVD
A new algorithm based on ontologies is proposed to compute the item similarity. Following that, we create decomposition matrices using SVD for the user-item cluster matrix. It is worth mentioning that the SVD model is built for both items and users, and in each case the similarity computation is performed after the matrix decomposition. After the clusters of similar items have been produced, ratings are predicted for the items that the current user has not yet rated, in order to eliminate sparsity in the user-item matrix. The incremental SVD is employed in the second phase of the recommendation process (the online phase) to perform the prediction and recommendation tasks for target users and items. For user-based suggestion, we follow the same procedure as for item-based suggestion. Finally, the user-based and item-based predictions are integrated. The approach is discussed in depth in the following subsections.

Preprocessing of data
The initial step in our research is to preprocess the dataset in order to make it suitable for the proposed method. This involves the preparation steps that real-life data typically require before analysis. In our approach, we begin by transforming movie ratings into a user-item matrix, often referred to as a rating utility matrix. This matrix captures the ratings provided by users for different movies, as shown in Figure 2. However, this matrix is typically sparse, meaning many cells are empty because they represent movies the user has not rated. CF algorithms typically work with dense matrices, so we need to convert the sparse matrix into a dense matrix by applying normalization techniques. The empty cells in the matrix correspond to new users, new movies, or movies not rated by anyone. Ratings of 4 or 5 are taken to express positive sentiment (user preference) towards a movie, while ratings of 1 or 2 express negative sentiment (user disinterest). To address item and user bias in the ratings, we normalize the ratings using mean normalization.
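The mean-normalization step can be illustrated with a minimal sketch. It assumes that unrated cells are stored as NaN and that normalization subtracts each user's own mean rating; the function name `mean_normalize` and the toy matrix are hypothetical and are not taken from the paper.

```python
import numpy as np

def mean_normalize(ratings):
    """Subtract each user's mean rating from that user's observed ratings.

    ratings: 2-D float array (users x items) with np.nan for unrated cells.
    Returns (normalized, user_means); unrated cells stay np.nan.
    """
    ratings = np.asarray(ratings, dtype=float)
    observed = ~np.isnan(ratings)
    counts = observed.sum(axis=1)
    sums = np.where(observed, ratings, 0.0).sum(axis=1)
    # Users with no ratings keep a mean of 0 (assumption of this sketch).
    user_means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    normalized = ratings - user_means[:, None]
    return normalized, user_means

# Toy user-item matrix: 2 users, 3 movies, NaN marks an unrated movie.
R = np.array([[5.0, 4.0, np.nan],
              [1.0, np.nan, 2.0]])
normalized, means = mean_normalize(R)
```

After normalization, a positive value means the user rated the movie above their own average, which removes the bias of users who rate systematically high or low.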

Movie ontology
In this research, we use the movie ontology (MO), which is based on the Web Ontology Language (OWL) standard and was created at the Department of Informatics at the University of Zurich [21]. MO describes the semantic concepts related to the movie domain. The class "movie" is the main class, and all movies are considered instances of it. Many studies have demonstrated that using an ontology-based semantic approach improves the prediction accuracy of recommendation systems [22], [23].

User-based clustering
In user clustering, users are grouped based on similar preferences, as determined by their ratings. After clustering the users, the aggregated opinions of each cluster are used to predict unknown ratings for target users or to predict which items they like or dislike. Since clusters contain a restricted number of users, there is no need to evaluate all users, which improves performance.

Item-based clustering

Compute ontology-based item similarity
Ontologies supply extensive knowledge on a topic, which can be highly valuable in a recommendation system [24]. Most studies have ignored the multilevel and complex structure of ontologies and used just one feature to determine ontology-based item similarity. For instance, several researchers have relied only on a movie's "genre" to identify a related collection of films. In the context of a movie recommendation system, consider Figure 3. In this example, CL represents the movie class. Within this class, there are two attributes, At1 and At2, which could represent characteristics such as the release date and copyright information. Additionally, a subclass SCl represents the "movie origin". This subclass includes attributes At3, At4, and At5, which correspond to specific regions such as North Africa, Asia, and Europe. Organizing the movie data in this hierarchical manner captures more detailed information about movies and their origins, which allows recommendation algorithms to provide more personalized suggestions. This work uses the binary Jaccard similarity coefficient to compute item-based semantic similarity. For two items to be similar, their attributes and the attributes of their subclasses must be similar [25]. The similarity between items is computed recursively as the average of the attribute-level values, until the maximum depth defined at the beginning is reached. Equation (1) gives the semantic similarity between the classes of two items for a specific attribute; the total number of attributes in the ontology also enters the computation. Equation (2) gives the ontology-based similarity between two items when no attribute in the ontology is a subclass. Equation (3) is used when both attributes and a subclass with its own attributes exist in the ontology. Determining the ontology-based similarity between two objects from the common domain ontology requires two inputs: i) the ontology of items with its classes, properties, and relationships; and ii) the set I of all items. The output is the semantic similarity matrix (SSM), which measures the semantic similarity between each pair of items based on the ontology.
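The recursive, Jaccard-based computation described above can be sketched as follows. This is only one possible reading of the procedure, since the exact form of equations (1) to (3) is not reproduced here. The nested-dictionary representation of an item, the keys `attributes` and `subclasses`, the function names, and the sample movies are all hypothetical.

```python
def jaccard(a, b):
    """Binary Jaccard coefficient between two sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def ontology_similarity(x, y, depth=0, max_depth=2):
    """Average per-attribute Jaccard scores, recursing into shared subclasses.

    x, y: dicts with 'attributes' (name -> set of values) and an optional
    'subclasses' entry (name -> nested dict of the same shape).
    Recursion stops once max_depth is reached.
    """
    attrs_x, attrs_y = x.get("attributes", {}), y.get("attributes", {})
    scores = [jaccard(attrs_x.get(name, ()), attrs_y.get(name, ()))
              for name in set(attrs_x) | set(attrs_y)]
    if depth < max_depth:
        subs_x, subs_y = x.get("subclasses", {}), y.get("subclasses", {})
        for name in set(subs_x) & set(subs_y):
            scores.append(ontology_similarity(subs_x[name], subs_y[name],
                                              depth + 1, max_depth))
    return sum(scores) / len(scores) if scores else 0.0

# Two hypothetical movies described by attributes and a "movie origin" subclass.
movie_a = {"attributes": {"genre": {"drama", "crime"}, "year": {"1994"}},
           "subclasses": {"origin": {"attributes": {"region": {"Europe"}}}}}
movie_b = {"attributes": {"genre": {"drama"}, "year": {"1994"}},
           "subclasses": {"origin": {"attributes": {"region": {"Asia"}}}}}
sim = ontology_similarity(movie_a, movie_b)
```

Evaluating this function for every pair of items would fill a symmetric matrix playing the role of the SSM.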

Calculation of item similarity using explicit user ratings
The similarity between items is determined based on explicit ratings supplied by users in the user-item rating matrix, where I represents the set of items, U represents the set of users, and R_ui represents the score given by user u to item i, as seen in Figure 4. The similarity metric used is given in (4), where R_ui and R_uj represent the ratings given by user u to items i and j, respectively, and 1 ≤ u ≤ L, with L being the total number of users who rated both items.

Total item similarity score
The total similarity between items is obtained in (5) by combining the ontology-based similarity score from (2) and (3) with the similarity based on explicit user ratings from (4), using weights α and µ such that α + µ = 1. The total item similarity matrix (TISM) is generated after calculating the overall similarity for each pair of items in the item set using (5).
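As an illustration, the weighted combination can be sketched as a convex combination of two item-item similarity matrices. The assignment of α to the ontology-based score and µ to the rating-based score is an assumption of this sketch, as are the function name `total_item_similarity` and the toy matrices.

```python
import numpy as np

def total_item_similarity(onto_sim, rating_sim, alpha=0.5):
    """Convex combination of two item-item similarity matrices.

    alpha weights the ontology-based similarity (assumed); mu = 1 - alpha
    weights the rating-based similarity, so alpha + mu = 1 by construction.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    mu = 1.0 - alpha
    onto_sim = np.asarray(onto_sim, dtype=float)
    rating_sim = np.asarray(rating_sim, dtype=float)
    if onto_sim.shape != rating_sim.shape:
        raise ValueError("similarity matrices must have the same shape")
    return alpha * onto_sim + mu * rating_sim

# Toy 2 x 2 similarity matrices for two items.
onto = np.array([[1.0, 0.5], [0.5, 1.0]])
rate = np.array([[1.0, 0.1], [0.1, 1.0]])
tism = total_item_similarity(onto, rate, alpha=0.75)
```

With α = 0.5 the combination reduces to a plain average of the two similarity scores.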

Method of item clustering
Fuzzy c-means clustering [26] was employed in this study to group similar items, since it works well with the sparse datasets found in the majority of recommendation systems. This study combines content-based characteristics derived from ontologies with user rating data to avoid the overgeneralization, poor accuracy, and cluster overlapping that would result from using just one of them. As detailed in the next section, similar items within a cluster are used to predict the target item's score. As a result, the number of items that need to be evaluated is significantly smaller than the total number of items in the system, which increases the system's performance [22]. Once the clusters have been constructed, a user-item cluster matrix (UICM) is produced, in which U represents the set of users, C represents the centers of all item clusters, and each entry represents the average rating supplied by user m to the items of the cluster with center z, as illustrated in Figure 5.

Prediction for the rating
Based on the clusters generated, a sorted list of the top T similar items is produced for a target item. Using the obtained values, the empty cells in the user-item rating matrix for the target user are then filled. The rating for each unrated (target) item is predicted from the active user's ratings for items similar to that item. Using (6), we can predict the rating that a target user u would give to an unrated item i.
In (6), Similarity(i, j) is the similarity score between target item i and item j, R_uj is the rating given to similar item j by user u, and T is the total number of similar items considered. In certain cases, the current user may not have rated any of the top T similar items for a target item, leaving some empty cells in the user-item matrix after filling it. To address this issue, an extended technique can be used to estimate the remaining sparse cells. In this method, the active user's rating behavior for other items is taken into account, as well as other users' ratings for the unrated (target) item. Using the suggested method, the rating of an unrated item i by target user u may be predicted as in (7), where α and µ are control parameters, M is the number of other items rated by target user u, with 1 ≤ m ≤ M, R_um is the rating given by u to another item m, n is the number of other users who submitted a rating for the unrated target item i, with 1 ≤ p ≤ n and p ≠ u, and R_pi is the rating given to target item i by another user p, excluding target user u. The result of predicting the unrated values in the explicit user-item rating matrix UIM(U, I) is a dense user-item rating matrix DUIM(U, I).
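A prediction of this kind is commonly computed as a similarity-weighted average of the user's ratings on the most similar items. The sketch below assumes that form; whether it matches the paper's equation (6) term by term cannot be confirmed from the text. The function name `predict_rating`, the normalization by the sum of absolute similarities, and the sample values are hypothetical.

```python
import numpy as np

def predict_rating(user_ratings, similarities, top_t=3):
    """Similarity-weighted average of a user's ratings on the top-T similar items.

    user_ratings: 1-D array of the target user's ratings for the candidate
        items (np.nan where unrated); the target item itself is excluded.
    similarities: 1-D array of similarity scores between the target item
        and each candidate item.
    Returns np.nan when the user rated none of the top-T similar items.
    """
    user_ratings = np.asarray(user_ratings, dtype=float)
    similarities = np.asarray(similarities, dtype=float)
    top = np.argsort(similarities)[::-1][:top_t]   # indices of the T most similar items
    rated = top[~np.isnan(user_ratings[top])]      # keep only those the user rated
    weights = np.abs(similarities[rated]).sum()
    if rated.size == 0 or weights == 0:
        return np.nan
    return float(similarities[rated] @ user_ratings[rated] / weights)

# The user rated items 0, 1, and 3; item 2 is similar to the target but unrated.
pred = predict_rating([5.0, 3.0, np.nan, 1.0], [0.9, 0.6, 0.8, 0.1], top_t=3)
```

The NaN return value corresponds to the case described above in which the user has rated none of the top T similar items, which is where the extended technique in (7) would take over.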

Singular value decomposition
According to Zhou et al. [19], one of the standard solutions for sparsity issues is to use data dimension reduction techniques, notably SVD, a matrix factorization technique that extracts dataset features by decomposing the original user-item rating matrix into the product of three matrices. Given an m × n matrix $A \in \mathbb{R}^{m \times n}$ (m is the number of items and n is the number of users) with rank(A) = r, the SVD of A is expressed as $\mathrm{SVD}(A) = U \times \Sigma \times V^{T}$, where $U \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{n \times n}$, and $\Sigma \in \mathbb{R}^{m \times n}$. The middle matrix Σ is a diagonal matrix with r nonzero entries, which are the singular values of A. Retaining only the k largest singular values yields the best rank-k linear approximation of the utility matrix A in the least-squares sense.
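The low-rank approximation can be illustrated with NumPy. This is a minimal sketch on a small dense toy matrix; the helper name `truncated_svd`, the choice k = 2, and the matrix values are assumptions made for illustration only.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the k leading singular triplets (U_k, s_k, Vt_k) of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # singular values in descending order
    return U[:, :k], s[:k], Vt[:k, :]

# Toy dense rating matrix (4 x 3).
A = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [2.0, 1.0, 4.0]])
U_k, s_k, Vt_k = truncated_svd(A, k=2)
A_k = U_k @ np.diag(s_k) @ Vt_k   # rank-2 approximation of A

# The Frobenius-norm error equals the norm of the discarded singular values.
error = np.linalg.norm(A - A_k)
```

Storing U_k, s_k, and Vt_k requires k(m + n + 1) numbers instead of mn, which is where the saving comes from when k is much smaller than m and n.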

Incremental singular value decomposition algorithm in the prediction task
The algorithms in the proposed study operate in two stages, offline and online. In the suggested CF recommendation system, user-to-user mapping takes place offline, whereas the actual rating prediction for target users is made online. Offline computation is a time-consuming procedure, while the online stage is efficient in terms of prediction and recommendation time owing to the use of the incremental SVD. The similarity computation can be made highly scalable using SVD-based dimensionality reduction while producing better results in most cases. This study uses incremental SVD algorithms that produce recommendations online for target users in the shortest time possible. The most essential quality of the incremental algorithm is that it supports a large number of users, making the system scalable as the size of the user-item matrix grows.
Our recommender system operates in two distinct phases. First, the model is developed offline by calculating user-user or item-item similarity. Second, the online process begins when a new user or item is introduced, and the model generates predictions. In incremental SVD, the projection method is known as folding-in: new users are folded into the space of the previously reduced user-item matrix. For instance, Figure 6 shows that after running the SVD method on A1 in the offline process, yielding three matrices U1, Σ1, and V1, the online process uses the incremental approach whenever a new matrix A2 is added, resulting in three updated matrices U2, Σ2, and V2.
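The basic folding-in projection can be sketched as follows. The sketch assumes that rows of the matrix are users and columns are items (transpose if the matrix is stored the other way), and it only appends the projected user to U while leaving Σ and V unchanged, which may differ from the full update shown in Figure 6. The function name `fold_in_user` and the toy data are hypothetical.

```python
import numpy as np

def fold_in_user(new_ratings, s_k, Vt_k):
    """Project a new user's rating vector into the existing k-dimensional space.

    Computes u_new = r @ V_k @ inv(S_k) without refactorizing the full matrix.
    """
    new_ratings = np.asarray(new_ratings, dtype=float)
    return (new_ratings @ Vt_k.T) / s_k

# Offline phase: factorize the existing matrix A1 (rows = users, columns = items).
A1 = np.array([[5.0, 4.0, 1.0],
               [4.0, 5.0, 1.0],
               [1.0, 1.0, 5.0]])
k = 2
U, s, Vt = np.linalg.svd(A1, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Online phase: fold in a new user and append the coordinates to U_k.
new_user = np.array([5.0, 5.0, 1.0])
u_new = fold_in_user(new_user, s_k, Vt_k)
U_k2 = np.vstack([U_k, u_new])
predicted = u_new @ np.diag(s_k) @ Vt_k   # new user's row in the rank-k reconstruction
```

Each fold-in costs one small matrix-vector product instead of a full decomposition. Because the appended rows are not re-orthogonalized, the approximation can degrade as many users are folded in, so periodically recomputing the model offline is a common complement.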

Dataset description
The MovieLens dataset, which can be found at [27], is one of the most well-known datasets for evaluating recommender systems. It consists of 1 M ratings provided by 6,040 users for a total of 3,900 movies. Each rating is expressed on a scale of 1 to 5, where a rating of 1 indicates the least liked movie and a rating of 5 the most liked movie. The dataset offers a comprehensive collection of user ratings, allowing us to evaluate and enhance our recommendation system based on a broad range of user preferences. Detailed information about the dataset is presented in Table 1. WebSPHINX [28], a web crawler, was used in this study to collect material from IMDb [29] relevant to the items. The gathered data is used to construct and complete the item ontology. To conduct the tests, the dataset was randomly divided into a training set containing 80% of the data and a testing set containing the remaining 20%.

Evaluation and discussion of the proposed system
The recommender system presented in this study was implemented using Python 3.9.7 on a PC with a 4 GHz processor, 8 GB RAM, and 64-bit Microsoft Windows 10. To thoroughly assess the system's performance, it was compared to various related approaches, including the Pearson nearest neighbor algorithm, item-based CF with EM, SVD combined with ontology, and user-item-based EM and SVD with and without ontology integration. The evaluation was conducted from two perspectives: time throughput (recommendations per second) and accuracy, providing valuable insights into the system's efficiency and effectiveness compared to existing approaches.

Evaluation 1: predictive accuracy analysis
Mean absolute error (MAE) is a statistical accuracy metric used to evaluate prediction accuracy. In this experiment, the MAE measures the average absolute difference between the predicted and actual ratings. MAE is presented in (8), where the normalizing term is the number of items that user u has rated. The MAE of the suggested approach is assessed and compared to the state-of-the-art methods. Figure 7 displays the results on the MovieLens dataset for different neighborhood sizes.
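For reference, the standard definition of MAE can be computed as in the sketch below. The function name `mean_absolute_error` and the sample ratings are hypothetical; the paper's exact notation in (8) is not reproduced here.

```python
import numpy as np

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if predicted.shape != actual.shape or predicted.size == 0:
        raise ValueError("inputs must be non-empty and have the same shape")
    return float(np.mean(np.abs(predicted - actual)))

# Three predicted ratings compared with the ratings the user actually gave.
mae = mean_absolute_error([4.2, 3.0, 1.5], [5, 3, 2])
```

A lower MAE means the predicted ratings are closer to the actual ratings.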

Evaluation 2: decision-support accuracy
In terms of accuracy measurements, decision-support metrics are crucial in evaluating the overall performance of the hybrid-based recommender. Several measures for this purpose are well known in the information retrieval area, including recall, precision, and F-measure. Precision is the fraction of relevant items in the list of returned results, whereas recall is the fraction of relevant items that have been retrieved. Both metrics should be used together, since recall increases as the number of items retrieved increases, whereas precision often decreases as the result size increases. The F-measure is a metric that takes both values into account, as indicated in (11). The F1 measures and precision values for all methods on various top-N recommendations are shown in Table 2. It can be deduced from the table that the precision achieved by the suggested technique is significantly higher than that obtained by the nearest neighbor algorithm or the other methods tested. In addition, we found that the proposed method, which combines dimensionality reduction using incremental SVD with ontology, outperformed the other methods in terms of F1 measure. These findings support our claim that, compared to other methods, our recommendation system is more efficient and scalable.
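The three decision-support metrics described above can be computed for a single top-N list as in the following sketch, which uses their standard information-retrieval definitions. The function name `precision_recall_f1` and the sample item identifiers are hypothetical.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 for one recommendation list.

    recommended: items returned by the recommender (top-N list).
    relevant: items the user actually found relevant in the test set.
    """
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Top-4 list with two hits out of three relevant items.
p, r, f1 = precision_recall_f1(["m1", "m2", "m3", "m4"], ["m2", "m4", "m7"])
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it is reported alongside precision.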

Evaluation 3: scalability analysis
The efficiency of the suggested approach is evaluated in this experiment. Evaluation is based on throughput, defined as the number of recommendations per second. We test our strategy on the MovieLens dataset to demonstrate its effectiveness in addressing the scalability problem. Figure 8 illustrates the performance results of our method compared to the state-of-the-art methods.
According to the graph, the throughput of the methods that use dimensionality reduction techniques and clustering is considerably higher than that of the other methods. Moreover, the throughput of the proposed approach, based on clustering with expectation maximization (EM), ontology similarity, and incremental SVD, is slightly higher than that of the other methods, especially those that rely on the standard SVD reduction technique. Unlike systems that use the nearest neighbor technique, clustering allows the recommendation system to analyze just a part of the items/users. As a result, increasing the cluster size has little effect on throughput, whereas the nearest neighbor technique must scan all nearest neighbors.

The results of the evaluations demonstrate the effectiveness of the proposed recommendation system. By incorporating ontology and dimensionality reduction techniques in CF, the system achieves improved predictive accuracy, decision-support accuracy, and scalability compared to existing methods. The results indicate that considering semantic relationships and reducing dimensionality enhances the system's ability to capture user preferences, provide accurate ratings, and handle large-scale datasets more effectively. This implies that the proposed method not only provides accurate recommendations but also ensures that relevant items are retrieved. The system addresses the limitations of traditional CF approaches by providing accurate recommendations, assisting users in decision-making, and efficiently handling large datasets. This study's findings highlight the proposed system's potential for practical applications in the recommendation domain.

CONCLUSION
In this paper, we have presented a novel recommendation method that addresses the challenges of accuracy, scalability, and sparsity in CF-based recommender systems. Our approach incorporates dimensionality reduction using the incremental SVD algorithm, ontological item-based semantic similarity, and explicit user ratings to improve the prediction accuracy and scalability of the system. By adopting the incremental SVD method, we were able to handle the increasing size of the user-item matrix while maintaining computational efficiency. The folding-in technique employed in the incremental SVD algorithm significantly reduced the computation cost and allowed our system to achieve high scalability. The experiments conducted on a real-world movie recommendation dataset confirmed the effectiveness of our proposed method. The precision, F1 measure, and MAE metrics demonstrated that our system provides accurate predictions while effectively addressing the sparsity and scalability issues commonly encountered in recommender systems. The incorporation of MO further improved the predictive accuracy and expanded the potential for applying our method to different semantic contexts and domains. Further research can explore additional evaluation metrics, investigate the system's performance with different datasets, and consider the impact of incorporating other factors such as user demographics or temporal dynamics. By continuously refining and enhancing the recommendation system, we can further improve the accuracy, relevance, and usability of the recommendations provided to users in various domains.

Figure 5. Construction of UICM from user-item rating matrix

Figure 6. Phases of recommendation process

Table 1. Description of the dataset

Table 2. Comparison of F1 metric and the precision values for different methods