Improving sentiment analysis using text network features within different machine learning algorithms

ABSTRACT


INTRODUCTION
Sentiment analysis (SA), also called opinion mining, is one of the most fundamental tasks in natural language processing (NLP). It deals with unstructured text and classifies it as expressing a positive, negative, or neutral sentiment [1], [2]. SA has become an important tool for decision-makers and business executives, as well as for the general public, to grasp sentiments and attitudes. Because users increasingly consult one another before making purchasing decisions, decision-makers and corporate leaders now invest heavily in assessing public opinion about their products and services [3]. They invest in SA not only to keep their consumers satisfied but also to develop new products and services and to attract new customers. In politics, SA can be used to infer popular attitudes and reactions to political events, allowing better judgments to be made. These uses push the NLP community to devote more resources to SA research [4].
Researchers have recently presented many ways to automatically classify opinionated texts as positive, negative, or neutral. There are two main approaches. The first uses machine learning (ML) algorithms, which are the focus of this paper. The second, the lexicon-based (LB) approach, assumes that the contextual sentiment orientation of a text is the sum of the opinion orientations of its available and accessible words or phrases [5]. However, several obstacles remain, such as spam and fraud, domain dependency, negation, NLP overhead, bipolar terms, and the need for a large lexicon [6].
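As a minimal illustration of the lexicon-based idea described above, the sketch below sums the prior polarity of each word and maps the total to a three-way label. The lexicon `TOY_LEXICON` and its scores are hypothetical, not taken from any published resource. The final example also shows why negation is an obstacle for this approach: "not good" still scores as positive because "not" carries no polarity of its own.

```python
# Hypothetical toy lexicon: word -> prior polarity score (illustrative only).
TOY_LEXICON = {
    "good": 1.0, "great": 2.0, "delicious": 2.0,
    "bad": -1.0, "awful": -2.0, "cold": -0.5,
}


def lexicon_score(tokens, lexicon=TOY_LEXICON):
    """Sum the prior polarity of every token found in the lexicon."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def lexicon_label(tokens, lexicon=TOY_LEXICON):
    """Map the summed orientation to Positive / Negative / Neutral."""
    score = lexicon_score(tokens, lexicon)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


review = "the pasta was great but the soup was cold".split()
print(lexicon_score(review), lexicon_label(review))
# A naive sum ignores negation: "not" has no entry, so "good" dominates.
print(lexicon_label(["not", "good"]))
```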
Sentiment analysis is commonly used to measure the overall contextual polarity or the writer's sentiment about a certain issue and to gain business insight [7]. The challenge in sentiment classification is that sentiment can refer to a person's judgment, feeling, or appraisal of an object such as a movie, book, or product, as well as to a positive or negative text, phrase, or function. Moreover, because opinions are spread across many diverse sites, it is difficult to locate and track opinion websites and to refine information from them [8]. Solving these challenges is critical to making the data mining process more successful and efficient. Previous studies have examined sentiment analysis and its difficulties. Because of its importance and impact on society, we chose sentiment analysis of social content using the proposed framework. Recent advances in machine learning and deep learning methods for sentiment analysis have significantly enhanced the performance of business intelligence as well as scientific and academic applications [9].
Sentiment analysis has been approached in a variety of ways. In general, these approaches have relied on either supervised or unsupervised machine learning techniques. Hammad and Al-awadi [10] compared four automatic classification techniques, support vector machine (SVM), back-propagation neural networks (BPNN), Naive Bayes (NB), and decision tree, with the goal of developing a lightweight sentiment analysis method for Arabic social media reviews; the SVM classifier achieved the highest accuracy. The work in [11] used various supervised machine learning algorithms to build an Arabic Jordanian Twitter corpus for sentiment analysis. Its experimental results show that the SVM classifier using the term frequency-inverse document frequency (TF-IDF) weighting scheme with stemming and bigram features outperforms the best-scenario results of the Naive Bayes classifier. Baly et al. [12] presented ArSenTD-LEV, an Arabic sentiment Twitter dataset for the Levantine dialect. They collected 4,000 tweets and annotated each with the overall sentiment of the tweet, the target towards which the sentiment was expressed, how the sentiment was expressed, and the topic of the tweet. Their findings support the value of these annotations in improving the performance of a baseline sentiment classifier. S. and Ramathmika [13] examined the textual Yelp reviews of businesses to predict whether a review is positive or negative. They employed machine learning techniques such as NB, multinomial Naive Bayes, logistic regression (LR), Bernoulli Naive Bayes, and linear support vector classification, and found that Naive Bayes performed best, with an accuracy of around 79.12%. Liu [14] carried out a text ablation study to evaluate several deep learning and machine learning models and found, using the F1 score as the comparison metric, that less complex models such as LR and SVM predict sentiment better than more complex models such as gradient boosting, long short-term memory (LSTM), and bidirectional encoder representations from transformers (BERT). The work in [15] focused on sentiment analysis of product and customer reviews on social media and product websites, using five separate datasets from benchmark data sources. Its experiments showed that choosing the right feature encoding technique is essential for the quantitative representation of customer feedback during classification and analysis, establishing the importance of the embedding layer for sentiment classification. That study applied deep-learning-inspired models for sentiment categorization and analysis based on recurrent neural networks and long short-term memory, and reported an accuracy of 83% on the Yelp dataset when comparing its final results with those of earlier methods. The aim of [16] is to examine machine learning and deep learning models for predicting sentiment and ratings from visitor reviews. That study employed models such as NB, SVM, convolutional neural networks (CNN), LSTM, and bidirectional long short-term memory (BiLSTM) to extract sentiment and ratings from traveler reviews. According to the study's findings, BiLSTM-based deep learning models are more efficient and accurate than machine learning algorithms [17]. The project in [18] analyzes and forecasts customer reviews from the Yelp website, with the initial data set filtered to include only insurance ratings. Although all of the techniques tested, including decision tree, k-nearest neighbors (kNNs) classifier, SVM, LR, and random forest (RF) classifier, can accurately classify review text into sentiment classes, logistic regression achieves the highest accuracy, 93.770%.
Because of its importance and impact on society, we chose sentiment analysis of a Yelp business dataset using the proposed framework. Given the significance of sentiment analysis, the method of this study can help improve the sentiment analysis process for business content. The proposed methodology produces better, or at least comparable, outcomes with greater confidence and lower computational complexity.

Bulletin of Electr Eng & Inf
ISSN: 2302-9285  Improving sentiment analysis using text network features within different … (Ali Mohamed Alnasrawi) 407

RESEARCH METHOD
Sentiment analysis presents unique challenges compared with traditional data mining, primarily because of the subtle distinctions between positive and negative sentiments or between neutral and positive sentiments [19]. This study assumes that every review in the dataset is reliable. However, a growing body of research on potentially incorrect information is alerting users and service providers to the ongoing need to update and assess the variables that influence how trustworthy and high-quality online information is perceived to be. Because large-scale text processing is extremely challenging, reliable polarity detection in consumer reviews remains an active and fascinating research area, and deriving precise meanings from textual data such as consumer reviews, comments, tweets, and blogs is still difficult.
This paper introduces the sentiment analysis framework illustrated in Figure 1, which combines social network analysis and sentiment classification to preprocess and classify business reviews in the Yelp dataset. The research relies primarily on social network analysis: the text corpus is converted into a text network from which features and relationships are extracted. Although network analysis is typically used to depict interpersonal interactions, it can also express relationships between words. In this context, a corpus of texts can be seen as a network in which each node represents a document and the connections between nodes indicate how frequently words co-occur in documents [20].

Figure 1. Sentiment classification framework
The research encompasses several phases, from gathering raw data to determining the sentiment (positive, negative, or neutral) of reviews. For the "Restaurant" business category in the Yelp dataset, a review is labeled "Positive" if the restaurant's rating is above 3, "Negative" if it is below 3, and "Neutral" otherwise. Corpus preparation is a crucial step before any analysis: it cleans up the text and prepares it for conversion into text networks. Tokenization, the process of dividing a sentence into a list of words, is the first preprocessing step and one of several effective preprocessing approaches [21]. Following tokenization, stop words and digits are removed. Stop words are terms used frequently in any language; in English they include words such as "is", "the", "and", and "a". Because these terms carry little meaning for natural language processing, they are eliminated [22]. Lemmatization then converts each word to its root or lemma, for example "swimming" to "swim", "was" to "be", and "mice" to "mouse". All words are converted to lowercase, because computers treat uppercase and lowercase forms as different tokens. Finally, all punctuation is removed, which reduces noise and eliminates extraneous information.
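A minimal sketch of the labeling rule and the preprocessing steps above, in plain Python. The stop-word set and the `LEMMAS` lookup table are small hypothetical stand-ins for a full stop-word list and a real lemmatizer (such as a WordNet-based one). The sketch also lowercases and strips punctuation before tokenizing so that the lookups are case-insensitive, which is a different order from the one listed in the text.

```python
import string

# Illustrative stop-word list; a real pipeline would use a full list (e.g. NLTK's).
STOP_WORDS = {"is", "the", "and", "a", "an", "were", "of", "to", "in", "it"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"swimming": "swim", "was": "be", "mice": "mouse", "dishes": "dish"}


def label_from_rating(rating):
    """Rating rule from the text: above 3 -> Positive, below 3 -> Negative, else Neutral."""
    if rating > 3:
        return "Positive"
    if rating < 3:
        return "Negative"
    return "Neutral"


def preprocess(text):
    """Lowercase, strip punctuation and digits, tokenize, drop stop words, lemmatize."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation + string.digits))
    tokens = text.split()  # simple whitespace tokenization
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOP_WORDS]


print(label_from_rating(4.5), preprocess("The 2 mice were swimming in the dishes!"))
```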
Once unnecessary text has been removed, a corpus of documents can be represented as a network, with words as the nodes and edges indicating how frequently words occur together in a document. Because most documents share at least one word, text networks are frequently quite dense, that is, they have a large number of edges. Such dense networks are extremely crowded, so visualizing text networks in the manner shown in Figure 2 presents inherent difficulties.
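The construction of a word co-occurrence network can be sketched as follows, assuming reviews have already been preprocessed into token lists. Nodes are words, and an edge weight counts the documents in which both words appear; the function names are illustrative, not taken from the paper. Because every pair of distinct words in a document is linked, a document with m distinct words contributes m(m-1)/2 pairs, which is why such networks become dense so quickly.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(documents):
    """Nodes are words; edge (u, v) counts the documents containing both u and v."""
    edges = Counter()
    for tokens in documents:
        # Count each distinct word once per document; sorting keeps (u, v) canonical.
        vocab = sorted(set(tokens))
        for u, v in combinations(vocab, 2):
            edges[(u, v)] += 1
    return edges


def adjacency(edges):
    """Turn the weighted edge counter into an undirected adjacency dict of dicts."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


docs = [["food", "great", "service"], ["food", "cold"], ["service", "great"]]
edges = build_cooccurrence_network(docs)
print(edges[("food", "great")], edges[("great", "service")])
print(adjacency(edges)["food"])
```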
After the text corpus has been represented as a text network, automated text analysis is used to identify patterns of connections between words that help determine their meaning more precisely. In network analysis, centrality measures assess a node's importance. Centrality calculations can, for example, identify the most influential people in a social media network, the most cited articles in a citation network, or the most dangerous criminals in a crime network. Commonly used centrality metrics include [23], [24]:
− Closeness centrality: estimates the importance of a node by how close it is to all other nodes in the text graph. With $d_{ij}$ the length of the shortest path between nodes $i$ and $j$ [25]:
$$C_C(i) = \frac{1}{\sum_{j} d_{ij}}$$
− Betweenness centrality: scores nodes by the flow of connections that passes through them; a node is significant when it regularly connects many other nodes, so nodes with high betweenness are more likely to act as connectors between groups of other key nodes. It is based on the number of shortest paths between nodes $h$ and $j$ that pass through node $i$. With $\sigma_{hj}$ the number of shortest paths between $h$ and $j$, and $\sigma_{hj}(i)$ the number of those that pass through $i$:
$$C_B(i) = \sum_{h \neq i \neq j} \frac{\sigma_{hj}(i)}{\sigma_{hj}}$$
− PageRank centrality: similar to eigenvector centrality, but computed through a random walk on the graph; it simulates someone randomly "surfing the web" for some time and scores each node by how often the surfer reaches it. With $\alpha$ the damping factor, $N$ the number of nodes, $M(i)$ the set of nodes linking to $i$, and $L(j)$ the out-degree of node $j$:
$$PR(i) = \frac{1-\alpha}{N} + \alpha \sum_{j \in M(i)} \frac{PR(j)}{L(j)}$$
− Harmonic closeness: a variant of closeness centrality that swaps the sum and reciprocal operations in its definition:
$$H(i) = \sum_{j \neq i} \frac{1}{d_{ij}}$$
− Hubs and authorities: a natural extension of eigenvector centrality. A high-authority node receives links from many good hubs, and a high-hub node points to many good authorities. The hub score of a vertex is proportional to the sum of the authority scores of the vertices on its outgoing ties, and the authority score of a vertex is proportional to the sum of the hub scores of the vertices on its incoming links:
$$a(i) \propto \sum_{j \to i} h(j), \qquad h(i) \propto \sum_{i \to j} a(j)$$
These values are the singular vectors arising from the singular value decomposition of the adjacency matrix [25], [26].
These centrality measures help analyze the connections between words and contribute to more accurate sentiment classification. All features obtained from the centrality-based text network analysis are normalized and combined with the original Yelp reviews, yielding a new dataset with about ten features; Table 1 shows samples of these features. Although only a few centrality measures were used to derive the inputs for the machine learning algorithms, the algorithms achieved high performance, which indicates that using many attributes does not necessarily make the algorithms more effective. The classification in the proposed model uses the following algorithms: i) kNNs, ii) decision tree, iii) SVM, iv) stochastic gradient descent, v) RF, vi) neural network, vii) NB, viii) LR, ix) gradient boosting, and x) AdaBoost. We employed 10-fold cross-validation with shuffled sampling to evaluate each algorithm: the data are shuffled and split into ten folds, each fold serves once as the test set, and the accuracy, precision, recall, area under the curve (AUC), and F1-score are computed for each run.
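A pure-Python sketch of three of the measures listed above (closeness, harmonic closeness, and PageRank) on a small word graph, followed by normalization and a per-review aggregation. Several choices here are assumptions rather than details from the paper: shortest paths ignore edge weights, PageRank uses the conventional damping factor of 0.85 with unweighted degrees, normalization is min-max, and `review_features` (a hypothetical helper) averages each word-level score over the words of a review. Betweenness, eigenvector, and hub/authority scores are omitted for brevity; in practice a library such as NetworkX provides all of them.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path lengths d_ij from source (edge weights ignored)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj):
    """Closeness: 1 / (sum of distances to all nodes reachable from i)."""
    scores = {}
    for i in adj:
        total = sum(bfs_distances(adj, i).values())
        scores[i] = 1.0 / total if total > 0 else 0.0
    return scores


def harmonic(adj):
    """Harmonic closeness: sum of 1 / d_ij over all other reachable nodes j."""
    return {
        i: sum(1.0 / d for j, d in bfs_distances(adj, i).items() if j != i)
        for i in adj
    }


def pagerank(adj, alpha=0.85, iters=100):
    """Power iteration; on an undirected graph the out-degree L(j) is the degree."""
    n = len(adj)
    pr = {i: 1.0 / n for i in adj}
    for _ in range(iters):
        pr = {
            i: (1 - alpha) / n + alpha * sum(pr[j] / len(adj[j]) for j in adj[i])
            for i in adj
        }
    return pr


def min_max(scores):
    """Rescale a node -> score map to [0, 1] (normalization method assumed)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}


def review_features(tokens, feature_maps):
    """Hypothetical aggregation: mean normalized centrality of the review's words."""
    feats = {}
    for name, scores in feature_maps.items():
        vals = [scores[t] for t in tokens if t in scores]
        feats[name] = sum(vals) / len(vals) if vals else 0.0
    return feats


# Toy path graph food - great - service (weights are co-occurrence counts).
adj = {"food": {"great": 1}, "great": {"food": 1, "service": 2}, "service": {"great": 2}}
maps = {
    "closeness": min_max(closeness(adj)),
    "harmonic": min_max(harmonic(adj)),
    "pagerank": min_max(pagerank(adj)),
}
print(review_features(["great", "service", "unknown"], maps))
```

Words absent from the network (such as "unknown" above) are simply skipped, so a review's features depend only on the words that survived preprocessing.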

RESULTS AND DISCUSSION
This section presents and discusses the experimental results of the proposed framework. Different machine learning techniques, including neural networks, decision trees, SVM, and many other classifiers, are applied to the chosen Yelp dataset. The customer reviews are imbalanced: 68.36% are positive, 20.41% are negative, and only 11.23% are neutral. The Yelp reviews are converted into a text network, and their sentiment is classified into three classes (positive, negative, and neutral) by the different classifiers. We assessed the sentiment analysis system with the most standard performance measures, AUC, accuracy (AC), precision, recall, and F1-score, which are computed from the confusion-matrix counts defined next.
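To make the evaluation protocol concrete, the sketch below splits a labeled dataset into 10 shuffled folds, computes accuracy and one-vs-rest precision, recall, and F1 for each class (from the TP, FP, and FN counts defined in the next paragraph), and averages them over folds. Macro-averaging over the three classes is an assumption, since the text does not state how per-class scores were combined. The labels are synthetic, and `majority_baseline` is a hypothetical classifier that always predicts the most frequent training label. With a positive share near 68%, such a baseline already reaches about 0.68 accuracy while its macro-F1 stays at or below 1/3, a useful reference point when reading accuracy on imbalanced data.

```python
import random
from collections import Counter


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 once, then deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def class_metrics(y_true, y_pred, positive):
    """One-vs-rest precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Share of items whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(labels, fit_predict, k=10, seed=0):
    """Average accuracy and macro-F1 over k shuffled folds (assumes len(labels) >= k)."""
    classes = sorted(set(labels))
    accs, f1s = [], []
    for test_idx in kfold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        y_true = [labels[i] for i in test_idx]
        y_pred = fit_predict(train_idx, test_idx)
        accs.append(accuracy(y_true, y_pred))
        f1s.append(sum(class_metrics(y_true, y_pred, c)[2] for c in classes) / len(classes))
    return sum(accs) / k, sum(f1s) / k


# Synthetic labels (illustrative only) with a positive-heavy imbalance.
labels = ["Positive"] * 68 + ["Negative"] * 20 + ["Neutral"] * 12


def majority_baseline(train_idx, test_idx):
    """Hypothetical baseline: predict the most frequent training label everywhere."""
    top = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return [top] * len(test_idx)


acc, macro_f1 = cross_validate(labels, majority_baseline)
print(f"accuracy={acc:.3f} macro-F1={macro_f1:.3f}")
```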
True positive (TP) is the number of positive sentences correctly predicted as positive, false positive (FP) is the number of negative sentences wrongly predicted as positive, true negative (TN) is the number of negative sentences correctly predicted as negative, and false negative (FN) is the number of positive sentences wrongly predicted as negative. The higher the precision, the more reliable the positive-class predictions. A high recall shows that many sentences of a given class have been successfully identified, whereas accuracy merely reflects the proportion of correctly classified sentences, regardless of class. The F1 score is the harmonic mean of precision and recall. Furthermore, the receiver operating characteristic (ROC) provides a comprehensive evaluation of classifier performance; in this analysis, P(x/c) represents the conditional probability that a data entry bears the class label c. The ROC curve plots the classification outcomes ranked from most positive to least positive. The AUC, the most common statistic for model evaluation in general classification problems, measures the entire two-dimensional area under the ROC curve. A single experiment is not enough to evaluate a classifier reliably, and cross-validation is a useful way to ensure a robust performance estimate: it runs multiple tests and uses the average of all performance measurements as the final performance metric. As shown in Table 2, the performance of the algorithms varied across the evaluated metrics. Among the methods, the neural network achieved the highest AUC (0.830378), indicating its effectiveness in distinguishing sentiment. It also achieved the highest classification accuracy (CA), 0.7773, meaning that it correctly classified about 77.73% of the instances. SVM, stochastic gradient descent, and LR also performed well, with AUC values above 0.8. These algorithms
exhibited relatively high precision and recall values, suggesting that they predict positive and negative sentiments accurately. In contrast, AdaBoost and decision tree performed worse than the other methods, with AUC values of 0.655272 and 0.670154, respectively. Overall, the results highlight the importance of selecting an appropriate machine learning algorithm for sentiment analysis, with the neural network being the most effective in this study. These findings clarify how performance varies among algorithms and can guide future research on sentiment classification. Figure 3 shows that the features derived from centrality measurements of the text network perform best with the neural network, gradient boosting, and LR classifiers. We therefore conclude that the neural network achieves higher AUC, accuracy, and F1 score than the other techniques. The figure also shows that, among the evaluated algorithms, AdaBoost performs worst in terms of AUC (0.655272), meaning it is the least effective at distinguishing sentiment. Several factors may explain this. AdaBoost is an ensemble learning method that combines weak classifiers into a strong classifier; in sentiment analysis, however, it may not effectively capture the complex relationships and patterns present in textual data. AdaBoost iteratively reweights instances to focus on misclassified examples, and it may have struggled with the nuances and variations of sentiment expressed in the dataset. Furthermore, AdaBoost's performance may have been affected by the characteristics of the dataset itself. Since the dataset is imbalanced, with significantly more instances of one sentiment class
than the others, this imbalance may have impaired AdaBoost's ability to learn and generalize effectively. The performance of machine learning algorithms can vary with the specific dataset and the nature of the sentiment analysis task; although AdaBoost performed worse in this study, it remains a valuable algorithm that may perform well in other contexts or on different datasets. From Figure 4, where the ranges of the ROC analysis for each classifier are shown, the neural network, gradient boosting, and LR achieve comparatively better results than NB, AdaBoost, kNNs, and decision tree.

CONCLUSION
The objective of this study was to introduce a sentiment classification approach that leverages topological information extracted from text networks, addressing limitations of existing models. Sentiment analysis plays a crucial role in informed decision-making by determining the intensity of sentiment in textual sources. Using Yelp's main dataset, we combined machine learning techniques with social network analysis to classify the sentiment of phrases and product reviews. The results indicate that the proposed approach yields the highest-quality sentiment analysis outcomes when combined with the neural network, LR, and gradient boosting methods.
In future research, we aim to extend our categorization technique to other domains, such as social and marketing contexts. Additionally, we plan to enhance sentiment classification by enriching sentiment lexicons, developing specialized dictionaries, and creating diverse Arabic text collections covering various topics and facets. Moreover, we believe that incorporating deep learning models into the analytical system could further improve classification accuracy.
In conclusion, this work presents a novel sentiment classification approach that effectively extracts and analyzes topological information from text networks. The findings highlight the successful combination of machine learning techniques and social network analysis in sentiment analysis tasks. The implications of this research extend to various domains, and future advances in sentiment classification can be achieved through the proposed enhancements and the integration of deep learning models.

−
ISSN: 2302-9285 Bulletin of Electr Eng & Inf, Vol. 13, No. 1, February 2024: 405-412 408 Eigenvector centrality: the degree to which neighboring nodes are linked to one another also indicates importance.Where a is the adjacency matrix, and G is the text graph:
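Eigenvector centrality is usually computed by power iteration. The sketch below is a minimal version that assumes a connected, undirected, weighted adjacency dictionary (the toy graph is hypothetical). It repeatedly multiplies the scores by the adjacency matrix shifted by the identity, a common trick that leaves the eigenvectors unchanged but prevents oscillation on bipartite graphs such as the path used here, and renormalizes until the scores stop changing.

```python
import math


def eigenvector_centrality(adj, iters=200, tol=1e-10):
    """Power iteration for x_i = (1/lambda) * sum_j a_ij x_j on an undirected graph.

    Iterating with (A + I) instead of A keeps the same leading eigenvector
    but avoids oscillation on bipartite graphs.
    """
    x = {i: 1.0 for i in adj}
    for _ in range(iters):
        # One multiplication by (A + I): own score plus weighted neighbor scores.
        new = {i: x[i] + sum(w * x[j] for j, w in adj[i].items()) for i in adj}
        # Normalize to unit Euclidean length so the scores stay bounded.
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {i: v / norm for i, v in new.items()}
        if max(abs(new[i] - x[i]) for i in adj) < tol:
            return new
        x = new
    return x


adj = {"food": {"great": 1}, "great": {"food": 1, "service": 1}, "service": {"great": 1}}
print(eigenvector_centrality(adj))
```

On this path graph the leading eigenvalue is the square root of 2, so the middle word "great" ends up with the largest score.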

Figure 2. Text network visualization of Yelp reviews

Figure 3. Comparison of performance measure matrices

Figure 4. Comparison of ROC score analysis

Table 1. Samples of augmented dataset using centrality measurements

Table 2. Performance comparison of different machine learning algorithms