Malaysian views on COVID-19 vaccination program: a sentiment analysis study using Twitter

ABSTRACT


INTRODUCTION
In December 2019, China was hit by a sudden outbreak of COVID-19, caused by the SARS-CoV-2 virus. The World Health Organization (WHO) labeled it a pandemic due to its severe and widespread nature; the disease can lead to severe pneumonia, respiratory failure, and death. The entire world, including Malaysia, has been affected by this pandemic. To combat the spread of COVID-19, a global effort has been made to develop and test vaccines, since vaccination is one of the best methods for lowering the prevalence of infectious diseases. Numerous vaccines have been developed and approved in a short time frame to counter the pandemic. One example is the Pfizer/BioNTech vaccine, which was the first to be approved for widespread use in the United Kingdom on December 2, 2020, less than a year after the pandemic was declared. However, a sizable portion of people express hesitation and even hostility toward vaccination [1], [2]. This hesitation stems mainly from public concern about the vaccination in terms of its: i) health risk, ii) cultural acceptance, iii) religious acceptance, iv) economic growth, and v) political stand [3]. This hesitation and reluctance affect the way individuals perceive the risk of getting infected, as well as how they view the gravity of the infection, which in turn leads to a low acceptance rate of the vaccine [4], [5]. The reluctance to get vaccinated could have a significant and far-reaching impact on the acceptance of COVID-19 vaccines in the community, as it poses a threat not only to the hesitant individual but to the entire community. Delays and rejections would make it impossible for communities to reach the level of vaccine uptake necessary for herd immunity [6]. Currently, the focus is on developing a vaccine to protect the population from COVID-19, but it is important for stakeholders to be prepared for the next challenge, which is ensuring the vaccine is accessible and accepted by the public.
ISSN: 2302-9285 Malaysian views on COVID-19 vaccination program: a sentiment … (Mohamed Imran Mohamed Ariff)

LITERATURE REVIEW
Over the last few years, as the COVID-19 pandemic spread globally, COVID-19 vaccine-related issues have received increased public attention, especially the public's hesitation to be vaccinated. This vaccine hesitancy can be broken down into three main reasons: i) evaluating the risks and benefits of vaccines, ii) lack of knowledge and awareness, and iii) influence of religious, cultural, gender, and/or socio-economic factors [7]. The hesitancy is a result of poor health literacy, which leads to low acceptance of the COVID-19 vaccines [8], [9]. Another major reason for the limited uptake of vaccines is the impact of social media, especially the usage of Twitter [10]. An extensive literature review has shown that social media, particularly Twitter, is an excellent channel for expressing emotions, perspectives, and viewpoints [11].
Furthermore, Twitter is a social media platform where people can openly share their opinions. With over 100 million active users and up to 500 million tweets generated every day, Twitter provides a place for individuals to honestly communicate their ideas in real time [12]. Twitter is also a useful tool for evaluating the true public mood, since users may express themselves freely and at ease, in contrast to traditional face-to-face interviews. Additionally, data collection for studies involving opinion analysis is facilitated by Twitter's application programming interface (API) and open database access [13]. Previous research has shown that the public's regular use of Twitter during COVID-19 boosted health awareness [14] and the execution of appropriate health safety measures during the pandemic. As a result, several government organisations have started using tweets to manage crises and deliver real-time updates [15], [16]. Additionally, prior research has shown that the messages in these tweets (such as opinions) can help the responsible authority get a high-level grasp of the actual situation, particularly during the COVID-19 pandemic [17]. Since these beliefs and related ideas such as sentiments, attitudes, and emotions are fundamental to human activity, applying sentiment analysis to tweets could show how the general public feels about the COVID-19 vaccination [14], [15]. Sentiment analysis, sometimes known as opinion mining, is the process of analysing the opinions, feelings, and sentiments represented in words or sentences [18], [19]. It has grown in favour in the medical industry as a useful method for determining people's views toward vaccinations, immunisation, and public health in general [13].
According to Hussein et al. [20], sentiment analysis of tweets about the COVID-19 vaccination could be a valuable tool for policymakers and governments, as it enables them to keep track of public opinion and make informed decisions. According to Rosis et al. [21], communicating important measures against COVID-19, such as getting vaccinated, wearing masks, practicing social distancing, and maintaining personal hygiene, has greatly contributed to controlling the spread of the virus. Twitter, as one of the prominent social media platforms, plays a significant role in raising awareness of these crucial measures. Public perception and attitude towards the pandemic are crucial in developing effective strategies to combat it [21], [22]. In this regard, the analysis of social media provides valuable information for health professionals and government officials in their decision-making processes. Based on the above, this study aims to gain deeper insights into what people are thinking and feeling regarding COVID-19 vaccination by examining tweets. To do so, it collects tweets using keywords related to vaccines and post-vaccination health concerns, in order to gain insight into public perception and assist policymakers in planning the vaccination effort and health measures. By analyzing the Twitter data, healthcare professionals and policymakers can gain an understanding of how the public is reacting to the COVID-19 vaccine during the pandemic. The study also hopes to shed light on people's views on health guidelines for COVID-19 prevention after receiving the vaccine.

METHOD
To attain the goal of the study, the machine learning life cycle (MLLC) method was selected. This method has broad applications and has been shown to produce results with higher accuracy than approaches that involve human intervention [23], [24]. The MLLC method consists of seven steps: i) data collection, ii) data preparation, iii) data cleaning, iv) data analysis, v) model training, vi) model testing, and vii) implementation. These steps are discussed briefly in the following sub-sections.

Data gathering
Data gathering, the first stage of the machine learning life cycle, aims to identify and gather the data related to the problem. Identifying different data sources, such as files, databases, the internet, and mobile devices, is part of this process. The hashtags "#COVID-19Vaccine", "#AstraZeneca", "#Sinovac", "#Pfizer", and "#VaccineSideEffect" were used to collect data for this study on the Twitter platform. An open-source tool called StartBot was used to gather information going back to 2002. In order to create a cohesive dataset for use in the following stages, the process entails identifying the data sources, gathering the data, and integrating data from different sources.
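The integration step described above can be illustrated with a minimal sketch. It assumes that each hashtag search yields a list of records and that every record carries a unique tweet identifier; the field name "id", the helper merge_sources, and the sample records are hypothetical and are not taken from the study or from StartBot.

```python
from typing import Dict, Iterable, List

Tweet = Dict[str, str]


def merge_sources(sources: Iterable[List[Tweet]]) -> List[Tweet]:
    """Combine several tweet lists, keeping the first copy of each tweet id."""
    seen_ids = set()
    merged: List[Tweet] = []
    for records in sources:
        for record in records:
            tweet_id = record["id"]  # "id" is a hypothetical field name
            if tweet_id in seen_ids:
                continue  # same tweet already collected under another hashtag
            seen_ids.add(tweet_id)
            merged.append(record)
    return merged


# Made-up records standing in for two per-hashtag collections.
pfizer = [
    {"id": "1", "hashtag": "#Pfizer", "tweet": "Got my first dose today"},
    {"id": "2", "hashtag": "#Pfizer", "tweet": "Second dose booked"},
]
sinovac = [
    {"id": "2", "hashtag": "#Sinovac", "tweet": "Second dose booked"},
    {"id": "3", "hashtag": "#Sinovac", "tweet": "Queue was short at the centre"},
]
dataset = merge_sources([pfizer, sinovac])
```

A tweet that carries more than one of the searched hashtags would otherwise be counted twice, which is why the sketch keeps only the first copy.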

Data preparation
After collecting the data, the next step is data preparation, which readies the data for the following stages. This process involves organizing the data in an appropriate location and preparing it for use in machine learning training. This includes randomizing the order of the data, extracting tweet details such as the hashtag, username, user handle, date of posting, tweet text, retweet count, and like count, and saving them in an Excel file for faster access in the project. Data exploration and data pre-processing are two procedures that fall under this category. Data exploration is used to understand the type of data being dealt with by identifying the features, format, and quality of the data. This step helps in identifying correlations, general patterns, and outliers in the data. The next phase is data pre-processing, which readies the data for analysis. The resulting dataset contains only tweets concerning the COVID-19 vaccine expressed in English.
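The preparation step can be sketched as follows, under stated assumptions. The column names in FIELDS, the fixed random seed, and the sample records are illustrative only, and CSV is used here in place of the Excel file mentioned in the text so that the sketch needs nothing beyond the Python standard library.

```python
import csv
import io
import random

# Hypothetical column names mirroring the tweet details listed in the text.
FIELDS = ["hashtag", "username", "handle", "date", "tweet", "retweets", "likes"]


def prepare(records, seed=42):
    """Return a shuffled copy of the records (the original list is untouched)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def to_csv(records):
    """Serialise the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample records.
records = [
    {"hashtag": "#Pfizer", "username": "Aina", "handle": "@aina",
     "date": "2021-06-01", "tweet": "Got my first dose today",
     "retweets": 3, "likes": 10},
    {"hashtag": "#Sinovac", "username": "Farid", "handle": "@farid",
     "date": "2021-06-02", "tweet": "Second dose booked",
     "retweets": 0, "likes": 2},
    {"hashtag": "#AstraZeneca", "username": "Mei", "handle": "@mei",
     "date": "2021-06-03", "tweet": "Queue was short at the centre",
     "retweets": 1, "likes": 5},
]
prepared = prepare(records)
csv_text = to_csv(prepared)
```

Fixing the seed keeps the shuffled order reproducible between runs, which makes later train and test results easier to compare.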

Data cleaning
The act of cleaning raw data and turning it into a usable format is known as data wrangling. It is the process of cleaning the data, selecting the variables to use, and converting the data into a suitable format for analysis in the following phase. It is one of the most crucial phases in the entire procedure. Not all of the gathered data is necessarily useful, and real-world data may contain missing values, duplicate data, invalid data, and noise. These issues must be identified and resolved through a variety of filtering approaches, since they might have a detrimental impact on the quality of the final product. In this project, the text is cleaned using Python code that removes numbers, stickers, old-style retweet markers ('RT'), hashtags, punctuation, and stop words.
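A cleaning routine of the kind described above might look like the sketch below. It is not the authors' code from Figure 2: the function name clean_tweet, the tiny STOP_WORDS set, and the choice to treat stickers as non-ASCII characters (for example emoji) are all assumptions made for illustration.

```python
import re
import string

# A deliberately tiny, hypothetical stop-word list; a real one would be longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and",
              "in", "my", "i", "it", "for", "on"}


def clean_tweet(text: str) -> str:
    """Remove the noise types named in the text and return lowercase tokens."""
    text = re.sub(r"\bRT\b", " ", text)   # old-style retweet marker
    text = re.sub(r"#\w+", " ", text)     # hashtags (handled before digits)
    text = re.sub(r"\d+", " ", text)      # numbers
    # Assumption: stickers are approximated as non-ASCII characters such as emoji.
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return " ".join(tokens)


sample = "RT Got my dose 2 of the #Pfizer vaccine today!!! \U0001F489"
cleaned = clean_tweet(sample)
```

Hashtags are stripped before digits so that a tag containing numbers is removed as a whole rather than leaving fragments behind.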

Data analysis
The data has now been cleansed and prepped and is ready to be analysed. This process entails choosing analytical methodologies, creating models, and analysing the results. The goal of this stage is to create a machine learning model that studies the data using a variety of analytical approaches and then to evaluate the results. This stage involves categorising each term as positive, negative, or neutral. To analyse the data in the context of Malaysian views on the COVID-19 vaccine, polarity and subjectivity were estimated. The stage begins with determining the issue type, after which machine learning techniques such as classification, regression, cluster analysis, association, and others are chosen. The model is then built from the prepared data using machine learning methods and subsequently evaluated. The Naïve Bayes approach was utilised to create the sentiment classifier used for emotion identification of Malaysian perspectives on COVID-19 vaccination.
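The text does not state how polarity and subjectivity were computed, so the following is only a toy lexicon-based sketch of the idea. The word lists POSITIVE and NEGATIVE, the scoring formulas, and the zero threshold in label are assumptions chosen for illustration, not the study's method.

```python
# Hypothetical mini-lexicons; a real study would rely on a much larger resource.
POSITIVE = {"good", "safe", "grateful", "effective", "happy"}
NEGATIVE = {"bad", "pain", "fever", "scared", "worried"}


def score(text):
    """Return (polarity, subjectivity) for a cleaned, space-separated tweet.

    polarity     ranges from -1 (all negative words) to +1 (all positive words)
    subjectivity is the share of tokens that carry an opinion at all
    """
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    opinion = pos + neg
    polarity = (pos - neg) / opinion if opinion else 0.0
    subjectivity = opinion / len(tokens) if tokens else 0.0
    return polarity, subjectivity


def label(polarity):
    """Map a polarity score onto the three classes used in the study."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


examples = [
    "feeling grateful and safe",
    "fever and pain after jab",
    "vaccine appointment booked tomorrow",
]
labels = [label(score(text)[0]) for text in examples]
```

With a lexicon this small, most factual tweets contain no opinion words at all and therefore fall into the neutral class.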

Model training
The next stage is model training, in which the model is trained to improve its performance and so achieve a better solution to the problem. A variety of machine learning methods can be employed to train the model on datasets. A model must be trained for it to comprehend the numerous patterns, rules, and characteristics in the data. In this project, a dataset is used to train the model with the Naïve Bayes technique in the scikit-learn Python module.
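To make the Naïve Bayes technique concrete, the sketch below implements a small multinomial Naïve Bayes classifier with add-one (Laplace) smoothing in plain Python. It is a dependency-free stand-in, not the scikit-learn estimator used in the study (which variant the authors chose is not stated), and the class name, training sentences, and labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for text, lab in zip(texts, labels):
            self.word_counts[lab].update(text.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for lab, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[lab].values())
            log_prob = math.log(n_docs / total_docs)  # class prior
            for word in text.split():
                if word not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[lab][word]
                log_prob += math.log((count + 1) / (total_words + len(self.vocab)))
            if log_prob > best_score:
                best_label, best_score = lab, log_prob
        return best_label


# Made-up, already cleaned training tweets.
train_texts = [
    "grateful safe vaccine", "happy got dose",
    "fever pain after dose", "scared side effects",
    "appointment booked tomorrow", "vaccine centre open today",
]
train_labels = ["positive", "positive", "negative", "negative",
                "neutral", "neutral"]
model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("grateful happy")
```

Log-probabilities are summed rather than multiplying raw probabilities, which avoids numerical underflow on longer tweets.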

Model testing
The machine learning model can be tested once it has been trained on a specific dataset. During this stage, the correctness of the model is assessed by feeding it a test dataset. The percentage of correctness for the model is determined by testing it against the project or problem's requirements. The project must go through the model testing phase in order to assess the accuracy of the sentiment analysis. Section 4 provides detailed calculations and explanations on how to determine the accuracy score, precision, recall, and F1-score. Testing is also usually done to see if the suggested design fits the initial set of business requirements. Testing may be done again to look for mistakes, defects, and interoperability issues. Verification and validation are two more aspects of this phase that help to assure the program's success.
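Feeding the model a test dataset presupposes that some data was held back from training. The sketch below shows one common way to do that; the 80/20 ratio, the seed, and the helper name train_test_split are assumptions, since the text does not state how the test set was formed.

```python
import random


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle a copy of the items and hold out a share of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder items standing in for labelled tweets.
samples = list(range(10))
train_set, test_set = train_test_split(samples)
```

Because the two parts never overlap, accuracy measured on the held-out part reflects how the model handles tweets it has not seen before.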

Implementation
The focus of this study was to conduct a comprehensive analysis of sentiment analysis techniques applied to Twitter data, aiming to provide insights into their effectiveness and performance. Given the scope and depth of the analysis conducted, the decision was made to defer the implementation stage to future research, allowing for a more thorough investigation of real-world deployment challenges and considerations.

RESULTS AND ANALYSIS
This section presents the outcomes and interpretation of the research, structured into: i) interface, ii) process of gathering tweets, iii) findings of sentiment analysis, and iv) evaluation of the sentiment analysis classifier model. The interface design illustrates how users can interact with the sentiment analysis system and interpret the results effectively, enhancing usability and accessibility. The exposition of the tweet gathering process provides a clear understanding of the data collection methodology, addressing potential biases and limitations in the dataset. The presentation of the sentiment analysis findings covers the insights derived from the analysis, shedding light on patterns, trends, and potential applications. Lastly, the evaluation of the sentiment analysis classifier model uses quantitative metrics and qualitative assessments to assess its performance and generalizability.

Interface of the sentiment analysis application
The web dashboard, shown in Figure 1, incorporates dynamic visualizations that allow users to interact with the sentiment analysis results in real time, enabling the exploration of sentiment trends across different time periods, user demographics, and tweet characteristics. By utilizing Tableau's robust features, the dashboard provides an intuitive and comprehensive representation of the sentiment analysis outcomes, enhancing the accessibility of insights and aiding decision-making processes for various stakeholders.

Tweets gathering process
This section describes the process of collecting tweets for the dataset. The data was obtained from Twitter using five specific hashtags, which were: i) '#COVID 19Vaccination', ii) '#AstraZeneca', iii) '#Pfizer', iv) '#Sinovac', and v) '#VaccineSideEffects'. The process started by utilizing the StartBot open-source program to extract tweets from Twitter. Afterwards, the dataset underwent data preparation and cleaning, as outlined in the methodology section. During the data cleaning step, any unimportant punctuation, stop words, and sentences were removed from the tweets column. Figure 2 illustrates the sample code employed in the data cleaning process.

Sentiment analysis results
The analysis of the cleaned dataset revealed that most of the COVID-19 data was neutral in tone, while only a small proportion had a negative sentiment (see Figure 3). This suggests that most people hold a neutral rather than a negative attitude towards the vaccination program. This finding can be reinforced by the daily updates on the Malaysian COVID-19 website, which provides information about the progress of the vaccination program.
The study presents a word cloud to showcase the public's opinions about COVID-19. The word cloud, displayed in Figure 4, focuses specifically on the topic of COVID-19 vaccinations. The results show that the Pfizer vaccine is the most talked about and tweeted vaccine on Twitter, as it is highlighted in a larger font size and in bolded letters.
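A word cloud scales each word by how often it occurs, so the underlying computation is a frequency count. The sketch below shows that count only; the helper word_frequencies and the sample tweets are invented, and the rendering step used for Figure 4 is not reproduced here.

```python
from collections import Counter


def word_frequencies(tweets, top_n=2):
    """Count word occurrences across cleaned tweets and return the most common."""
    counts = Counter(word for tweet in tweets for word in tweet.split())
    return counts.most_common(top_n)


# Made-up, already cleaned tweets.
cleaned_tweets = [
    "pfizer dose today",
    "pfizer appointment",
    "sinovac dose",
    "pfizer booster",
]
top_words = word_frequencies(cleaned_tweets)
```

In a rendered cloud, the word with the highest count would be drawn in the largest font.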

Evaluation of the sentiment analysis model (classifier)
To assess the performance of the sentiment analysis classifier, four evaluation metrics are used: precision, recall, F1-score, and accuracy. These metrics are commonly used in the evaluation of classification models. The results of the evaluation are depicted in Figures 5 and 6. Based on Figure 5, the accuracy of this study stands at 97.3%. This can be considered a decent level of accuracy, as any value above 70% is considered a good model in evaluating sentiment analysis performance [25].
The classification report in this study, depicted in Figure 6, shows the: i) precision, ii) recall, and iii) F1-score of the study's results. The precision score of 93% indicates that the model is effective in identifying genuine positive outcomes among all results it predicted as positive. The recall score of 94% shows that the model correctly identifies most of the actual positive instances in the dataset. The F1-score of 94%, which combines precision and recall, is a high score and supports the view that this is a good and reliable model for sentiment analysis.
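The four metrics can be computed from true and predicted labels as in the sketch below, which uses the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as the harmonic mean of precision and recall. The label lists are invented and do not reproduce the values reported in Figures 5 and 6.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels for six test tweets.
y_true = ["neutral", "neutral", "neutral", "positive", "positive", "negative"]
y_pred = ["neutral", "neutral", "positive", "positive", "positive", "negative"]

acc = accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="positive")
```

In a three-class setting such as this one, the per-class values are typically averaged to give the single figures shown in a classification report.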

CONCLUSION
The research aimed to provide a comprehensive understanding of Malaysians' perceptions of the COVID-19 vaccination through sentiment analysis. To achieve this, data was collected from Twitter and analysed to determine the sentiment behind the tweets. The results of the study showed that Malaysians have a generally neutral perception of the COVID-19 vaccination, and the increasing number of vaccinations being administered is consistent with this. While some individuals remain skeptical of the COVID-19 vaccine, awareness about it is growing. The word cloud analysis shows the frequency with which the COVID-19 vaccine is being discussed on Twitter. This study highlights the importance of understanding public perception and sentiment towards a critical issue like the COVID-19 vaccination program. The findings can be used to inform and guide healthcare professionals, policymakers, and the public in making informed decisions regarding the COVID-19 vaccine. This study can also be used as a foundation for future research in the field of sentiment analysis, with the potential for improvement and expansion. The accuracy of this study is 94%, which demonstrates its effectiveness in achieving its overall aim.
Limitations: this study faced several limitations in its implementation. Although the automated method can recognize and analyze text in various contexts, it has difficulty understanding complex language features such as sarcasm, irony, negations, jokes, and exaggerations. This can lead to incorrect sentiment classification, as the system is not able to grasp the intended meaning behind the words. For example, the word "sad" may be classified as negative, but in the context of "I was not sad," it should be classified as positive. Similarly, an automated sentiment analysis tool may not be able to detect sarcasm, such as in the statement "I'm really loving the enormous pool at my hotel!" accompanied by a picture of a small pool. This highlights the challenges faced by sentiment analysis tools in accurately analyzing sentiment in complex language.
Future research: this study has the potential for further development and improvement. One potential enhancement is to implement real-time updates of the sentiment analysis results for tracking Malaysians' emotions. This would allow users to track citizens' emotions without having to manually extract new tweets. Additionally, the study could incorporate a sentiment classifier correction tool to address typos and misspelled words in the dataset and new tweets, which could lead to improved data quality and a higher overall accuracy of the classifier. The study could also incorporate user engagement elements such as photos, videos, additional buttons for navigation, and notifications to inform users when the data is ready for analysis and evaluation. Moving forward, the study aims to add new algorithms to improve the accuracy of sentiment analysis and to continue making advancements in this field of study.

Figure 2. Sample code fragment used in the data cleaning process

ISSN: 2302-9285 Bulletin of Electr Eng & Inf, Vol. 13, No. 1, February 2024: 436-443