Data mining applied about polygamy using sentiment analysis on Twitters in Indonesian perception

ABSTRACT


INTRODUCTION
Polygamy or polygyny is a marital relationship involving multiple wives [1]. Polygamy has become the subject of heated debate on social media such as Twitter and Facebook. The majority of people reject it with negative comments, and only a few are willing to respond to this phenomenon positively. Those who agree with polygamy may argue that it is legal in Indonesia [2]. Proponents even claim that it can quell men's innately high sex drive, and that it is an alternative that keeps married men from sex outside of marriage, which is forbidden for Muslims [3]. Indeed, the percentage of men who have multiple wives is minuscule worldwide, yet about eighty-three percent of people permit polygamy [4]. The holy Qur'an, which serves as a life guideline for Muslims, allows men to marry multiple women under certain conditions, one of which is that the men must be fair to their wives [5]. The actual extent of polygamy is hard to verify because many marriages throughout Indonesia are unregistered [6]. In the 1970s, the rate of polygamous marriage in Indonesia was about 2% [7]. If that rate had not changed since then, we would estimate about 4,800,000 polygamous marriages today. Furthermore, some researchers have shown negative impacts of polygamy. In Africa, the majority of people are against polygamy because of economic challenges and poor communication between children and their new mothers [8]. Polygamous marriage affects every family member: children of polygamous families may disrespect their parents, and it is hard for husbands to be impartial toward both their children and their wives [9]. A study of women's mental health reported that many polygamous families are under stress [10], which can cause psychological problems [11]. Despite that, both men and women continue to enter polygamous marriages.
Indonesia, which has the largest Muslim population in the world, regulates polygamy through the 1974 marriage law. Some parties, especially secular women's organizations, oppose the policy on polygamous marriage. Nevertheless, under the marriage law, the court still serves to minimize the risk of divorce and polygamy [12]. This study uses public opinion extracted from Twitter data to determine whether society's opinion of polygamy is positive, negative, or neutral. The analysis uses sentiment analysis, which applies data analysis to the text that people post on social media, mainly Twitter [13]. Twitter is a platform where people around the world express their thoughts, their opinions, and even their daily activities in the form of sentences or phrases. Worldwide, there are more than 300 million users, who post 500 million tweets every day [14]. However, analyzing unstructured data such as tweets is difficult, and extracting useful information from Twitter is an enormous challenge for scientists [15]. Therefore, capable tools are needed to process millions of tweets, extract the data, and learn people's sentiments about polygamy. Although various applications can be used, the researchers chose the R language for this task [16]. Sentiment analysis is a method for describing whether a text written by a user is positive, negative, or neutral. In other words, the method is used to study the emotion associated with words and writing [17]. Sentiment analysis can be implemented using machine learning methods. Text data, such as input tweets, is first separated into individual words. This process is known as tokenization, and it simplifies the analysis of a sentence. After that, the sentiment of the input is determined by classifying the separated words against a sentiment lexicon, which yields the polarity and subjectivity of the tweet [18].
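The tokenization and lexicon lookup described above can be illustrated with a minimal sketch. It is written in Python rather than R purely for illustration, and the lexicon, its weights, and the function names `tokenize` and `polarity` are assumptions that do not come from the study. Subjectivity is not computed here.

```python
import re

# Hypothetical toy lexicon (illustrative words and weights, not the study's lexicon).
LEXICON = {
    "adil": 1,       # fair
    "bahagia": 1,    # happy
    "dilarang": -1,  # forbidden
    "dihujat": -1,   # reviled
}


def tokenize(text):
    """Split a tweet into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def polarity(text):
    """Sum the lexicon weights of the tokens and map the total to a label."""
    score = sum(LEXICON.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A tweet whose tokens do not appear in the lexicon, or whose positive and negative words cancel out, is labeled neutral under this scheme.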

RESEARCH METHOD
Connecting R programming to Twitter
R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis [19]. It can download, plot, and analyze data taken from Twitter, and the connection between R and Twitter uses the open authorization (OAuth) protocol [20]. OAuth is an open-standard authorization protocol or framework that describes how unrelated servers and services can safely allow authenticated access to their assets without actually sharing the initial, related, single logon credential. In authentication parlance, this is known as secure, third-party, user-agent, delegated authorization. An example OAuth scenario is a user sending cloud-stored files to another user via email, when the cloud storage and email systems are otherwise unrelated other than supporting the OAuth framework (e.g., Google Gmail and Microsoft OneDrive) [21]. One of the more useful downloadable packages is twitteR (http://cran.r-project.org/web/packages/twitteR/index.html).
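To make the idea of access without sharing the logon credential concrete, the following sketch signs a request in the style of OAuth 1.0a with HMAC-SHA1. The OAuth version and signature method are assumptions, since the text only names OAuth in general, and the function names and the example URL used with it are hypothetical. The sketch is in Python for illustration and is not the code used in the study.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    """Percent-encode everything except RFC 3986 unreserved characters."""
    return quote(value, safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret):
    """Return a base64 HMAC-SHA1 signature over the method, URL, and parameters."""
    # Encode and sort the parameters so the result does not depend on their order.
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(param_string)]
    )
    # The secrets only form the signing key; they are never sent with the request.
    signing_key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(signing_key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()
```

The server, which also holds the secrets, can recompute the same signature and thereby verify the caller without the password ever being transmitted.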

Collecting Twitter data
First, the data on Twitter is collected and processed with R's sentiment analysis tools. Once a Twitter account is registered and logged in, the application name must be registered on the Twitter application programming interface (API) to create an application, which provides the four credentials (API_key, API_secret, access_token, access_token_secret) [22], [23]. These keys and tokens are then used to extract Twitter data into R [24]. After that, the four credentials (API_key, API_secret, access_token, access_token_secret) are supplied in the R program to set up the authorization used to extract Twitter data [25]. In general, there are six steps in collecting and managing data on Twitter:
a. Log on to the Twitter Developers site and sign in with your Twitter account
b. Go to apps.twitter.com
c. Generate a new application
d. Enter the details of your application
e. Create your access token
f. Make a note of the OAuth settings
Figure 1 shows a pictorial view of the steps involved in registering an application on the Twitter API.
Figure 1. Pictorial view of registering an application on the Twitter API
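One way to handle the four credentials noted in the last step is to keep them out of the script and read them from the environment. The sketch below is a Python illustration, not the R code used in the study; the environment variable names, the `TwitterCredentials` class, and the `load_credentials` function are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterCredentials:
    api_key: str
    api_secret: str
    access_token: str
    access_token_secret: str


def load_credentials(env=None):
    """Read the four credentials from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Hypothetical variable names, chosen to mirror the four credentials in the text.
    names = ("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError("missing credentials: " + ", ".join(missing))
    return TwitterCredentials(*(env[name] for name in names))
```

Failing early with the names of the missing values makes an incomplete registration easier to diagnose than an authorization error at request time.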

Sentiment analysis
This paper therefore analyzes people's opinions about polygamy using sentiment analysis. Sentiment analysis is the automated process of identifying positive, negative, and neutral opinions in text [26]. It is widely used for getting insights from social media comments, survey responses, and product reviews, and for making data-driven decisions [27], [28]. This is the most distinctive function implemented in the paper, since it describes the ongoing thoughts of a variety of people. The steps of the sentiment analysis process are as follows:
- Extract tweets: using the OAuth protocol, tweets on the topic are collected by searching for polygamy-related words.
- Clean the text: using the R language, unwanted expressions and words are removed.
- Data modeling and transformation: after retrieval and cleaning, the data is transformed and prepared in a structured format from which sentiments can be retrieved.
- Retrieve sentiments: the sentiment analysis is performed.
- Graphical representation: in the last step, the sentiments are plotted and visualized with graphs and a word cloud.
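The cleaning and transformation steps can be sketched as follows. The specific cleaning rules (dropping links, user mentions, the retweet marker, digits, and punctuation) are assumptions about what counts as unwanted expressions, and the record layout is hypothetical. The sketch uses Python for illustration, whereas the study performed these steps in R.

```python
import re


def clean_tweet(text):
    """Lowercase a tweet and strip links, mentions, 'RT', digits, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = re.sub(r"\brt\b", " ", text)        # drop the retweet marker
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits, punctuation, and '#'
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def to_records(raw_tweets):
    """Turn raw tweets into structured records, skipping tweets left empty by cleaning."""
    records = []
    for index, raw in enumerate(raw_tweets, start=1):
        cleaned = clean_tweet(raw)
        if cleaned:
            records.append({"id": index, "text": cleaned, "tokens": cleaned.split()})
    return records
```

Each record keeps the position of the tweet in the input, so that a sentiment label assigned later can be traced back to the original text.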

RESULTS AND DISCUSSION
Some tweets focusing on polygamy issues, written in the Indonesian language, are shown in Table 1. Twitter contains much unwanted information, so the tweets in Table 1 need to be cleaned so that only useful information remains. A tag cloud (word cloud) is a novelty visual representation of tweets that is used to visualize the essential information in them. In Figure 2, the most frequently used words are polygamy, islam, nikah, halalkan, jilbab, dihujat, and dilarang. The color and size of each word describe how frequently people used it. Figure 3 visualizes the words used in the tweets. It can be seen that the word poligami has the highest frequency, followed by the words suami and istri. The main aim of this paper is to analyze people's sentiments about polygamy, and the sentiment analysis gives a clear illustration of the public's positive, negative, and neutral sentiments. Furthermore, Table 2 and Figure 4 show that neutral sentiment is more dominant than positive or negative sentiment. This means that polygamy has recently become an ordinary thing in Indonesian society.
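The word frequencies behind a word cloud and the share of each sentiment label can be computed with a few lines of code. The sketch below is in Python for illustration only; the function names are assumptions, and any inputs used with it are toy data rather than the study's tweets or results.

```python
from collections import Counter


def word_frequencies(token_lists, top_n=3):
    """Count tokens across all tweets and return the top_n most frequent."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    return counts.most_common(top_n)


def sentiment_share(labels):
    """Return the percentage of tweets carrying each sentiment label."""
    categories = ("positive", "negative", "neutral")
    total = len(labels)
    if total == 0:
        return {label: 0.0 for label in categories}
    counts = Counter(labels)
    return {label: round(100 * counts[label] / total, 1) for label in categories}
```

The frequencies determine the word sizes in a word cloud, and the percentages correspond to the kind of comparison between positive, negative, and neutral sentiment that a bar chart would display.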

CONCLUSION
Polygamy is not common in Indonesian culture, yet many people support it, while others reject it. This paper has studied and analyzed public sentiment about polygamy using three sentiment categories: positive, negative, and neutral. In conclusion, the analysis has shown that polygamy is at present regarded as a normal thing in Indonesia. As stated above, the neutral sentiment score is more dominant than the positive or negative sentiment scores.