The search for science and technology verses in Qur’an and hadith

Received Sep 18, 2020 Revised Nov 16, 2020 Accepted Dec 5, 2020

Currently, the vector space model (VSM) algorithm is widely implemented for document search because of its reliability in retrieving information. One such application is searching for verses of the Qur'an based on their translation. However, if the word or phrase in the query differs from the words in the documents in the database (even when it has the same meaning), the system will not display the verse. Because the verses of the Qur'an carry very deep meaning, an interpretation of each verse is also needed. Therefore, this research implements the VSM algorithm to search for Qur'an verses and hadiths related to science and technology, using the discussion of these verses and hadiths as the search parameter. Tests with 20 sample keywords yielded a recall of 81% with an average search time of 2.24 seconds.


INTRODUCTION
The Qur'an is the essence of all science. However, the knowledge contained in the Qur'an is still in the form of seeds and principles. The Qur'an contains the principles of all sciences, including technology and knowledge of the universe, and it offers multiple levels of definition and layered meanings to all of its readers [1]. According to El-Naggar [2], the sunnah or hadith is everything that comes from the Prophet Muhammad, in the form of his words, deeds, approvals, character, and sirah (biography of the Prophet SAW), both before and after he was sent as a prophet and apostle. The sunnah of the Prophet is the second guide for life after the Qur'an.
Information technology facilitates human needs in almost all aspects of life. Smartphones, search engines, and television are real examples of the application of information technology [3]. Search engines such as Google, Bing, Yahoo, and Ask use an information retrieval system in their search process [4]. Such a system accepts keywords or other input from the user and returns documents according to how well they match the keywords [5]. The vector space model (VSM) algorithm is an information retrieval model that represents documents and keywords as vectors in a multidimensional space. The similarity between a document and a query can be measured by calculating the angle formed by the document vector and the query vector [4]. With this algorithm, words, phrases, or sentences can be used as keywords [6], and the structure of the phrases and sentences does not have to be exactly the same as in the documents in the database. VSM uses a similarity measure to match documents against the user query and ranks them from the highest score to the lowest. The documents and the query are weighted using the term frequency-inverse document frequency (TF-IDF) method [7]. There are many variants of the TF-IDF model for computing the weights, including term frequency, classical TF-IDF, normalized TF-IDF, and sub-linear normalized TF-IDF [8].
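The four weighting schemes named above can be compared with a small sketch. It assumes common textbook definitions (IDF = log10(N/df), L2 normalization to unit length, and 1 + log(tf) for the sub-linear variant); the exact formulas in [8] may differ, and the function names `idf`, `l2_normalize`, and `weights` are hypothetical.

```python
import math
from collections import Counter


def idf(term: str, docs: list[list[str]]) -> float:
    """Classical IDF, log10(N / df); assumes the term occurs in at least one document."""
    df = sum(1 for doc in docs if term in doc)
    return math.log10(len(docs) / df)


def l2_normalize(vec: dict[str, float]) -> dict[str, float]:
    """Scale a weight vector to unit Euclidean length (no-op for an all-zero vector)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def weights(doc: list[str], docs: list[list[str]], variant: str) -> dict[str, float]:
    """Weights of the terms in `doc` under one of four common TF-IDF variants."""
    tf = Counter(doc)
    if variant == "tf":  # raw term frequency
        return {t: float(c) for t, c in tf.items()}
    if variant == "tfidf":  # classical TF-IDF
        return {t: c * idf(t, docs) for t, c in tf.items()}
    if variant == "normalized":  # TF-IDF scaled to unit length
        return l2_normalize({t: c * idf(t, docs) for t, c in tf.items()})
    if variant == "sublinear":  # (1 + ln tf) * IDF, scaled to unit length
        return l2_normalize({t: (1 + math.log(c)) * idf(t, docs) for t, c in tf.items()})
    raise ValueError(f"unknown variant: {variant}")


# Usage with assumed token lists; the third document contains "sembah" twice
docs = [["sungguh", "manusia", "rugi"], ["langit", "gugusan", "bintang"],
        ["sembah", "sembah"], ["sungguh", "orang", "rugi"]]
for variant in ("tf", "tfidf", "normalized", "sublinear"):
    print(variant, weights(docs[2], docs, variant))
```

The sub-linear variant dampens repeated terms, so a word that occurs twice does not count twice as much; normalization removes the effect of document length before vectors are compared.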
Currently, many hadith and Qur'an applications include a search feature. However, their search technique is still limited to exact word matching (string matching), so results often fail to appear when the query is given as a sentence or phrase [8]. In addition, there is as yet no application or system that specifically provides information on the verses of the Qur'an and hadiths relating to science and technology [9]. Yet according to the interpretation of Shaykh Thantawi Al-Jawahir, the Holy Qur'an contains more than 750 kauniyah verses (verses about the universe) [10].
Researchers currently use many search techniques. One of the most popular is word matching, or string matching, which searches for a certain word pattern within a longer sentence or text [11]. However, if the wording of the query is not the same as in the database, nothing will be found. Another popular alternative is the vector space model algorithm, which uses word weighting. With this technique, the wording does not have to be exact: even if the user enters the words in a different order from the text in the database, the system still returns the information. In a previous study that implemented the VSM algorithm for searching Al-Qur'an verses [12], the parameters were the translations of the verses, and query expansion was used to align the keywords with the data in the database. However, if the user enters a query that has the same meaning but does not appear in the database, that system cannot return the information being sought. Therefore, this research uses as its search parameters the discussions taken from books of Qur'an commentary and from the hadith, whose language is closer to everyday usage.
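The contrast between exact string matching and order-insensitive matching can be illustrated briefly. This is a sketch only: `string_match` and `word_overlap` are hypothetical helpers, and the overlap check is a simplification rather than the VSM itself, which additionally weights the words and ranks the documents.

```python
def string_match(query: str, text: str) -> bool:
    """Exact substring matching: the query must appear verbatim in the text."""
    return query.lower() in text.lower()


def word_overlap(query: str, text: str) -> set[str]:
    """Order-insensitive matching: the query words that occur anywhere in the text."""
    return set(query.lower().split()) & set(text.lower().split())


verse = "Sesungguhnya manusia berada dalam kerugian"
print(string_match("manusia berada", verse))  # True: same words, same order
print(string_match("berada manusia", verse))  # False: same words, different order
print(word_overlap("berada manusia", verse))  # both words are still found
```

Neither function can match a word that never occurs in the stored text, which is the gap the authors address by searching the commentary text instead of only the translation.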

RESEARCH METHOD
In this paper, the research follows the steps shown in Figure 1. First, the user enters keywords (words, phrases, or sentences). The keywords are preprocessed to obtain a list of word tokens, and TF-IDF weights are calculated for them. The resulting query and document weights are then used in the VSM computation [13][14][15][16]: if the cosine value is greater than 0, the list of relevant data is shown; otherwise, a message stating that no data was found is displayed. This research uses the prototype model as its software development method [17][18][19]. The application is a web-based application built with Python and the Django framework, and users need a browser to access it [20]. The architecture of the application is illustrated in Figure 2. First, the client sends a URL request through the browser; this URL contains the application path name. If the path is valid, the HTML page built from the templates, together with data from the Model in the database, is sent by the web server and displayed in the browser.
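The final decision step of the flow in Figure 1 (keep documents whose cosine value is greater than 0 and rank them, otherwise report that nothing was found) can be sketched as follows. The function names, the score values, and the message text are illustrative assumptions, not taken from the paper's implementation.

```python
def rank_results(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Keep documents whose cosine value is greater than 0, most similar first."""
    relevant = [(doc_id, s) for doc_id, s in scores.items() if s > 0]
    return sorted(relevant, key=lambda item: item[1], reverse=True)


def render(scores: dict[str, float]) -> str:
    """Format the ranked list, or a 'no data' message when nothing is relevant."""
    ranked = rank_results(scores)
    if not ranked:
        return "No data found"  # hypothetical message text
    return "\n".join(f"{doc_id}: {s:.4f}" for doc_id, s in ranked)


# Illustrative cosine values for four documents (not taken from the paper's tables)
scores = {"D1": 0.9129, "D2": 0.0, "D3": 0.0, "D4": 0.1826}
print(render(scores))
print(render({"D1": 0.0, "D2": 0.0}))
```

Separating ranking from rendering keeps the threshold rule testable on its own, independently of how the web page presents the results.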

Collecting text data of Qur'an and hadith
The data used in this research are Indonesian translations of Qur'an verses and hadith texts obtained from the science edition of Mushaf Al-'Alim [1], the Book of scientific miracles on earth and space [5], and the three-part series of books proving science in the sunnah by El-Naggar [2].

Preprocessing
As described in the flowchart, this algorithm uses the weights of the query and the documents to measure their similarity [21]. Before the algorithm is computed, the documents and the query must go through preprocessing (text processing) to obtain the weight values. This stage is divided into four parts: case folding, filtering, stemming, and tokenizing [22]. Consider four example documents and an input query Q (in Indonesian, with English glosses):
D1 = Sesungguhnya manusia berada dalam kerugian (Indeed, mankind is in loss)
D2 = Demi langit yang mempunyai gugusan bintang (By the sky that has clusters of stars)
D3 = Aku tidak akan menyembah apa yang kamu sembah (I will not worship what you worship)
D4 = Sesungguhnya mereka adalah orang-orang yang merugi (Indeed, they are the losers)
Q = Manusia yang rugi (human loss)
- Case folding: the first stage converts all letters to lowercase [23]. D1 = sesungguhnya manusia berada dalam kerugian
- Filtering: this stage removes stopwords (words that carry little meaning, such as 'adalah', 'yang', and 'apa') and punctuation from each document [24]. D1 = sesungguhnya manusia kerugian
- Stemming: this stage converts words into their root form by removing affixes [25]. D1 = sungguh manusia rugi
- Tokenizing: the last stage splits the text into word tokens [26]. Across the four documents this yields: sungguh, manusia, rugi, langit, gugusan, bintang, sembah, orang.
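The four preprocessing stages can be sketched in Python on the example sentences. This is a minimal sketch under stated assumptions: `STOPWORDS` and the lookup table `STEMS` are hypothetical and cover only these four sentences, whereas a real system would use a full Indonesian stopword list and a proper Indonesian stemmer. Words are extracted inside the filtering step so that stopwords and punctuation can be dropped, and the final tokenizing step splits the cleaned text.

```python
import re

# Hypothetical stopword list: only the function words of the four example sentences.
STOPWORDS = {"adalah", "yang", "apa", "berada", "dalam", "demi", "mempunyai",
             "aku", "tidak", "akan", "kamu", "mereka"}

# Hypothetical lookup "stemmer" covering only the inflected words in the example.
STEMS = {"sesungguhnya": "sungguh", "kerugian": "rugi", "merugi": "rugi",
         "menyembah": "sembah", "orang-orang": "orang"}


def case_fold(text: str) -> str:
    return text.lower()


def filter_words(text: str) -> str:
    # Extract words (dropping punctuation but keeping hyphenated reduplications
    # such as "orang-orang"), then remove stopwords.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text)
    return " ".join(w for w in words if w not in STOPWORDS)


def stem(text: str) -> str:
    return " ".join(STEMS.get(w, w) for w in text.split())


def tokenize(text: str) -> list[str]:
    return text.split()


def preprocess(text: str) -> list[str]:
    return tokenize(stem(filter_words(case_fold(text))))


docs = {
    "D1": "Sesungguhnya manusia berada dalam kerugian",
    "D2": "Demi langit yang mempunyai gugusan bintang",
    "D3": "Aku tidak akan menyembah apa yang kamu sembah",
    "D4": "Sesungguhnya mereka adalah orang-orang yang merugi",
}
tokens = {name: preprocess(text) for name, text in docs.items()}
vocabulary = list(dict.fromkeys(t for toks in tokens.values() for t in toks))
print(tokens["D1"])  # ['sungguh', 'manusia', 'rugi']
print(vocabulary)
print(preprocess("Manusia yang rugi"))  # ['manusia', 'rugi']
```

With these assumed tables, the vocabulary comes out in the same order as the token list in the text: sungguh, manusia, rugi, langit, gugusan, bintang, sembah, orang.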

Word weighting (TF-IDF)
In Table 1, the term column lists the words produced by the preprocessing stage. Next, we calculate the frequency of each term in each document: if the document contains the term, the value is one; otherwise, it is zero. Column Q is the sample keyword to be searched, namely "human loss" (manusia yang rugi). We then count the number of documents in which each term occurs (the document frequency, DF) [27]. Following the theoretical basis, to determine how important a term is across the entire collection, we calculate the inverse document frequency (IDF) of each term using (1) [28]. Finally, the weight of each term is obtained by multiplying its term frequency by its IDF [29].
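The weighting steps above can be sketched for the example documents. The sketch assumes binary term frequency as described, a document frequency counted over D1 to D4 only (the query is not counted), and IDF = log10(N/df); equation (1) in the paper may use a different logarithm base or count the query, so the values may not match Table 1 exactly. The token lists are the assumed output of the preprocessing stage.

```python
import math

# Assumed token lists after preprocessing (binary TF ignores the repeated "sembah")
DOCS = {
    "D1": ["sungguh", "manusia", "rugi"],
    "D2": ["langit", "gugusan", "bintang"],
    "D3": ["sembah", "sembah"],
    "D4": ["sungguh", "orang", "rugi"],
}
QUERY = ["manusia", "rugi"]

terms = list(dict.fromkeys(t for toks in DOCS.values() for t in toks))
n_docs = len(DOCS)

# Binary term frequency: 1 if the term occurs in the document or query, else 0
tf = {name: {t: int(t in toks) for t in terms} for name, toks in DOCS.items()}
tf["Q"] = {t: int(t in QUERY) for t in terms}

# Document frequency over D1..D4 only (the query is not counted; an assumption)
df = {t: sum(tf[name][t] for name in DOCS) for t in terms}
idf = {t: math.log10(n_docs / df[t]) for t in terms}

# Term weight = TF x IDF, for every document and for the query
weights = {name: {t: tf[name][t] * idf[t] for t in terms} for name in tf}

for t in terms:
    print(f"{t:<8} df={df[t]}  idf={idf[t]:.4f}  WQ={weights['Q'][t]:.4f}")
```

Terms that occur in two documents (sungguh, rugi) receive a lower IDF than terms found in only one document, so rarer terms contribute more to the similarity score.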

VSM algorithm process
The first step of the algorithm is to calculate the lengths of the query vector and the document vectors using (2) and (3) from the theoretical basis. Each weight is first squared (Table 2, columns WQ² through WD4²), the squared values of each document and of the query are summed (Table 2, blue rows), and the square root of each sum is taken (Table 2, orange rows). The second step multiplies each term weight in each document by the corresponding query weight (Table 2, columns WQ*WD1 to WQ*WD4) and sums these products for each document (Table 2, red rows). The similarity of each document to the query is then obtained with (4) by dividing the sum in the red row by the product of the query and document vector lengths from the orange rows. The final ranking of the documents is indicated by the gray bar. The documents relevant to the query are D1 and D4 (their similarity values are non-zero), and the document with the highest similarity to the query is D1.
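The three steps (vector lengths, sums of products, and their ratio, i.e. the cosine of the angle between the vectors) can be sketched as follows. This is a self-contained sketch assuming the preprocessed token lists below, binary TF, and IDF = log10(N/df) with df counted over the documents only; the helper names `vector`, `length`, and `cosine` are hypothetical.

```python
import math

# Assumed output of the preprocessing stage for the example documents and query
DOCS = {
    "D1": ["sungguh", "manusia", "rugi"],
    "D2": ["langit", "gugusan", "bintang"],
    "D3": ["sembah", "sembah"],
    "D4": ["sungguh", "orang", "rugi"],
}
QUERY = ["manusia", "rugi"]

terms = list(dict.fromkeys(t for toks in DOCS.values() for t in toks))
# IDF = log10(N / df), with df counted over the documents only (assumption)
idf = {t: math.log10(len(DOCS) / sum(t in toks for toks in DOCS.values())) for t in terms}


def vector(tokens: list[str]) -> list[float]:
    """Binary TF x IDF weight for every vocabulary term."""
    return [idf[t] if t in tokens else 0.0 for t in terms]


def length(v: list[float]) -> float:
    """Vector length: square the weights, sum them, take the square root."""
    return math.sqrt(sum(w * w for w in v))


def cosine(q: list[float], d: list[float]) -> float:
    """Sum of WQ*WD products divided by the product of the two vector lengths."""
    denom = length(q) * length(d)
    return sum(wq * wd for wq, wd in zip(q, d)) / denom if denom else 0.0


q_vec = vector(QUERY)
scores = {name: cosine(q_vec, vector(toks)) for name, toks in DOCS.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")
```

Under these assumptions D1 scores about 0.913 and D4 about 0.183, while D2 and D3 score 0, which agrees with the ranking reported above (D1 first, then D4). Because log 4 = 2 log 2 in any base, these particular cosine values do not depend on the logarithm base.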

Testing
This test evaluates the performance of the vector space model algorithm on 20 sample keywords using recall [4]; the results are shown in Table 3. Recall [30] measures the algorithm's ability to retrieve the relevant information [4]. For each test keyword, recall is calculated as

$$\text{Recall} = \frac{\text{number of relevant documents retrieved}}{\text{number of relevant documents in the database}}$$

and the recall of the system is the average over all test keywords. The system achieved a recall of 81% in retrieving the information, with an average time of 2.24 seconds. The system's failure to retrieve some information may be caused by the limited capability of the local server and the amount of data processed [31].
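The recall formula and its average over the test keywords can be sketched as follows. The verse IDs and results below are hypothetical and do not reproduce Table 3; the sketch also assumes the system recall is the mean of the per-keyword recall values (a macro-average) rather than a recall computed from pooled counts.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Share of the relevant documents in the database that the system retrieved."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(retrieved & relevant) / len(relevant)


def average_recall(results: list[tuple[set[str], set[str]]]) -> float:
    """Mean recall over all test keywords; each item is (retrieved, relevant)."""
    return sum(recall(ret, rel) for ret, rel in results) / len(results)


# Hypothetical results for two keywords (verse IDs are made up for illustration)
sample = [
    ({"v1", "v2", "v5"}, {"v1", "v2", "v3", "v4"}),  # 2 of 4 relevant retrieved -> 0.5
    ({"v7"}, {"v7"}),                                # 1 of 1 relevant retrieved -> 1.0
]
print(average_recall(sample))  # 0.75
```

Note that recall ignores irrelevant documents that are retrieved (such as "v5" above); measuring those would require precision as well.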

CONCLUSION
In this paper, the vector space model algorithm is applied to the system in four stages: preprocessing, calculation of the document and query weights, calculation of the cosine of the angle between the document and query vectors, and ranking of the documents. Based on the test results on 20 sample keywords, it can be concluded that the vector space model algorithm performs well in retrieving Qur'an verses and hadiths relating to science and technology, with a recall of 81% and an average time of 2.24 seconds. Vector space