Triangular fuzzy number for similarity measurement of Y chromosome DNA profile

ABSTRACT


INTRODUCTION
The high number of recent terrorist bombings in Indonesia, natural disasters that result in many casualties, and the increasing number of killings involving mutilation have made it difficult or impossible to recognize victims because of damage to some or all of their limbs [1]. The victims can be identified by examining their DNA [2]-[4]. Indonesia has identified disaster victims and victims of crime by matching the victim's DNA profile with that of the alleged biological family. DNA profiles are identified from biological evidence from the human body using polymerase chain reaction (PCR) technology and short tandem repeat (STR) sequences [4]-[6]. The DNA profile is a unique genetic fingerprint that distinguishes one individual from another, and because DNA is inherited, it can be matched against biological relatives.
Many previous studies have addressed the matching of human DNA profiles. Some matched the DNA profile with the alleged victim's biological family using a fuzzy inference system [6]-[10]; others performed DNA profile matching involving tribal information [8], or DNA matching with the victim's family using the Gaussian fuzzy number method [11]. In these previous methods, DNA similarity was measured using 16 loci of the DNA profile. This study instead uses the Y-chromosome STR (YSTR) to measure the similarity of DNA profiles involving family relationships.
YSTR is the male STR locus [12], [13]. Thus, the measurement of DNA profile similarity using YSTR should be conducted on male queries and male references. Y-STRs are STRs found on the Y chromosome and are used to identify male lineages. YSTR shows a high degree of variability, which indicates the existence of a kinship relationship of male STR DNA in a population [14]. The advantage of this locus is that it remains usable when the data consist of mixed male and female DNA profiles with unclear traces or origins, as in cases of sexual violence where the STR DNA profile data obtained contain the DNA profiles of a woman and a man.
One example of a crime that requires YSTR identification is a case of sexual violence. From DNA evidence in the form of sperm, a DNA profile is obtained that shows differences in STR values on the Y chromosome. From the YSTR values found [15], the number of men involved in the case can be determined, and the YSTR makes it easier to establish the identity of the perpetrators.
In the identification of human DNA profiles, the STR value indicated by a DNA locus is an integer. However, due to several factors such as contaminated DNA sources, weathering, and other causes, the STR value can shift to a decimal value. If DNA profile similarity is measured crisply, the decimal value indicated by the locus must be converted to an integer. To change the STR value, re-sampling and re-extraction of DNA sources must be conducted, which is costly and time-consuming. To accommodate the shift in the STR value at the DNA locus, this study uses a fuzzy method to measure the similarity of the STR values at the DNA profile loci.

MATERIAL AND METHOD
STR DNA profile
The DNA profile is an individual's DNA structure that describes their biological identity. A person's DNA profile consists of 16 loci, each of which maps the STR according to the specifications of that locus. The DNA profile is identified by examining the person's biological evidence, also known as DNA evidence, which can be obtained from several parts of the body, such as blood, saliva, bones, muscles, sperm, teeth, hair, or body fluids such as urine and sweat [2], [16], [17].
Human DNA profiles can be identified through the examination of STR markers. An STR is a repeating pattern of nucleotides, usually 2-6 bases long, that repeats with the same pattern without interruption by a different sequence. Currently, the PowerPlex Fusion 6C System has been developed, which contains 27 STR loci: CSF1PO, FGA, TH01, vWA, D1S1656, D2S1338, D2S441, D3S1358, D5S818, D7S820, D8S1179, D10S1248, D12S391, D13S317, D16S539, D18S51, D19S433, D21S11, Penta D, Penta E, D22S1045, TPOX, and SE33; amelogenin and DYS391 as sex determinants; and two rapidly mutating Y-STR loci, i.e., DYS570 and DYS576. Each locus consists of a pair of alleles, one inherited from the biological father and one from the biological mother.

YSTR
YSTR refers to the STR loci on the Y chromosome, which are found in the human DNA profile in male lineages. YSTR loci are carried only by men. Thus, the measurement of DNA profile similarity using YSTR should be conducted on male queries and male references. YSTR analysis is commonly used for forensic purposes, paternity testing, and genealogical DNA testing. YSTR is acquired from the paternal lineage. Therefore, a boy will have the same YSTR value as his biological father and his male siblings.
Figure 1 illustrates the inheritance of the Y chromosome (YSTR) in males, where squares represent males and circles represent females. Figure 2 shows the measurement of DNA profile similarity using YSTR on a male query with reference to the DNA profile of the male victim's biological family. In this case, the reference used was the STR DNA profile of the victim's paternal grandfather. Figure 3 describes the measurement of YSTR DNA with the brother of the victim's father as the reference.

When similarity is measured based on YSTR, the references selected for comparison of DNA profile data are all males who are biologically related to the victim through the father's lineage. Candidate references include the father, brother, father's brother, paternal grandfather, male cousin, and nephew.

Membership functions of input and output variables
The fuzzy system has membership functions for its input and output variables. The input variables have three membership functions, namely small, medium, and large, while the output variable has two membership functions, suitable and unsuitable.

Measurement of DNA profile STR similarity
In contrast to previous studies, the similarity of alleles at each DNA profile locus is measured by considering references, one of which is a substitute for the biological father. This will affect the [18]-[20]. Therefore, a reference from the biological mother is needed as the source of one allele of the pair at each DNA profile locus.
The similarity of STR values at the DNA profile loci was measured for each allele at each DNA locus [21], using the fuzzy similarity measure [22], [23]. Fuzzy similarity is used to accommodate the shift in the STR value at the DNA locus [24], [25]. Shifts in STR values can occur due to several factors that cause the extraction results from DNA sources to no longer be pure. This causes the STR value, which should be an integer, to become a real number.
The STR value of the DNA profile is described as an isosceles triangle whose middle value is the STR value of the DNA profile. The height of the triangle is set to 1, while the distance between the two legs of the triangle is set to 0.4. Figure 4 shows two isosceles triangles whose STR values differ by 0.1. To measure the similarity of two STR values that are suspected to have shifted, the fuzzy similarity measure was used. The similarity value of the two STR values was obtained from the intersection of the two triangles using (1). Figure 5 shows the intersection of two triangles as the similarity value. The shift in the STR value tolerated in this study was 0.2. The similarity measurement using the fuzzy similarity measure gives a value in the range of 0 to 1.
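In (1), the similarity value is the intersection point of the two alleles; a2 is the STR value of the first allele; a3 is a2 + 0.2; and the remaining term is the STR value of the second allele. The similarity value therefore lies between 0 and 1.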
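The triangle construction described above can be illustrated with a minimal Python sketch. It assumes each STR value is an isosceles triangle of height 1 and base 0.4 (half-width 0.2, as stated in the text) and takes the similarity as the height at which the two triangles intersect. The function name `triangular_similarity` and the closed form used in it are derived here from that geometric description; they are assumptions and are not taken from (1) itself.

```python
def triangular_similarity(a: float, b: float, base: float = 0.4) -> float:
    """Height of the intersection of two isosceles triangles of height 1.

    Assumption: both triangles have the same base width (0.4 in the text),
    so their legs have equal slope and cross midway between the two peaks.
    """
    half = base / 2.0
    # The right leg of the lower-valued triangle meets the left leg of the
    # higher-valued one at x = (a + b) / 2; the height at that point is:
    height = 1.0 - abs(a - b) / (2.0 * half)
    # Triangles that do not overlap have no intersection, so clamp at 0.
    return max(0.0, height)


# Illustrative STR values only.
for query, reference in [(15.0, 15.0), (15.0, 15.1), (15.0, 15.2), (15.0, 15.4)]:
    print(query, reference, round(triangular_similarity(query, reference), 3))
```

Under these assumptions, identical STR values give a similarity of 1, a difference of 0.2 gives about 0.5 (in line with the value the paper reports for 15 versus 15.2), and differences of 0.4 or more give 0.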

The similarity of two DNA profiles was measured by comparing the STR value of the first allele with the STR values of the alleles at the same locus from the first reference, and the STR value of the second allele with the alleles at the same locus from the second reference. For a Y-STR locus, the STR value of the locus was compared against both alleles of the same locus from a male reference. Accordingly, for the YSTR locus, the locus similarity value is obtained from (2), whose two terms are the similarity value of allele 1 and the similarity value of allele 2.

Data
The data used in this study are DNA profiles of 27 loci from 100 Javanese samples. All data were obtained from the Faculty of Dentistry, University of Indonesia. Table 1 is an example of a DNA profile data set consisting of victim data (query), biological mother data, and DNA profile data from biological relatives of the victim's father.

RESULTS AND DISCUSSION
During identification, the similarity of two DNA profiles must be measured through the similarity of the DNA profile markers (STR). Measurement of the similarity of STR DNA profiles using the Y chromosome can be used for paternity tests where DNA profile data for the suspected biological father are not available. In that case, DNA profile data from the biological brother of the father are needed. DNA profile similarity was measured on 27 DNA loci.
Similarity measurement of DNA profiles using fuzzy similarity gives the similarity value of each allele at the loci in question, and the similarity value of the DNA profile is the average of all allele similarity values. With fuzzy similarity, the similarity value of two compared alleles is in the range of 0 to 1. Table 2 compares the similarity measurements of DNA STR values using the crisp method and the fuzzy method. In Table 2, the crisp method gives a similarity of 0 when the query STR value is 15 and the reference STR value is 15.2, because 15 ≠ 15.2. With the fuzzy method, in contrast, the similarity of the STR value 15 to the STR value 15.2 is 0.5, which means that 15 is similar, or almost the same as, 15.2.
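The contrast between the crisp and fuzzy measures, and the averaging over alleles, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the fuzzy measure is taken as the intersection height of two triangles of height 1 and base 0.4, the function names are hypothetical, and the allele values are made up for the example rather than taken from the study data.

```python
def crisp_similarity(a: float, b: float) -> float:
    """Crisp comparison: 1 only when the two STR values are identical."""
    return 1.0 if a == b else 0.0


def fuzzy_similarity(a: float, b: float, base: float = 0.4) -> float:
    """Assumed fuzzy measure: intersection height of two triangles
    of height 1 and base width `base`, clamped at 0."""
    return max(0.0, 1.0 - abs(a - b) / base)


def profile_similarity(query, reference, measure) -> float:
    """Average of the per-allele similarity values, as described in the text."""
    scores = [measure(q, r) for q, r in zip(query, reference)]
    return sum(scores) / len(scores)


# Hypothetical allele values, for illustration only.
query = [15.0, 30.0, 12.0, 9.0]
reference = [15.2, 30.0, 12.1, 9.0]

crisp = profile_similarity(query, reference, crisp_similarity)
fuzzy = profile_similarity(query, reference, fuzzy_similarity)
print(f"crisp={crisp:.4f} fuzzy={fuzzy:.4f}")
```

Because every allele pair that the crisp method scores as 1 is also scored as 1 by this fuzzy measure, while slightly shifted values receive partial credit instead of 0, the fuzzy profile average in this sketch can never be lower than the crisp one.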
The similarity is measured first against the allele from the mother, and the other allele is then measured for similarity against the father's surrogate reference. Table 3 compares the similarity values measured with the crisp method and the fuzzy method for all DNA profile loci, for the entire query DNA profile against the mother's DNA profile. The crisp method gives a value of 0 or 1 for each locus allele being compared, while the fuzzy method gives similarity values in the range of 0 to 1. The similarity values of the alleles are accumulated, and their average gives the similarity value over all locus alleles. As a result, the similarity value of the fuzzy method is greater than the similarity value of the crisp method. The same can be seen in Table 4, which shows the similarity measurement between the query's STR DNA profile and the grandfather's DNA profile using the crisp and fuzzy methods. The similarity value of the fuzzy method is greater than that of the crisp method for the same locus alleles. This also applies to the comparison of the crisp and fuzzy methods for measuring the similarity between the DNA profiles of the query and the uncle.
Table 3 to Table 5 compare the similarity values of two DNA profiles obtained with the crisp method and the fuzzy similarity measure. The value of each allele at each locus compared with the crisp method is 0 or 1, while with the fuzzy similarity measure the allele similarity value is in the range of 0-1. This affects the overall DNA profile similarity value, which will be greater with the fuzzy similarity measure than with the crisp method. The similarity of the DNA profile between the query and the mother and the father's male biological relatives was measured by first measuring the similarity of one allele at a locus to the mother's allele at the same locus. The other allele at the locus was measured for similarity to the alleles from the surrogate biological father reference. For example, at the D21S11 locus, the first allele was compared with the allele at the same locus in the maternal DNA profile, namely the STR value of 30. The second allele, with an STR value of 31.2, was compared to the allele at the D21S11 locus of the grandfather's DNA profile, giving a similarity value of 0. Table 6 compares the similarity of the query's YSTR value to the uncle's YSTR value and the grandfather's YSTR value.
Comparing two DNA profiles using fuzzy similarity gives a higher similarity value than the crisp method. Figure 6 illustrates the difference in DNA similarity values between the crisp and fuzzy methods and shows that the similarity value with the fuzzy method is greater than with the crisp method. Figure 7 compares the DNA YSTR similarity values of the uncle and grandfather using the crisp and fuzzy methods. To conclude that the query DNA profile is similar to the biological family of the alleged father, the overall DNA profile similarity value must be greater than or equal to 0.5, and the similarity value for the 3 allele pairs of the Y chromosome (YSTR) must be greater than or equal to 0.75.
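The decision rule stated above can be expressed as a short Python sketch. The thresholds 0.5 and 0.75 come from the text; the function name `is_kin_match` is hypothetical, and the choice to aggregate the YSTR similarity values by averaging is an assumption, since the text does not say whether the 0.75 threshold applies to each Y-STR locus or to their combined value.

```python
def is_kin_match(
    profile_similarity: float,
    ystr_similarities: list[float],
    profile_threshold: float = 0.5,   # threshold stated in the text
    ystr_threshold: float = 0.75,     # threshold stated in the text
) -> bool:
    """Return True when both similarity conditions are satisfied.

    Assumption: the YSTR similarity is the mean of the per-locus values.
    """
    if not ystr_similarities:
        raise ValueError("at least one YSTR similarity value is required")
    ystr_similarity = sum(ystr_similarities) / len(ystr_similarities)
    return (profile_similarity >= profile_threshold
            and ystr_similarity >= ystr_threshold)


# Hypothetical similarity values, for illustration only.
print(is_kin_match(0.62, [1.0, 0.75, 0.5]))
print(is_kin_match(0.45, [1.0, 1.0, 1.0]))
```

Requiring both conditions means that a high autosomal similarity alone is not enough to conclude kinship through the paternal line when the Y-STR values disagree.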

CONCLUSION
Measurement of the similarity of human DNA profiles using the Y chromosome (YSTR) can be conducted to determine the kinship between the query DNA profile and the alleged biological family on the male side. YSTR is passed down by males to their male offspring without changing. However, the STR value of a DNA locus allele can shift due to several factors. Measuring DNA profile similarity by applying the fuzzy similarity measure to the STR values at the DNA profile loci gives better results than the crisp or manual methods. With the fuzzy similarity measure, the shift in the STR value can be accommodated, so that the similarity value lies in the range of 0-1. To conclude that the query has a kinship with a family reference, the DNA profile similarity value for the Y chromosome must be equal to or greater than 0.75.

ISSN: 2302-9285
Triangular fuzzy number for similarity measurement of Y chromosome DNA profile (Meira Parma Dewi)

Figure 6. Comparison chart of DNA similarity value using fuzzy and crisp method

Table 1. Data of DNA profile

Table 2. Comparison similarity value of two allele

Table 3. Comparison of STR DNA profile similarity values using the crisp method and the fuzzy similarity measure method

Table 4. Comparison of STR DNA profile similarity values using the crisp method and the fuzzy similarity measure method with the reference of grandfather from the father's side

Table 5. Comparison of STR DNA profile similarity values using the crisp method and the fuzzy similarity measure with uncle from the father's side as reference

Table 6. Comparison similarity value of YSTR