An evolutionary approach to comparative analysis of detecting Bangla abusive text

Tanvirul Islam, Nadim Ahmed, Subhenur Latif

Abstract


The use of Bangla abusive texts has been accelerated with the progressive use of social media. Through this platform, one can spread the hatred or negativity in a viral form. Plenty of research has been done on detecting abusive text in the English language. Bangla abusive text detection has not been done to a great extent. In this experimental study, we have applied three distinct approaches to a comprehensive dataset to obtain a better outcome. In the first study, a large dataset collected from Facebook and YouTube has been utilized to detect abusive texts. After extensive pre-processing and feature extraction, a set of consciously selected supervised machine learning classifiers i.e. multinomial Naïve Bayes (MNB), multi layer perceptron (MLP), support vector machine (SVM), decision tree, random forrest, stochastic gradient descent (SGD), ridge, perceptron and k-nearest neighbors (k-NN) has been applied to determine the best result. The second experiment is conducted by constructing a balanced dataset by random under sampling the majority class and finally, a Bengali stemmer is employed on the dataset and then the final experiment is conducted. In all three experiments, SVM with the full dataset obtained the highest accuracy of 88%.

Keywords


Abusive text detection; Bangla text; Social media; Supervised learning; TF-IDF

Full Text:

PDF


DOI: https://doi.org/10.11591/eei.v10i4.3107

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Bulletin of EEI Stats