Bulletin of Electrical Engineering and Informatics

Ayodeji Olalekan Salau, Tsehay Admassu Assegie, Adedeji Tomide Akindadelo, Joy Nnenna Eneh
Department of Electrical/Electronics and Computer Engineering, Afe Babalola University, Ado-Ekiti, Nigeria
Department of Computer Science, College of Natural and Computational Science, Injibara University, Injibara, Ethiopia
Department of Basic Sciences, Babcock University, Ilishan Remo, Nigeria
Department of Electronic Engineering, University of Nigeria, Nsukka, Nigeria
Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, India


INTRODUCTION
Concrete is one of the most widely used construction materials in the world [1]. It is made by combining components such as cement, coarse and fine aggregate, and water [2], [3]. These components, together with the age of the concrete, affect its compressive strength. To obtain the actual compressive strength of concrete (the target labels in the dataset), an engineer breaks cylinder samples under a compression-testing machine [4], [5]. The failure load is divided by the cylinder's cross-sectional area to obtain the compressive strength. Engineers use different kinds of concrete for different building purposes. For example, the strength of concrete used for residential buildings should not be lower than 2500 pounds per square inch (psi), or 17.2 megapascals (MPa) [6]. Concrete has high strength in compression but low strength in tension, which is why engineers use reinforced concrete (usually with steel rebars) to build structures.
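The calculation described above, failure load divided by the cylinder's cross-section, can be sketched in a few lines. This is only an illustration: the 6 in cylinder diameter and the 71,000 lbf failure load are hypothetical values chosen for the example, and the function names are not taken from the study.

```python
import math

PSI_TO_MPA = 0.006894757  # 1 psi = 6.894757 kPa

def compressive_strength_psi(failure_load_lbf, diameter_in):
    """Failure load divided by the circular cross-sectional area of the cylinder."""
    area_in2 = math.pi * (diameter_in / 2) ** 2
    return failure_load_lbf / area_in2

def psi_to_mpa(psi):
    """Convert a stress from pounds per square inch to megapascals."""
    return psi * PSI_TO_MPA

# Hypothetical test: a 6 in diameter cylinder that fails at 71,000 lbf.
strength_psi = compressive_strength_psi(71_000, 6.0)
strength_mpa = psi_to_mpa(strength_psi)

# Compare against the 2500 psi residential minimum quoted in the text.
meets_residential_minimum = strength_psi >= 2500

print(f"{strength_psi:.0f} psi = {strength_mpa:.1f} MPa, "
      f"meets minimum: {meets_residential_minimum}")
```

The same conversion factor reproduces the equivalence quoted in the text: 2500 psi is approximately 17.2 MPa.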
The compressive strength of concrete is one of the most significant parameters in structural engineering [7], because today's construction work requires high-strength concrete for higher durability. Determining the strength of concrete takes time, planning, and financial resources because the commonly used compressive strength value is obtained on the 28th day [8]. For this reason, concrete strength is estimated before the concrete is used in building construction. The estimate is traditionally obtained through laboratory tests, but laboratory analysis of concrete samples requires significant experimentation time and cost.
Machine learning has been widely used for concrete strength prediction [9]. Moreover, there is a clear opportunity for an automated model to reduce the wait time involved in estimating concrete strength with traditional laboratory tests. Thus, with the concrete dataset acquired, it is possible to develop a predictive model that learns the relations between the variables. However, no previous work has conducted extensive experiments to identify which regression model should be recommended for concrete strength prediction. This research therefore evaluates the effectiveness of various supervised regression models for predicting concrete strength from its composition, and investigates the variability among these models. The specific objectives are: (1) to develop different regression models for predicting the compressive strength of concrete, (2) to compare the models using visualizations and measures of accuracy, and (3) to analyze the importance of the concrete composition features for each model. The rest of this work is organized as follows: section 2 reviews the state-of-the-art concrete strength prediction models, and section 3 presents the dataset and the regression models used in the simulation. Section 4 presents the results, and section 5 concludes the work.

RELATED WORK
There have been significant efforts to apply supervised machine learning algorithms to construction problems. Supervised algorithms and prediction models have been widely employed to estimate the strength of concrete used in construction. For instance, in [10] a backpropagation (BP) neural network (NN) is applied to the concrete compressive strength dataset to automate concrete strength analysis. Gill et al. [11] employed a bagging classifier to develop an automated concrete compressive strength prediction model. The evaluation of predictive accuracy reveals that the artificial neural network outperforms the decision tree and bagging classifier for concrete strength prediction.
The performance of K-nearest neighbor (KNN), random forest, and decision tree algorithms is compared for concrete strength prediction in [10]. The comparison shows that random forest outperforms the KNN and decision tree models, with an accuracy of 91.26% on concrete strength prediction. Similarly, in [12], long short-term memory (LSTM) is applied to the concrete strength prediction problem. The authors also employed a support vector regression algorithm to develop a model for strength prediction. As shown in their result analysis, the support vector regression model achieved a root mean square error (RMSE) of 0.508 and an R-squared of 0.997. Moreover, the authors compare the conventional support vector regression model with the developed LSTM model, and the result shows that LSTM outperformed the support vector regressor.
Advanced machine learning techniques are applied in [13] to develop a model that predicts concrete strength. The study compared different machine learning models such as decision tree, AdaBoost regressor, and bagging regressor. The result reveals that the bagging regressor or random forest regressor achieved higher accuracy than AdaBoost and the decision tree for concrete strength prediction. Another study [14] applied hybrid machine learning models to predict concrete strength. The researchers applied an artificial neural network (ANN) to develop the prediction model. Evaluated on a concrete test set, the ANN model achieves 97% accuracy on concrete strength prediction. The performance of the support vector machine (SVM) and ANN is evaluated for concrete strength prediction in [15]. The comparison reveals that SVM outperforms the ANN model. Although different models have been developed for concrete strength prediction, the models in the literature leave scope for improvement in terms of accuracy.

METHODOLOGY
The concrete strength prediction dataset is obtained from the University of California Irvine (UCI) machine learning data repository. The dataset consists of 1,030 samples of different concrete compositions together with the corresponding concrete strength value. The UCI concrete compressive strength dataset is widely used to develop machine learning models that predict concrete strength [16]-[18]. To develop the model, different regression algorithms are employed. The features of the concrete compressive strength dataset are listed in Table 1.

Performance measures
To evaluate the performance of the different regression models employed to predict concrete compressive strength, the root mean square error (RMSE), R-squared, and accuracy are used as performance measures. The RMSE is determined by the following formula [19]-[22]:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right)^2}$$
where the sum runs over the squared differences between the predicted and observed values, $P_i$ is the predicted value and $O_i$ is the observed value for the $i$-th observation in the dataset, and $N$ is the sample size. The RMSE is employed due to its wide applicability in regression model evaluation [23]-[25].
The regression models are trained on 7 input values of concrete composition parameters such as water, cement, coarse and fine aggregate, and other features. The effect of each concrete composition parameter on the concrete strength is examined. A correlation matrix between the input parameters is calculated using the Pearson correlation coefficient for each pair of concrete composition parameters. The correlation matrix used to investigate the effect of the different concrete composition parameters is shown in Figure 1. It shows the relationship between each pair of concrete compressive strength features.
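The RMSE and R-squared measures named in this section can be computed directly from paired observed and predicted values. The sketch below is illustrative only: the four strength values are made up for the example and are not results from the study, and R-squared is computed with the usual definition of one minus the ratio of residual to total sum of squares.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Made-up compressive strengths in MPa, for illustration only.
observed = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 38.0, 53.0, 59.0]

error = rmse(observed, predicted)
score = r_squared(observed, predicted)
print(f"RMSE = {error:.3f} MPa, R-squared = {score:.3f}")
```

A lower RMSE and an R-squared closer to 1 both indicate predictions that track the observed strengths more closely.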
As shown in Figure 1, the concrete strength is highly affected by the cement parameter, which has a correlation value of 0.50. Thus, a strong positive correlation is found between compressive strength and the amount of cement in the composition of the concrete. Moreover, there is a strong positive correlation between the concrete components superplasticizer and water, and similarly a positive correlation between superplasticizer and fly ash. The 3-dimensional plot of the three most important features, namely cement, superplasticizer, and age, against compressive strength is shown in Figure 2. As Figure 2 shows, cement and superplasticizer have a high impact on the concrete compressive strength.
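A pairwise Pearson correlation matrix of the kind discussed above can be sketched as follows. The column names and the four rows of values are hypothetical, chosen only to make the example run; they are not taken from the UCI dataset, and the resulting coefficients do not correspond to those in Figure 1.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Nested dict holding the coefficient for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical mix data, for illustration only.
columns = {
    "cement": [200.0, 300.0, 400.0, 500.0],
    "water": [200.0, 180.0, 170.0, 150.0],
    "strength": [20.0, 30.0, 45.0, 50.0],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Each coefficient lies between -1 and 1; the diagonal is 1 because every feature is perfectly correlated with itself, and the matrix is symmetric.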

RESULTS AND DISCUSSION
The variability among different regression models, namely linear regression, decision tree regression, random forest regression, support vector regression, K-neighbors regression, gradient boosting regression, and AdaBoost regression, is analyzed on the concrete compressive strength dataset. The results are presented in tables and graphs, using RMSE and R-squared as performance evaluation measures. Table 2 shows the performance of the different regression models on concrete strength prediction. As shown in Table 2, the gradient boosting regressor performs better than the other regression models. Moreover, the performance of the regression models on concrete strength prediction is shown in Figure 3.
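The comparison procedure, fitting each candidate model on training data and ranking the candidates by held-out RMSE, can be sketched with a small harness. Everything in this example is an assumption made for illustration: the two toy models (a one-feature least-squares line and a nearest-neighbor average) stand in for the regressors listed above, and the synthetic quadratic data stands in for the UCI dataset. It does not reproduce the study's models or the values in Table 2.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def fit_linear(xs, ys):
    """Ordinary least squares on one feature; returns a predict function."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k=2):
    """k-nearest-neighbor regression; returns a predict function."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda pair: abs(pair[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

def compare_models(models, x_train, y_train, x_test, y_test):
    """Fit every model on the training split and rank by test RMSE (best first)."""
    scores = {}
    for name, fit in models.items():
        predict = fit(x_train, y_train)
        scores[name] = rmse(y_test, [predict(x) for x in x_test])
    return dict(sorted(scores.items(), key=lambda item: item[1]))

# Synthetic nonlinear data: even indices for training, odd indices for testing.
xs = list(range(41))
ys = [x ** 2 for x in xs]
x_train, y_train = xs[0::2], ys[0::2]
x_test, y_test = xs[1::2], ys[1::2]

scores = compare_models({"linear": fit_linear, "knn": fit_knn},
                        x_train, y_train, x_test, y_test)
for name, value in scores.items():
    print(f"{name}: test RMSE = {value:.2f}")
```

On this deliberately nonlinear toy problem the nearest-neighbor model ranks ahead of the straight line, which illustrates why models with different flexibility are worth comparing on the same held-out data.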

Performance of the regression models for concrete strength estimation
The regression models are compared using training and test accuracy for concrete strength estimation. The comparative results for each regression model on concrete strength estimation are shown in Table 2. As shown in Table 2, gradient boosting regression is the better model, as it provides better performance with a 95% confidence interval between 88% and 95%. The test score for each regression model on concrete strength estimation is shown in Figure 4.
As shown in Table 2, the gradient boosting regressor outperforms the other regression models for concrete strength estimation. The confidence interval for the gradient boosting regressor is between 0.54 and 1.00. Moreover, the gradient boosting regressor has an accuracy of 90.2% on test data at the 95% confidence level. Figure 4 shows the accuracy of the different regression models on concrete strength estimation.
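The text does not state how the 95% confidence intervals were obtained. One common approach, assumed here purely for illustration, is a normal-approximation interval around the mean of repeated validation scores. The five fold scores below are hypothetical and are not taken from Table 2.

```python
import math
import statistics

def confidence_interval_95(scores):
    """Normal-approximation 95% interval for the mean of the given scores."""
    mean = statistics.mean(scores)
    # Standard error of the mean, using the sample standard deviation.
    std_error = statistics.stdev(scores) / math.sqrt(len(scores))
    half_width = 1.96 * std_error
    return mean - half_width, mean + half_width

# Hypothetical validation scores from five folds, for illustration only.
fold_scores = [0.88, 0.90, 0.92, 0.91, 0.89]

mean_score = statistics.mean(fold_scores)
lower, upper = confidence_interval_95(fold_scores)
print(f"mean = {mean_score:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With only a handful of folds, a Student's t multiplier would give a somewhat wider interval than the 1.96 used in this sketch.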

CONCLUSION
This study evaluated the performance of regression models for concrete strength estimation. Experimental simulation shows that the gradient boosting regressor gives a better accuracy score than the other regression models for concrete strength estimation. The experiment also showed that, with validation, the best result is obtained with the gradient boosting regressor. Thus, a gradient boosting regressor is better suited to modeling the strength of high-performance concrete. Regression models are valuable for improving concrete strength prediction and for reducing the number of experimental tests required when checking concrete composition. Finally, the study shows that regression models are capable of mapping input features to the target concrete compressive strength.