Peer to peer lending risk analysis based on embedded technique and stacking ensemble learning

Muhammad Munsarif, Muhammad Sam’an, Safuan Safuan

Abstract


Peer-to-peer (P2P) lending is popular because it offers easy and fast loans compared with the complicated procedures of traditional lending institutions. Credit risk analysis on P2P platforms, especially the identification of potential defaulters, therefore calls for big data and machine learning. However, class imbalance in the data and high computational cost severely degrade the predictive performance of machine learning models. This paper proposes a stacking ensemble learning model with feature selection based on embedded techniques (gradient boosted decision trees (GBDT), random forest (RF), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), light gradient boosting machine (LGBM), and decision tree (DT)) to predict the credit risk of individual borrowers in P2P lending. The stacking ensemble model is built from a stack of the meta-learners used in feature selection. The feature selection + stacking model achieves an average accuracy of 94.54% and an average execution time of 69.10 s. The RF meta-learner + stacking ensemble is the best classification model, and the LGBM meta-learner + stacking ensemble has the fastest execution time. The experimental results show that credit risk prediction for P2P lending can be improved by using a stacking ensemble model together with proper feature selection.
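
The pipeline described above (embedded feature selection followed by a stacking ensemble) can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic and imbalanced (about 10% "defaulters") rather than real P2P borrower records; only RF is used as the embedded selector; XGBoost and LightGBM are omitted to avoid extra dependencies; and the logistic-regression final estimator, the function name build_model, and all hyperparameters are hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def build_model(random_state: int = 0) -> Pipeline:
    """Hypothetical helper: embedded feature selection + stacking ensemble."""
    # Embedded feature selection: fit a random forest and keep the features
    # whose impurity-based importance is at least the mean importance
    # (SelectFromModel's default threshold for tree-based estimators).
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=random_state)
    )
    # Level-0 learners (assumption: XGBoost and LightGBM left out here).
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ("gbdt", GradientBoostingClassifier(random_state=random_state)),
        ("ada", AdaBoostClassifier(random_state=random_state)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=random_state)),
    ]
    # Level-1 combiner trained on out-of-fold predictions from 5-fold CV.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    return Pipeline([("select", selector), ("stack", stack)])


# Synthetic, imbalanced stand-in for borrower records (not the paper's data).
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=8,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_model()
model.fit(X_train, y_train)

# Number of features the embedded selector kept, and held-out accuracy.
n_kept = int(model.named_steps["select"].get_support().sum())
accuracy = model.score(X_test, y_test)
print(f"features kept: {n_kept}/30, test accuracy: {accuracy:.3f}")
```

Because the selector sits inside the pipeline, it is refit only on training data, so the held-out split never influences which features are kept. On imbalanced credit data, accuracy alone can look high even for a model that rarely flags defaulters, so per-class metrics are worth reporting alongside it.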

Keywords


Credit risk; Embedded technique; Feature selection; Peer to peer lending; Stacking ensemble model



DOI: https://doi.org/10.11591/eei.v11i6.3927



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
