Dissecting of the two-stages object detection models architecture and performance

Sara Bouraya, Abdessamad Belangour


Artificial intelligence (AI) is the discipline focused on enabling computers to operate autonomously without explicit programming. Within AI, computer vision is an emerging field tasked with giving machines the ability to interpret visual data from images and videos. Over recent decades, computer vision has found applications in diverse fields such as autonomous vehicles, information retrieval, surveillance, and the understanding of human behavior. Object detection, a key task in computer vision, relies on deep neural networks, which continue to improve detection accuracy and speed. Its goal is to precisely locate objects within images or videos and assign each of them to a specific class. Object detection models typically consist of three components: a backbone network for feature extraction, a neck for feature aggregation, and a head for prediction. This study focuses on two-stage detectors: it provides a comprehensive review of these models, followed by benchmarking to offer insights for researchers and scientists. By analyzing the efficacy of these models, this research seeks to guide future developments in object detection within computer vision.
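The backbone, neck, and head decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code or any benchmarked model: every function name here (backbone, neck, propose_regions, head, detect) is hypothetical, and simple arithmetic on nested lists stands in for the neural networks a real detector would use. The sketch assumes the common meaning of "two-stage", namely a first stage that proposes candidate regions and a second stage that classifies them, and it assumes an input of at least 2x2 cells.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in cell coordinates
FeatureMap = List[List[float]]


@dataclass
class Detection:
    box: Box
    label: str
    score: float


def backbone(image: FeatureMap) -> List[FeatureMap]:
    """Toy feature extractor: the input plus a 2x2 average-pooled copy."""
    h, w = len(image), len(image[0])
    pooled = [
        [(image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]
    return [image, pooled]


def neck(features: List[FeatureMap]) -> FeatureMap:
    """Toy aggregation: upsample the coarse map and average it with the fine map."""
    fine, coarse = features
    max_y, max_x = len(coarse) - 1, len(coarse[0]) - 1
    return [
        [(fine[y][x] + coarse[min(y // 2, max_y)][min(x // 2, max_x)]) / 2.0
         for x in range(len(fine[0]))]
        for y in range(len(fine))
    ]


def propose_regions(fmap: FeatureMap, threshold: float = 0.5) -> List[Box]:
    """Stage 1 (assumed): propose one unit box per cell above the threshold."""
    return [
        (float(x), float(y), float(x + 1), float(y + 1))
        for y, row in enumerate(fmap)
        for x, value in enumerate(row)
        if value > threshold
    ]


def head(fmap: FeatureMap, proposals: List[Box]) -> List[Detection]:
    """Stage 2 (assumed): score and label each proposed box."""
    detections = []
    for box in proposals:
        score = fmap[int(box[1])][int(box[0])]
        label = "object" if score > 0.75 else "background"
        detections.append(Detection(box, label, score))
    return detections


def detect(image: FeatureMap) -> List[Detection]:
    """Run backbone -> neck -> proposals -> head on a single image."""
    fmap = neck(backbone(image))
    return head(fmap, propose_regions(fmap))
```

Calling detect on a 4x4 grid whose top-left 2x2 block is 1 and whose other cells are 0 yields four detections labelled "object", one per bright cell. The point of the sketch is only the data flow between the three components, not the quality of the toy scoring rule.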


Computer vision; Convolutional neural network; Deep learning; Deep neural networks; Neck models; Object detection; Two stage detectors

DOI: https://doi.org/10.11591/eei.v13i3.6424


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Bulletin of Electrical Engineering and Informatics (BEEI)
ISSN: 2089-3191, e-ISSN: 2302-9285
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).