Advanced Computer Vision Techniques Using Graph Neural Networks for Real-Time Object Detection and Scene Understanding
Keywords:
Computer Vision; Graph Neural Networks (GNN); Real-Time Object Detection; Scene Understanding; Graph Attention Networks (GAT); Scene Graph Generation; Deep Learning; Contextual Reasoning; YOLO-Based Detection; Spatial Relationship Modeling; Intelligent Vision Systems; Semantic Scene Analysis.
Abstract
Real-time object detection and scene understanding are central to higher-level computer vision applications in autonomous driving, smart surveillance, robotics, and smart healthcare systems. However, traditional convolution-based object detectors focus mainly on detecting individual objects and typically fail to capture the contextual and spatial relationships among objects in complex scenes. This limitation reduces both the accuracy of semantic understanding and the reliability of decision-making in dynamic real-world environments. To address this challenge, this paper presents a novel computer vision framework that integrates a Graph Neural Network (GNN) for real-time object detection and scene understanding. In the proposed model, a lightweight YOLO-based backbone feature extractor is paired with a Graph Attention Network (GAT) to model inter-object and contextual scene relationships. A scene graph generation mechanism enhances semantic reasoning and the analysis of spatial interactions between detected objects. The framework was evaluated under real-time conditions on benchmark datasets such as COCO and Visual Genome. The experimental results show that the proposed method achieves a mean Average Precision (mAP) of 91.3, a detection accuracy of 94.1, a scene relationship recognition accuracy of 92.6, and a processing speed of 42 FPS, outperforming traditional CNN-based and transformer-based detection frameworks while maintaining low computational latency in real-time deployment.
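To make the GAT component concrete, the following is a minimal sketch of a single-head graph attention step over detected objects, written in plain Python. It follows the standard GAT formulation (shared linear projection, LeakyReLU-scored attention, softmax over neighbors) and is not the implementation used in this work. The function names, the toy feature vectors, the adjacency, and the weights `W` and `a` are all hypothetical values chosen for illustration; in a full system the node features would presumably come from the detector backbone and the weights would be learned.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU used to score attention logits."""
    return x if x > 0 else slope * x


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, h):
    """Multiply matrix W (out_dim x in_dim) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention over object nodes (illustrative sketch).

    features:  one feature vector per detected object (hypothetical values)
    neighbors: dict mapping node index -> list of adjacent node indices
    W:         shared projection matrix, out_dim x in_dim
    a:         attention vector of length 2 * out_dim
    Returns (updated_features, attention), where attention[i] maps each
    neighbor j (self-loop included) to its coefficient alpha_ij.
    """
    projected = [matvec(W, h) for h in features]
    updated, attention = [], []
    for i, z_i in enumerate(projected):
        # Every node also attends to itself.
        nbrs = sorted(set(neighbors.get(i, [])) | {i})
        # e_ij = LeakyReLU(a . [z_i || z_j])
        scores = [
            leaky_relu(sum(w * v for w, v in zip(a, z_i + projected[j])))
            for j in nbrs
        ]
        alphas = softmax(scores)
        # New feature of node i: attention-weighted sum of neighbor projections.
        aggregated = [
            sum(alpha * projected[j][k] for alpha, j in zip(alphas, nbrs))
            for k in range(len(z_i))
        ]
        updated.append(aggregated)
        attention.append(dict(zip(nbrs, alphas)))
    return updated, attention


# Toy scene with three detected objects; all numbers are made up.
features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 0.5, -0.5]

updated, attention = gat_layer(features, neighbors, W, a)
```

Each entry of `attention` sums to one over a node's neighborhood, so every updated feature is a convex combination of the projected features of the object and its neighbors. A scene graph stage could read these coefficients as soft evidence about which object pairs interact, although how the paper derives its relationship predictions is not specified in the abstract.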




