In robot autonomous navigation and human-robot interaction, human gaze offers subtle yet rich cues. Giving robots access to gaze information allows them to infer human intentions and goals, improving the interaction experience and even enabling robots to move smoothly through complex human environments. This study proposes a method for real-time estimation of human gaze targets in the surrounding environment using a 360-degree panoramic camera, employing deep learning to track gaze from the robot's perspective. Building on previous work, our method uses a dual-path prediction network to handle the two gaze modes exhibited by human subjects in panoramic images. We improve the network structure by treating the gaze heatmap as an interpretable intermediate component that is combined with a scene saliency heatmap to estimate people's gaze targets. We highlight the challenges of real-time gaze tracking on robots and demonstrate our model's performance improvements in gaze estimation.
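
To make the heatmap-combination idea concrete, the sketch below fuses a person-specific gaze heatmap with a scene saliency heatmap and reads off the peak as the predicted gaze target. This is a minimal illustration under our own assumptions: the element-wise product fusion, array names, and NumPy implementation are hypothetical stand-ins, whereas the actual network learns this combination end to end.

```python
import numpy as np

def fuse_gaze_and_saliency(gaze_heatmap, saliency_heatmap):
    """Combine a gaze heatmap with a scene saliency heatmap and return
    the (row, col) of the most likely gaze target.

    Element-wise product is an illustrative fusion choice, not the
    paper's confirmed operator.
    """
    assert gaze_heatmap.shape == saliency_heatmap.shape
    fused = gaze_heatmap * saliency_heatmap      # joint evidence map
    fused /= fused.sum() + 1e-8                  # normalize to a distribution
    return np.unravel_index(np.argmax(fused), fused.shape)

# Toy usage on an equirectangular-shaped grid (dimensions are arbitrary).
rng = np.random.default_rng(0)
gaze = rng.random((64, 256))      # e.g. attention cone derived from head pose
saliency = rng.random((64, 256))  # e.g. bottom-up scene saliency
print(fuse_gaze_and_saliency(gaze, saliency))
```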