In this study, we developed a human attention model for smooth human-robot interaction. The model consists of a saliency map generation module and a manipulation map generation module. The manipulation map captures top-down factors in the input image, such as the human face, hands, and gaze. To evaluate the proposed model, we applied it to a magic video and measured human gaze points while participants watched the video. The experimental results show that the proposed model explains human attention better than the saliency map alone.
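The text does not specify how the two modules are fused, so the following is only a minimal sketch of one common choice: normalizing each map and taking a weighted sum. The weight `w` and the function names are assumptions for illustration, not the authors' method.

```python
import numpy as np

def normalize(m):
    # Scale a map to [0, 1]; a flat map stays all-zero.
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def attention_map(saliency, manipulation, w=0.5):
    # Hypothetical fusion: weighted sum of the bottom-up saliency map
    # and the top-down manipulation map, renormalized to [0, 1].
    return normalize((1 - w) * normalize(saliency) + w * normalize(manipulation))

# Toy example: 4x4 maps standing in for one video frame.
sal = np.random.rand(4, 4)
man = np.zeros((4, 4))
man[1, 2] = 1.0  # e.g. a region containing a detected hand
att = attention_map(sal, man, w=0.6)
```

In such a scheme, increasing `w` shifts predicted attention toward the top-down cues (face, hands, gaze) and away from purely image-driven saliency.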