Philip A. Kragel, Marianne C. Reddan, Kevin S. LaBar, and Tor D. Wager, Emotion schemas are embedded in the human visual system, Science, Volume 5(7):eaaw4358, July 24, 2019, DOI: 10.1126/sciadv.aaw4358.
Abstract: Theorists have suggested that emotions are canonical responses to situations ancestrally linked to survival. If so, then emotions may be afforded by features of the sensory environment. However, few computational models describe how combinations of stimulus features evoke different emotions. Here, we develop a convolutional neural network that accurately decodes images into 11 distinct emotion categories. We validate the model using more than 25,000 images and movies and show that image content is sufficient to predict the category and valence of human emotion ratings. In two functional magnetic resonance imaging studies, we demonstrate that patterns of human visual cortex activity encode emotion category–related model output and can decode multiple categories of emotional experience. These results suggest that rich, category-specific visual features can be reliably mapped to distinct emotions, and they are coded in distributed representations within the human visual system.
From the discussion:
We found that human ratings of pleasantness and excitement evoked by images can be accurately modeled as a combination of emotion-specific features (e.g., a mixture of features related to disgust, horror, sadness, and fear is highly predictive of unpleasant arousing experiences). Individuals may draw from this visual information when asked to rate images. The presence of emotion-specific visual features could activate learned associations with more general feelings of valence and arousal and help guide self-report. It is possible that feelings of valence and arousal arise from integration across feature detectors or predictive coding about the causes of interoceptive events (48). Rather than being irreducible (49), these feelings may be constructed from emotionally relevant sensory information, such as the emotion-specific features we have identified here, and previous expectations of their affective significance. This observation raises the possibility that core dimensions of affective experience, such as arousal and valence, may emerge from a combination of category-specific features rather than the other way around, as is often assumed in constructivist models of emotion.
In addition to our observation that emotion-specific visual features can predict normative ratings of valence and arousal, we found that they were effective at classifying the genre of cinematic movie trailers. Moreover, the emotions that informed prediction were generally consistent with those typically associated with each genre (e.g., romantic comedies were predicted by activation of romance and amusement). This validation differed from our other two image-based assessments of EmoNet (i.e., testing on holdout videos from the database used for training and testing on IAPS images) because it examined stimuli that are not conventionally used in the laboratory but are robust elicitors of emotional experience in daily life. Beyond hinting at real-world applications of our model, integrating results across these three validation tests serves to triangulate our findings, as different methods (with different assumptions and biases) were used to produce more robust, reproducible results.