Wednesday, March 24, 2021

Six core emotional arcs determined by using NLP on 1327 stories from Project Gutenberg

Reagan, A.J., Mitchell, L., Kiley, D. et al. The emotional arcs of stories are dominated by six basic shapes. EPJ Data Sci. 5, 31 (2016). https://doi.org/10.1140/epjds/s13688-016-0093-1

Abstract

Advances in computing power, natural language processing, and digitization of text now make it possible to study a culture’s evolution through its texts using a ‘big data’ lens. Our ability to communicate relies in part upon a shared emotional experience, with stories often following distinct emotional trajectories and forming patterns that are meaningful to us. Here, by classifying the emotional arcs for a filtered subset of 1,327 stories from Project Gutenberg’s fiction collection, we find a set of six core emotional arcs which form the essential building blocks of complex emotional trajectories. We strengthen our findings by separately applying matrix decomposition, supervised learning, and unsupervised learning. For each of these six core emotional arcs, we examine the closest characteristic stories in publication today and find that particular emotional arcs enjoy greater success, as measured by downloads.

Introduction

The power of stories to transfer information and define our own existence has been shown time and again [1–5]. We are fundamentally driven to find and tell stories, likened to Pan Narrans or Homo Narrativus. Stories are encoded in art, language, and even in the mathematics of physics: We use equations to represent both simple and complicated functions that describe our observations of the real world. In science, we formalize the ideas that best fit our experience with principles such as Occam’s Razor: The simplest story is the one we should trust. We tend to prefer stories that fit into the molds which are familiar, and reject narratives that do not align with our experience [6].

We seek to better understand stories that are captured and shared in written form, a medium that since inception has radically changed how information flows [7]. Without evolved cues from tone, facial expression, or body language, written stories are forced to capture the entire transfer of experience on a page. An often integral part of a written story is the emotional experience that is evoked in the reader. Here, we use a simple, robust sentiment analysis tool to extract the reader-perceived emotional content of written stories as they unfold on the page.
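To make the idea of an emotional arc concrete, here is a minimal sketch that averages word-level sentiment over a sliding window as a text unfolds. It is an illustration only, not the authors' tool: the lexicon `TOY_LEXICON`, its scores, the function name `emotional_arc`, and the parameters `window` and `n_points` are all hypothetical.

```python
from statistics import mean

# Hypothetical toy lexicon mapping words to happiness scores on a 1-9 scale.
# Words and scores are illustrative assumptions, not taken from a real lexicon.
TOY_LEXICON = {
    "joy": 8.0, "love": 8.5, "happy": 8.0, "hope": 7.0,
    "loss": 2.5, "fear": 2.0, "death": 1.5, "sad": 2.0,
}


def emotional_arc(words, window, n_points, lexicon=TOY_LEXICON):
    """Return n_points mean sentiment scores from evenly spaced sliding windows."""
    if window > len(words) or n_points < 2:
        raise ValueError("need window <= len(words) and n_points >= 2")
    last_start = len(words) - window
    arc = []
    for i in range(n_points):
        # Spread the window start positions evenly from the first to the last word.
        start = round(i * last_start / (n_points - 1))
        scored = [lexicon[w] for w in words[start:start + window] if w in lexicon]
        # A window with no scored words yields None rather than an invented value.
        arc.append(mean(scored) if scored else None)
    return arc


# A synthetic "story": a positive opening, a negative middle, a positive ending.
story = (
    "joy love happy hope " * 5
    + "loss fear death sad " * 5
    + "joy love happy hope " * 5
).split()
arc = emotional_arc(story, window=12, n_points=5)
print(arc)
```

On this synthetic input the arc starts high, falls in the middle, and rises again at the end. A real analysis would need a full sentiment lexicon and much longer windows.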

We objectively test aspects of the theories of folkloristics [8, 9], specifically the commonality of core stories within societal boundaries [4, 10]. A major component of folkloristics is the study of society and culture through literary analysis. This is sometimes referred to as narratology, which at its core is ‘a series of events, real or fictional, presented to the reader or the listener’ [11]. In our present treatment, we consider the plot as the ‘backbone’ of events that occur in a chronological sequence (more detail on previous theories of plot is in Appendix A in Additional file 1). While the plot captures the mechanics of a narrative and the structure encodes their delivery, in the present work we examine the emotional arc that is invoked through the words used. The emotional arc of a story does not give us direct information about the plot or the intended meaning of the story, but rather exists as part of the whole narrative (e.g., an emotional arc showing a fall in sentiment throughout a story may arise from very different plot and structure combinations). This distinction between the emotional arc and the plot of a story is one point of misunderstanding in other work that has drawn criticism from the digital humanities community [12]. Through the identification of motifs [13], narrative theories [14] allow us to analyze, interpret, describe, and compare stories across cultures and regions of the world [15]. We show that automated extraction of emotional arcs is not only possible, but can test previous theories and provide new insights with the potential to quantify unobserved trends as the field transitions from data-scarce to data-rich [16, 17].

The rejected master’s thesis of Kurt Vonnegut - which he personally considered his greatest contribution - defines the emotional arc of a story on the ‘Beginning-End’ and ‘Ill Fortune-Great Fortune’ axes [18]. Vonnegut finds a remarkable similarity between Cinderella and the origin story of Christianity in the Old Testament (see Figure S1 in Appendix B in Additional file 1), leading us to search for all such groupings. In a recorded lecture available on YouTube [19], Vonnegut asserted:

‘There is no reason why the simple shapes of stories can’t be fed into computers, they are beautiful shapes.’

For our analysis, we apply three independent tools: matrix decomposition by singular value decomposition (SVD), supervised learning by agglomerative (hierarchical) clustering with Ward’s method, and unsupervised learning by a self-organizing map (SOM, a type of neural network). Each tool encompasses different strengths: the SVD finds the underlying basis of all of the emotional arcs, the clustering classifies the emotional arcs into distinct groups, and the SOM generates arcs from noise which are similar to those in our corpus using a stochastic process. It is only by considering the results of each tool in support of each other that we are able to confirm our findings.
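As a rough illustration of what "finding the underlying basis" means, the sketch below approximates the leading SVD mode of a small matrix of arcs (one arc per row). It uses plain power iteration in pure Python in place of a library SVD routine, and it is not the authors' implementation. The function names `top_mode` and `coefficient`, the iteration count, and the four toy arcs are all assumptions made for the example.

```python
import math


def center(row):
    """Subtract the row mean so the arc describes shape rather than overall level."""
    m = sum(row) / len(row)
    return [x - m for x in row]


def top_mode(arcs, iterations=200):
    """Approximate the leading right singular vector of the row-centered arc matrix."""
    rows = [center(r) for r in arcs]
    n = len(rows[0])
    # Arbitrary non-constant starting vector (a constant one is annihilated by centering).
    v = [1.0 + 0.1 * j for j in range(n)]
    for _ in range(iterations):
        # One power-iteration step on A^T A: u = A v, then w = A^T u.
        u = [sum(a * b for a, b in zip(r, v)) for r in rows]
        w = [sum(rows[i][j] * u[i] for i in range(len(rows))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("arcs have no variation to decompose")
        v = [x / norm for x in w]
    return v


def coefficient(arc, mode):
    """Project a centered arc onto the mode; the sign says which way the arc follows it."""
    return sum(a * b for a, b in zip(center(arc), mode))


# Toy arcs (assumed data): two rising, one falling, one that dips in the middle.
arcs = [
    [1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1],
    [2, 4, 6, 8, 10],
    [3, 1, 0, 1, 3],
]
mode = top_mode(arcs)
coeffs = [coefficient(a, mode) for a in arcs]
print(mode)
print(coeffs)
```

In this toy data the leading mode is a steady rise: the rising arcs project onto it with positive coefficients, the falling arc with a negative one, and the dipping arc with a coefficient near zero. Each arc is thus described as a signed multiple of a shared basis shape.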

We proceed as follows. We first introduce our methods in Section 2, we then discuss the combined results of each method in Section 3, and we present our conclusions in Section 4. A graphical outline of the methodology and results can be found as Figure S2 in Appendix B in Additional file 1. 

 * * * * *

Read the rest at the link.
