Article Open Access

Unheard melodies and emotional peaks in Let It Go and Show Yourself: a multimodal sentiment analysis

Nashwa Elyamany, Yasser Omar Youssef and Manar Mohamed Hafez
Published/Copyright: August 21, 2025

Abstract

This study presents a rigorous interdisciplinary investigation into the emotional and narrative functions of “Let It Go” (Frozen) and “Show Yourself” (Frozen II), addressing a gap in scholarship that relies primarily on qualitative interpretation. Employing a novel multimodal approach, it integrates tools from Natural Language Processing (NLP) and Computer Vision to analyze the complex interplay of lyrical, musical, and visual elements. Recognizing the inherently multimodal nature of film and music, the study moves beyond traditional unimodal sentiment analysis, which often flattens emotional complexity. Specifically, it introduces a multimodal framework for music sentiment analysis that combines textual analysis (using BERT and VADER), visual analysis (using Facial Emotion Recognition), and auditory analysis (using Music Information Retrieval techniques). This synergistic approach affords a more holistic understanding of emotional expression than unimodal methods and addresses the limitations of existing categorical and dimensional approaches to sentiment analysis. Further, the study provides a quantitative analysis of emotional trajectories within the selected Frozen songs, complementing existing qualitative interpretations focused on themes of gender, empowerment, and self-discovery.

1 Introduction

Music serves as a powerful medium for emotional expression and elicitation. Fundamental musical elements, namely melody, rhythm, harmony, and timbre, trigger psychological responses, manifesting as distinct emotions and moods. As Yang et al. (2024) observe, music, as a multimodal information carrier, effectively conveys emotions through lyrics, melodies, and structural composition, creating a strong resonance with listeners. This inherent connection between music and emotion has been the subject of extensive scholarly inquiry. Early investigations explored this relationship through psychological frameworks. Juslin and Laukka (2003) examined the intricate interplay between emotional content and musical composition, analyzing how musical structures convey specific emotional states. Zentner et al. (2008) further focused on the universal appeal of music and its capacity for emotional gratification, conducting studies with diverse listener groups. In a parallel vein, Deutsch (2013) provided an in-depth analysis of how specific musical features influence individual and societal emotions, and Koelsch (2015) identified key principles governing music’s emotional impact, encompassing evaluation, resonance, memory, and social functions. Taruffi et al. (2017) went further, investigating the correlation between musical attributes and the accurate recognition of core emotions (happiness, sadness, tenderness, fear, and anger), reinforcing the established link between music and sentiment.

The advent of opinion mining and sentiment analysis (Pang and Lee 2008; Sharmin and Chakma 2021; Stypinska 2023; Vicari and Gaspari 2021) has provided computational tools for understanding human emotions at scale. Techniques such as text categorization (Pang and Lee 2004) and support vector machines (Mullen and Collier 2004) have been adapted to address similar challenges in various domains. Within music research, efforts have focused on predicting song genres (Oramas et al. 2016) and classifying music mood using textual information from websites, tags, and lyrics (Hu et al. 2009), as well as multimodal approaches combining text and audio data (Hu and Downie 2010; Kim et al. 2010). Lyrical content, in particular, has emerged as a valuable resource for exploring human experiences, societal narratives, and the emotional dimensions embedded within musical compositions (Hu and Downie 2010; Hu et al. 2009; Liu 2012). Overall, the integration of Natural Language Processing (NLP) with musicology offers a powerful approach to deciphering the complex relationship between language, emotion, and artistic expression (Alm et al. 2005). Against this backdrop, the current research endeavor affords a novel approach to understanding the emotional landscape of song lyrics by employing a multimodal sentiment analysis framework that integrates NLP and computer vision techniques (Liu et al. 2024; McFee et al. 2012). While previous studies have explored textual or audio features independently, this study addresses a critical gap by combining these modalities. Specifically, the study:

  1. Employs a multimodal model that leverages both textual information extracted through NLP and visual cues to capture a more comprehensive understanding of the emotional context;

  2. Delves into the thematic content of lyrics, beyond sentiment analysis, identifying recurring themes and narratives within musical compositions for a deeper understanding of the emotional messages conveyed; and

  3. Provides empirical validation of the proposed approach by applying computational models to song lyrics, demonstrating their effectiveness in identifying both dominant emotional tones (e.g., joyous, melancholic, or contemplative) and underlying themes (Gore et al. 2023a, 2023b, 2023c, 2023d).

This interdisciplinary methodology, bridging linguistics, computer science, and musicology, offers a more holistic perspective on musical expression. The insights derived from this analysis have significant implications for various applications, including enhancing music recommendation systems by incorporating nuanced emotional understanding, informing cultural analytics by providing quantitative measures of emotional trends in music, and benefiting industries like entertainment, media, and education by offering deeper insights into audience engagement. By advancing sentiment analysis and thematic recognition through the combined power of NLP and computer vision, this research illuminates the complex emotional contours of music lyrics and contributes to a richer understanding of musical expression (Yang 2021).

2 Contribution of the study

Sentiment analysis mines attitudes, emotions, and appraisals about various subjects, interpreting emotions in unstructured text as positive, negative, or neutral and quantifying their intensity (Khatua et al. 2020; Sánchez-Rada and Iglesias 2019). Audience resonance with a narrative is partially determined by the emotional journey it provides (Chu and Roy 2017). Semantic sentiment analysis is a crucial component of NLP, providing fundamental methodologies for textual content analysis (Fujiia et al. 2023; Gupta Maurya and Kumar Jha 2024). Within music sentiment analysis, two primary emotion classification approaches derived from psychological research exist: categorical and dimensional. The categorical approach classifies emotions into discrete categories based on basic emotion theory, which proposes a limited set of primary emotions (e.g., happiness, sadness, anger, fear, or disgust) from which all others originate (Oatley and Johnson-Laird 1987). The MIREX Audio Mood Classification task exemplifies this, organizing emotions into adjective-associated clusters (e.g., passionate, rollicking, literate, humorous, or aggressive) (Hu et al. 2008). However, this approach simplifies the complexity of human emotion and is susceptible to linguistic ambiguity (Yang and Chen 2011). Conversely, the dimensional approach represents emotions as points within a multidimensional space, commonly using Russell’s circumplex model of affect (Russell 1980), which uses valence (pleasantness) and arousal (intensity) as primary dimensions. While offering a straightforward comparison method, its simplicity may not capture the full emotional spectrum (Yang and Chen 2011). As Chen et al. (2024) state, “Multimodal sentiment analysis… necessitates the integration of various data modalities for accurate human emotion interpretation” (p. 1). Similarly, Bordoloi and Biswas (2023) note flaws in existing models, including domain dependence, negation management, high dimensionality, and inefficient keyword extraction. Our study addresses these limitations by introducing a novel framework integrating audio, text, and image analysis to enhance music sentiment analysis, contributing to the growing field of multimodal sentiment analysis (see Lai et al. (2023) and Das and Doren Singh (2024) for reviews, and Subbaiah et al. (2024) for recent works on efficient methods).
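To make the dimensional approach concrete, the toy Python sketch below places a few basic emotions as points in Russell’s valence-arousal space; the coordinates are illustrative approximations chosen for this example, not values drawn from any cited dataset.

```python
# Toy illustration of Russell's (1980) circumplex model: each emotion is a
# point in a two-dimensional space of valence (pleasantness) and arousal
# (intensity). Coordinates here are illustrative only.
circumplex = {
    "happiness": (0.8, 0.5),    # pleasant, moderately aroused
    "anger":     (-0.6, 0.8),   # unpleasant, highly aroused
    "fear":      (-0.7, 0.7),   # unpleasant, highly aroused
    "sadness":   (-0.7, -0.5),  # unpleasant, low arousal
    "calm":      (0.6, -0.6),   # pleasant, low arousal
}

for emotion, (valence, arousal) in circumplex.items():
    print(f"{emotion:>9}: valence={valence:+.1f}, arousal={arousal:+.1f}")
```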

By integrating NLP and computer vision, the current study leverages multiple modalities for a more robust and nuanced analysis of musical emotional content. Furthermore, it addresses a specific gap in media analysis, particularly concerning Disney’s cultural productions. While Disney’s works have been critically analyzed regarding cultural commodification and representation (Rudloff 2016), and Frozen has received attention for its feminist and postfeminist themes (Fonneland 2020; Kvidal-Røvik and Cordes 2022), much of this analysis is qualitative. A lack of quantitative understanding exists regarding the emotional interplay between lyrics, music, and visuals. Accordingly, another key contribution of this research is the application of NLP models (BERT and VADER) and computer vision (FER) to quantify and map emotional peaks in two Frozen songs, examining how modalities interact to enhance emotional experiences in animated films, advancing theoretical and methodological contributions to children’s media studies. This integration allows for quantifiable visualization of emotional intensity, deepening our understanding of musical and visual interactions (Cîrneanu et al. 2023).

From a different angle, while this study primarily focuses on multimodal sentiment analysis in the context of film music and children’s media, it offers several potential contributions to Digital Forensic Linguistics by expanding the scope of linguistic evidence beyond traditional text (Coulthard et al. 2010) to encompass the emotional content conveyed through music and visuals, recognizing the increasing prevalence of multimodal digital communication (Baldry and Thibault 2006). This is particularly relevant in digital environments where videos, memes, and multimedia presentations are commonplace. Forensic linguists have shown increasing interest in analyzing emotions in digital communication for insights into intent, deception, and threats. The study’s methodology for quantifying emotion in multimodal content could be adapted for forensic analysis of threatening videos or online harassment, offering a more comprehensive understanding of these communicative acts. In deception detection, where research has focused on linguistic cues like passive voice (Bond and DePaulo 2006), analyzing emotional incongruity across modalities, such as discrepancies between facial micro-expressions and verbal statements, would offer a novel approach.

3 Let It Go and Show Yourself

The analysis of two pivotal musical pieces within the Frozen franchise provides a compelling case study for exploring the interplay between lyrics, music, and visuals in conveying complex emotional narratives.

“Let It Go,” performed by Queen Elsa, functions as the anthem of Frozen, encapsulating her transformative journey toward self-acceptance. The song marks a critical turning point in her narrative, representing her release from years of suppressing her magical abilities due to fear and societal pressure. This repression, stemming from an accidental injury to her sister Anna, led to isolation and deepened after the loss of their parents. The uncontrolled manifestation of her powers during her coronation, which plunges the kingdom into an eternal winter, precipitates her retreat into solitude. Within this self-imposed exile, Elsa performs “Let It Go” as a powerful expression of her frustration with societal expectations and her defiant decision to embrace her true self. The song celebrates individuality and freedom, culminating in her bold declaration, “No right, no wrong, no rules for me; I’m free!” However, this initial sense of liberation is portrayed as incomplete and even fleeting. Her self-imposed exile in an ice palace, while visually striking, mirrors the emotional confinement she experienced prior to her escape. This highlights a crucial theme: freedom without meaningful connection can lead to a different form of isolation. When Anna attempts to reconnect with her, it becomes clear that true liberation is not found in isolation but through love and reconnection with others, particularly within the context of family and community. Beyond its personal narrative, “Let It Go” resonates with broader societal tensions surrounding gender roles, power dynamics, and the complexities of individualism. The song’s profound emotional depth significantly reshaped the narrative of Frozen, contributing to its status as a cultural phenomenon. Elsa’s visual transformation during the song, including a change in attire and a more confident demeanor, exposes postfeminist tensions. Her makeover, which presents a more conventionally sexualized image, conflates empowerment with traditional notions of femininity and beauty standards. While she verbally rejects societal expectations, her transformed appearance paradoxically reinforces certain patriarchal ideals, particularly as she eventually resumes her role as queen. Despite this inherent tension, “Let It Go” has resonated deeply with marginalized communities worldwide, offering a powerful message of self-liberation and empowerment for individuals struggling with shame, societal pressures, and the desire for self-acceptance.

“Show Yourself” from Frozen II builds on the themes of repression, difference, and eventual self-expression first introduced in “Let It Go,” providing a sense of narrative and thematic continuity. The song also introduces themes of Indigenous identity, as Elsa and Anna uncover their Northuldra heritage, a history deeply intertwined with themes of settler colonialism and cultural displacement. Despite initial concerns that it might be cut from Frozen II, “Show Yourself” ultimately became the film’s emotional core, effectively unifying Elsa’s character development across both films and marking her profound emotional breakthrough. “Show Yourself” represents a further critical juncture in Elsa’s character arc. As she journeys to Ahtohallan, a mythical river of memory, she discovers that the mysterious voice calling to her is a memory of her mother, Iduna. This revelation propels Elsa toward embracing her true identity and purpose as the Fifth Spirit, a bridge between the human world and the elemental spirits. The song reflects her evolution from uncertainty and self-doubt to a profound and complete acceptance of her true self. Initially, she addresses the external voice, but by the song’s conclusion, her focus has shifted inward, symbolizing her profound self-acceptance. This moment serves as both an emotional and narrative climax, solidifying her character arc across both films. Her perilous journey across the stormy waters to Ahtohallan visually symbolizes her internal journey of self-discovery, forcing her to confront long-buried truths about her family history and her own identity.

4 Methods

To facilitate a granular analysis of the emotional progression within “Let It Go” (Frozen) and “Show Yourself” (Frozen II), the songs were initially segmented into distinct narrative phases, each representing key emotional and narrative shifts (see Tables 1 and 2). This segmentation allowed for a fine-grained examination of how emotional arcs develop within each song, providing a structured framework for the subsequent computational analysis of the lyrical, visual, and auditory content.

Table 1:

Segmentation of “Let It Go” showcasing significant intersemiotic cues.

Segment Time frame Audio-visual cues Musical-lyrical cues
Introduction & resignation 0:00–0:40 Walking alone; snowstorm raging; muted colors Low, controlled vocal delivery; lyrics about concealment
Decision to let go 0:41–1:02 Removal of her gloves, throwing them away; slight color brightening Tempo increase; lyrics’ reflection of her decision
Empowerment 1:03–1:27 Creating a bridge of ice; camera zooming in on her face Music swelling; strong vocal delivery; lyrics about breaking free
Creation of ice palace 1:28–2:05 Constructing the ice palace; vibrant colors; intricate designs Instrumental build-up; lyrics’ reflection of creation and freedom
Transformation 2:06–2:37 Changing her outfit; letting down her hair; adopting a new posture Shift to major key; confident vocal tone; powerful lyrics
Final declaration 2:38–3:00 Striding confidently; looking down at her palace Triumphant musical climax; assertive lyrics
Conclusion 3:01–3:44 Standing at the balcony; calm snow falling Gentle decrescendo; lyrics’ reflection of peace and contentment
Table 2:

Segmentation of “Show Yourself” showcasing significant intersemiotic cues.

Segment Time frame Audio-visual cues Musical-lyrical cues
Introduction & anticipation 0:00–0:42 Walking through the forest; dim lighting; hesitant movements Soft and questioning tone in vocals; lyrics about seeking truth
First revelation 0:43–1:21 Slight brightening of the environment Tempo pick-up; more assured vocals; hopeful lyrics
Search intensification 1:22–1:58 Faster movement; water and ice imagery; emergence of vibrant colors Stronger vocal delivery; lyrics focus on uncovering truth
Final confrontation 1:59–2:36 Entering Ahtohallan; ethereal and glowing environment Peak of vocal intensity; direct and commanding lyrics
Self-realization 2:37–3:05 Seeing her reflection; bright and clear visuals; calm water Swelling music; powerful vocals; lyrics about embracing identity
Resolution & acceptance 3:06–3:45 Standing calmly in Ahtohallan, surrounded by light and calm water Softening music; lyrics’ reflection of peace and self-acceptance

We employed computer vision and NLP tools to analyze the emotional peaks within these songs, affording a detailed examination of lyrical sentiment and thematic progression, revealing how the lyrics contribute to and reflect the characters’ emotional journeys. Figure 1 outlines the proposed model for analyzing the two song videos, employing multiple techniques across different audio-visual and textual modalities (singer’s voice, instrumental music, visual imagery, and lyrics) to achieve a comprehensive multimodal sentiment analysis.

Figure 1: Conceptual analytical model of the two Frozen and Frozen II songs.

4.1 Textual analysis

Sentiment analysis is crucial for understanding the polarity and intensity of human emotions expressed in text (Eyu et al. 2025). As defined by Batrinca and Treleaven (2015), it involves “the application of natural language processing, computational linguistics and text analytics to identify and extract subjective information in source materials” (p. 90). With this in mind, our textual analysis deploys two distinct sentiment analysis models, BERT and VADER, chosen for their complementary strengths and proven efficacy across diverse narrative genres (Barik and Misra 2024), further enhancing the robustness of the analysis. BERT’s deep learning architecture allows for context-aware sentiment classification, making it particularly well-suited for detecting subtle shifts in emotion across the lyrical content. VADER, a lexicon-based tool, offers a complementary approach, particularly effective for handling more general sentiments often found in media contexts. A significant methodological challenge was the extraction of lyrics directly from the Frozen soundtracks due to the prominent presence of musical instrumentation. To overcome this, Optical Character Recognition (OCR) techniques were applied to available subtitles, providing a reliable source for textual extraction and ensuring the accuracy of subsequent sentiment analysis. This methodological choice demonstrates a practical approach to data acquisition in multimodal analysis. To maintain coherence with the songs’ narrative structures, a four-step process was implemented:

  1. Segmentation Alignment. The texts within each previously defined narrative segment were concatenated into a single string. This ensured that the sentiment analysis reflected the entire narrative context of each segment, capturing the overall emotional arc rather than analyzing fragmented pieces of text.

  2. Dual-Model Application. Both a BERT-based deep learning model (Hoang et al. 2019) and VADER (Catelli et al. 2022) were applied to each segment. This dual-model approach, a key methodological contribution, provided a more robust evaluation by integrating the strengths of both advanced machine learning and traditional lexicon-based sentiment analysis. A minimal code sketch of this dual pass appears after the list.

  3. Heatmap Visualization. A heatmap visualization (Samek et al. 2017) was used to represent the dominant emotions across the different segments. This visualization technique effectively highlighted variations in sentiment throughout the songs, providing a clear and interpretable representation of the emotional trajectories.

  4. OCR for Lyric Extraction. Due to the difficulty of extracting lyrics directly from the soundtracks, OCR was applied to readily available subtitles. This practical solution ensured accurate lyric extraction despite audio complexities.
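The sketch below illustrates steps 1 and 2, assuming the Hugging Face transformers and vaderSentiment packages. Since the study cites Hoang et al. (2019) rather than naming a specific checkpoint, the emotion model used here is a stand-in, and the segment text is a placeholder rather than the actual lyrics.

```python
# Minimal sketch of the dual-model pass (steps 1-2). The checkpoint and the
# segment text are placeholders, not the study's actual assets.
from transformers import pipeline
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

bert = pipeline("text-classification",
                model="j-hartmann/emotion-english-distilroberta-base")
vader = SentimentIntensityAnalyzer()

# Step 1: lyrics of one narrative segment concatenated into a single string.
segment = "placeholder lyric text for the 'Decision to let go' segment ..."

# Step 2: apply both models to the same segment.
bert_label = bert(segment, truncation=True)[0]["label"]   # discrete emotion
compound = vader.polarity_scores(segment)["compound"]     # score in [-1, +1]
vader_label = ("positive" if compound >= 0.05
               else "negative" if compound <= -0.05 else "neutral")

print(f"BERT: {bert_label} | VADER: {vader_label} ({compound:+.3f})")
```

The ±0.05 cut-offs for the compound score follow VADER’s standard recommendation; the per-segment labels produced this way are what the heatmaps in step 3 visualize.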

4.2 Visual analysis

Our visual analysis employed Facial Emotion Recognition (FER), focusing specifically on Elsa’s facial expressions at key emotional junctures in both songs. FER’s deep learning-based architecture (Bhagat et al. 2024) is adept at detecting nuanced emotional transitions, providing a data-driven perspective on the visual manifestation of character emotions. The following detailed steps outline the visual analysis process:

  1. Segment Alignment. As with the textual analysis, the video was divided into distinct segments based on the previously defined narrative and emotional shifts. This ensured alignment between the visual, textual, and auditory analyses.

  2. Face Detection. For each segment, frames were sampled, and face detection was performed using a pretrained Haar cascade classifier. This provided a reliable method for isolating Elsa’s face within each frame. A code sketch of steps 2–4 appears after this list.

  3. Facial Emotion Recognition. The extracted facial images were processed using the FER model, a deep Convolutional Neural Network (CNN) trained to recognize a range of human emotions (e.g., happiness, sadness, anger, surprise, or fear).

  4. Dominant Emotion Identification. For each video segment, the dominant emotion was identified by aggregating the results from individual frames, determining the emotion with the highest confidence score across all frames within that segment.

  5. Expert Validation. To ensure the accuracy and reliability of the FER model, the identified dominant emotions were compared against expert interpretations based on visual, audio, and semiotic cues. This validation step is crucial for establishing the validity of the computational analysis.

  6. Heatmap Visualization. The results were visualized using heatmaps, displaying normalized confidence scores of detected emotions across different segments, providing a clear visualization of emotional transitions.
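As referenced above, the following sketch illustrates steps 2–4 under stated assumptions: it uses the open-source fer package and OpenCV, samples roughly one frame per second, and aggregates per-frame confidences into a dominant emotion. The video path, segment boundaries, and aggregation rule are illustrative, not a verbatim reproduction of the authors’ pipeline.

```python
# Sketch of steps 2-4: frame sampling, Haar-cascade face detection, and FER
# aggregation per segment. File name and aggregation rule are assumptions.
from collections import Counter

import cv2
from fer import FER

detector = FER()  # deep CNN facial emotion recognizer
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def dominant_emotion(video_path, start_s, end_s):
    """Aggregate per-frame FER confidences over one narrative segment."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    scores = Counter()
    for second in range(int(start_s), int(end_s)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(second * fps))  # ~1 frame/sec
        ok, frame = cap.read()
        if not ok:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
            emotion, confidence = detector.top_emotion(frame[y:y + h, x:x + w])
            if emotion is not None and confidence is not None:
                scores[emotion] += confidence
    cap.release()
    return scores.most_common(1)[0] if scores else ("none", 0.0)

# Placeholder path; 2:06-2:37 is the "Transformation" segment of "Let It Go".
print(dominant_emotion("let_it_go.mp4", 126, 157))
```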

As noted in the validation step above, the identified dominant emotions were compared against expert interpretations based on visual, audio, and semiotic cues. Three annotators with backgrounds in film studies, musicology, and psychology participated in this process. These annotators possess expertise in analyzing emotional expression in film, allowing for a nuanced understanding of the characters’ emotional states. The annotation process involved each annotator independently rating the dominant emotion in each segment based on visual cues, audio cues, and narrative context. The emotion categories were operationally defined to ensure consistency. Inter-rater reliability was assessed using Fleiss’ kappa to quantify the agreement between annotators, demonstrating a substantial level of agreement (kappa = 0.81). Any discrepancies between FER model outputs and expert judgments were resolved through discussion and consensus among the annotators. This process ensured that the final interpretation reflected a comprehensive understanding of the emotional content. It is important to acknowledge the unique challenges of applying FER to stylized animation, where characters’ expressions may differ from real human expressions. The expert validation process helped to mitigate these challenges by providing a human-centered interpretation of the model’s output.
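For reference, Fleiss’ kappa can be computed with statsmodels as in the snippet below; the ratings shown are toy data for illustration, not the study’s annotations.

```python
# Illustrative Fleiss' kappa computation with statsmodels; the three
# annotators' labels below are toy data, not the study's annotations.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = segments, columns = annotators (categorical labels).
ratings = np.array([
    ["anticipation", "anticipation", "anticipation"],
    ["hope",          "hope",          "determination"],
    ["determination", "determination", "determination"],
    ["joy",           "joy",           "joy"],
])

table, _ = aggregate_raters(ratings)  # per-segment counts for each category
print(f"Fleiss' kappa = {fleiss_kappa(table):.2f}")
```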

4.3 Auditory analysis

To provide a more holistic understanding of how musical features contribute to emotional peaks, we extracted key musical features using Music Information Retrieval (MIR) tools. The extracted features – tempo (BPM), spectral centroid (brightness), and Root Mean Square (RMS) (loudness) – are widely used in MIR for mood and emotion classification (Casey et al. 2008; Lartillot and Toiviainen 2007). The six-step process for auditory analysis was as follows:

  1. Audio Extraction and Separation. The audio track was extracted from the video and separated into vocal and accompaniment channels using tools like FFmpeg and Spleeter. This separation allowed for a more focused analysis of the musical elements.

  2. Feature Extraction. Relevant musical features (tempo, spectral centroid, RMS) were extracted from each segment using audio analysis libraries like LibROSA. A code sketch of this step and the sentiment mapping appears after this list.

  3. Sentiment Mapping. The extracted features were mapped to specific sentiments using heuristic rules based on empirical studies in music psychology (Juslin and Laukka 2003).

  4. Result Interpretation. The results were interpreted for each segment to understand how the music influences the emotional experience.

  5. Visualization. The extracted features and corresponding sentiments were visualized using tools like Matplotlib.

  6. Evaluation and Reflection. The outcomes were evaluated to determine the effectiveness of the heuristic rules and reflect on their limitations, paving the way for future refinements using more sophisticated methods.
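The sketch below illustrates steps 2 and 3 with LibROSA. The heuristic thresholds stand in for the rules the study derived from music psychology, and the file path and segment times are placeholders; a real pipeline would first run the Spleeter separation described in step 1.

```python
# Sketch of steps 2-3: per-segment feature extraction with librosa and a
# heuristic feature-to-sentiment mapping. Thresholds, file name, and segment
# times are illustrative assumptions, not the study's exact rules.
import librosa
import numpy as np

def segment_features(path, start_s, end_s):
    """Extract tempo (BPM), mean spectral centroid (Hz), and mean RMS."""
    y, sr = librosa.load(path, offset=start_s, duration=end_s - start_s)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    centroid = float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)))
    rms = float(np.mean(librosa.feature.rms(y=y)))
    return float(tempo), centroid, rms

def heuristic_sentiment(tempo, centroid, rms):
    """Map raw features to a coarse sentiment, in the spirit of
    Juslin and Laukka (2003); cut-offs are illustrative."""
    if tempo > 120 and rms > 0.10:
        return "energetic/happy" if centroid > 2000 else "tense/anxious"
    if tempo < 90 and rms < 0.05:
        return "calm/sad"
    return "neutral"

# Placeholder path; 1:59-2:36 is the "Final confrontation" segment.
tempo, centroid, rms = segment_features("show_yourself.wav", 119, 156)
print(heuristic_sentiment(tempo, centroid, rms))
```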

By combining these three distinct analytical approaches, this study provides a more comprehensive and nuanced understanding of the emotional landscapes of Frozen and Frozen II, directly addressing the need for multimodal analysis in understanding artistic expression. Pairing computational analysis with expert validation allows for a more objective and data-driven approach to the study of emotion in media.

5 Results and discussion

5.1 Textual analysis

The analysis of “Show Yourself,” presented in Table 3 and visualized in Figure 2, using BERT and VADER, reveals interesting discrepancies and alignments between the models and expert interpretation. At the song’s outset, Elsa experiences curiosity and uncertainty, as confirmed by expert interpretation. BERT detected anger, while VADER showed neutrality. This discrepancy suggests an underlying inner conflict masked by a controlled exterior, a nuance that VADER’s more general approach missed. As Elsa reaches the first revelation, a mix of hopeful realization and apprehension emerges. Both models shifted toward a more negative outlook, with BERT again detecting anger and VADER turning negative, reflecting Elsa’s cautious approach to the unfolding truths. During the intensified search for answers, Elsa’s determination becomes evident. BERT detected sadness, potentially reflecting frustration, while VADER remained neutral, indicating Elsa’s focused pursuit of self-discovery. The final confrontation brings a moment of truth, marked by joy and positivity, aligning with both models’ readings. This signifies Elsa’s growing understanding of her true identity. At the song’s emotional climax, Elsa’s realization is intense. BERT detected anger, while VADER remained neutral, possibly highlighting the overwhelming nature of self-discovery, which can be both exhilarating and daunting. The song concludes with peaceful acceptance as Elsa comes to terms with her identity. VADER’s neutral reading reflects a calm and balanced state, while BERT again registered anger, underscoring the models’ divergence even as Elsa attains serenity and peace. This progression, from internal conflict to acceptance, effectively encapsulates Elsa’s emotional evolution throughout the song, showcasing the potential of sentiment analysis to map character development.

Table 3:

Comparison of model results with expert analysis for Show Yourself.

Segment Time frame BERT emotion VADER emotion Emotional peak Expert interpretation
Introduction/anticipation 0:00–0:42 Anger Neutral Curiosity and uncertainty Elsa is beginning her journey, driven by curiosity but tinged with uncertainty.
First revelation 0:43–1:21 Anger Negative Hopeful realization Elsa feels closer to the answers she’s seeking, hopeful about what she might discover.
Search intensification 1:22–1:58 Sadness Neutral Rising determination Elsa is more determined, driven by a sense of destiny and the need for self-discovery.
Final confrontation 1:59–2:36 Joy Positive Moment of truth Elsa is on the verge of discovering her true self, emotionally charged and ready to face the truth.
Self-realization 2:37–3:05 Anger Neutral Emotional climax Elsa reaches the emotional peak of the song as she realizes her true identity and power.
Resolution/acceptance 3:06–3:45 Anger Neutral Peaceful acceptance Elsa has accepted her true self, the song concludes with a serene and peaceful tone.
Figure 2: Emotional analysis heatmap of Show Yourself.

The analysis of “Let It Go,” presented in Table 4 and visualized in Figure 3, revealed both strengths and limitations of the BERT and VADER models in capturing the song’s emotional peaks. BERT frequently misinterpreted intense emotions, labeling segments like “Introduction/Resignation” and “Creation of Ice Palace” as anger, contrasting with expert interpretations of mild frustration and creative freedom, respectively. This suggests that BERT may associate strong emotional expressions with negativity, even in positive or neutral contexts. In contrast, VADER often classified segments as positive, aligning with moments of joy and transformation, such as “Decision to Let Go” and “Transformation,” reflecting Elsa’s embrace of her powers and independence. However, VADER’s simpler classification sometimes lacked nuance, failing to distinguish between different types of positive emotions like empowerment and liberation. For instance, in the “Final Declaration” segment, VADER’s positive sentiment aligns with liberation but does not capture the moment’s full complexity. BERT’s misclassification of fear highlights its difficulty distinguishing high-energy emotions associated with fear and liberation. Overall, both models effectively detected clear emotional peaks, particularly when emotions were unambiguously positive. However, they struggled with more complex or mixed emotional states. BERT’s detailed labels provided specificity but at the risk of misclassification, while VADER’s broader categories were consistent but lacked depth.

Table 4:

Comparison of model results with expert analysis for Let It Go.

Segment Time frame BERT emotion VADER emotion Emotional peak Expert interpretation
Introduction/resignation 0:00–0:40 Anger Positive Mild frustration/resignation Elsa is still bound by her fears and responsibilities, reflecting her inner struggle.
Decision to let go 0:41–1:02 Joy Positive Rising determination Elsa begins to shed her old identity, embracing her powers and independence.
Empowerment 1:03–1:27 Anger Negative Empowerment Elsa fully embraces her powers, symbolizing a major emotional shift toward self-empowerment.
Creation of ice palace 1:28–2:05 Anger Positive Creative freedom Elsa channels her powers into something beautiful, symbolizing her control and creativity.
Transformation 2:06–2:37 Joy Positive Self-realization/acceptance Elsa accepts her true self, embracing her identity fully.
Final declaration 2:38–3:00 Fear Positive Liberation Elsa’s final emotional peak, symbolizing complete liberation and independence from societal norms.
Conclusion 3:01–3:44 Joy Positive Calm confidence Elsa is now at peace, having fully embraced her powers and identity.
Figure 3: Emotional analysis heatmap of Let It Go.

The analysis, as detailed in Tables 3 and 4 and Figures 2 and 3, reveals that both BERT and VADER have limitations in representing emotions that are not clearly positive or negative. BERT showed potential for recognizing distinct emotional peaks but requires refinement to handle nuanced emotions. VADER’s neutrality suggests its utility for general sentiment analysis, but its lack of specificity limits in-depth analysis. To improve sentiment analysis in multimedia narratives, both models would benefit from further training on specialized datasets that capture the nuanced emotional transitions characteristic of storytelling. This would enhance their ability to accurately interpret a broader range of emotions. Integrating techniques focusing on contextual and sequential emotional analysis, such as attention mechanisms, could further improve alignment with expert interpretations, making sentiment analysis more relevant and insightful for multimodal analysis. This directly supports the need for the proposed multimodal model, which aims to address these very limitations.

5.2 Visual analysis

In the analysis of “Show Yourself,” the FER model was employed to detect and quantify the emotional peaks within the song, aligning these findings with the narrative structure and Elsa’s evolving self-awareness (see Table 5 and Figure 4). In the Introduction (0:00–0:42), characterized by anticipation, the FER model detected anticipation with high confidence (0.9600), aligning with Elsa’s cautious approach as she embarks on her journey. During the First Revelation (0:43–1:21), where hopefulness emerges, the FER model recorded emotions matching Elsa’s growing sense of optimism about uncovering her past, supported by a high confidence score of 0.9500. As the search intensifies (1:22–1:58), the FER model noted a slight dip in confidence, at 0.9300, which still reflects Elsa’s rising determination driven by her intrinsic need for self-discovery. In the Final Confrontation (1:59–2:36), marked as the moment of truth, the FER model recorded a confidence score of 0.9200, capturing the emotional intensity as Elsa prepares to face her reality. The Emotional Climax (2:37–3:05) represents Elsa’s peak of self-realization, with the FER model detecting a return to higher confidence levels (0.9300), indicating her deep emotional resonance upon realizing her true identity. Finally, in the Resolution and Acceptance segment (3:06–3:45), the FER model recorded the highest confidence score of 0.9800, reflecting Elsa’s serene acceptance of her identity and the song’s peaceful conclusion. The strong alignment between the FER model’s findings and expert analysis demonstrates the model’s capability to mirror the emotional transitions and depth portrayed in “Show Yourself.” By synchronizing audio-visual cues with expert semiotic analysis, this approach provides a comprehensive understanding of Elsa’s character evolution, emphasizing the effectiveness of deep learning models in uncovering nuanced emotional insights within narrative-driven content. The model’s consistently high confidence levels across the song’s segments capture both the subtle and the pronounced emotional shifts of Elsa’s journey through self-discovery, complementing expert interpretation throughout.

Table 5:

Visual analysis of Show Yourself.

Segment Time frame FER detected emotion FER confidence Emotional peak Expert interpretation
Introduction/anticipation 0:00–0:42 Anticipation 0.9600 Curiosity & uncertainty Elsa is beginning her journey, driven by curiosity but tinged with uncertainty.
First revelation 0:43–1:21 Hope 0.9500 Hopeful realization Elsa feels closer to the answers she’s seeking, hopeful about what she might discover.
Search intensification 1:22–1:58 Determination 0.9300 Rising determination Elsa is more determined, driven by a sense of destiny and the need for self-discovery.
Final confrontation 1:59–2:36 Truth 0.9200 Moment of truth Elsa is on the verge of discovering her true self, emotionally charged and ready to face the truth.
Self-realization 2:37–3:05 Climax 0.9300 Emotional climax Elsa reaches the emotional peak of the song as she realizes her true identity and power.
Resolution/acceptance 3:06–3:45 Acceptance 0.9800 Peaceful acceptance Elsa has accepted her true self, the song concludes with a serene and peaceful tone.
Figure 4: Emotional peak with normalized confidence of Show Yourself using FER.

In the analysis of “Let It Go,” also visualized in Figure 5 and detailed in Table 6, the FER model was utilized to identify and quantify the emotional peaks within the song, aligning these results with the narrative progression and Elsa’s character development. In the Introduction (0:00–0:40), the detected emotions indicated mild frustration and resignation, aligning with Elsa’s initial struggle and fears. As Elsa decides to let go (0:41–1:02), the model captured rising determination, a transition that marks her shedding of past constraints. This shift is further underscored in the Empowerment segment (1:03–1:27), where the model’s high confidence scores reflect Elsa’s full embrace of her powers, symbolizing a profound self-empowerment. The creation of the Ice Palace (1:28–2:05) was marked by emotions associated with creative freedom, as Elsa transforms her environment, embodying control and artistry. The Transformation segment (2:06–2:37) showed emotions of happiness, corresponding to Elsa’s self-realization and acceptance of her identity, while the Final Declaration (2:38–3:00) captured feelings of liberation, aligning with Elsa’s complete detachment from societal norms. Finally, in the Conclusion (3:01–3:44), the analysis revealed calm confidence, correlating with Elsa’s state of peace and mastery over her powers. The normalized confidence scores from the FER model consistently peaked during these key emotional shifts, affirming the model’s capability to accurately detect and mirror the emotional depth and progression outlined by human interpretation. This synchronization between audio-visual cues and semiotic analysis offers a comprehensive understanding of how Elsa’s character evolution is visually and emotionally depicted in “Let It Go,” demonstrating the efficacy of using deep learning models in conjunction with expert analysis for nuanced emotional insight.

Figure 5: Emotional peak with normalized confidence of Let It Go using FER.

Table 6:

Visual analysis of Let It Go.

Segment Time frame FER detected emotion FER confidence Emotional peak Expert interpretation
Introduction/resignation 0:00–0:40 Sad 0.78 Mild frustration/resignation Elsa is still bound by her fears and responsibilities, reflecting her inner struggle.
Decision to let go 0:41–1:02 Surprise 1.00 Rising determination Elsa begins to shed her old identity, embracing her powers and independence.
Empowerment 1:03–1:27 Surprise 0.92 Empowerment Elsa fully embraces her powers, symbolizing a major emotional shift toward self-empowerment.
Creation of ice palace 1:28–2:05 Surprise 0.95 Creative freedom Elsa channels her powers into something beautiful, symbolizing her control and creativity.
Transformation 2:06–2:37 Happy 0.99 Self-realization/acceptance Elsa accepts her true self, embracing her identity fully.
Final declaration 2:38–3:00 Surprise 0.98 Liberation Elsa’s final emotional peak, symbolizing complete liberation and independence from societal norms.
Conclusion 3:01–3:44 Happy 0.93 Calm confidence Elsa is now at peace, having fully embraced her powers and identity.

5.3 Auditory analysis

This section presents the results of the auditory analysis, aligning musical features and sentiment analysis with the predefined emotional peaks from the scene analyses of “Show Yourself” (Frozen II) and “Let It Go” (Frozen), as detailed in Tables 7 and 8 and visualized in Figures 6 and 7, respectively. This combined approach, integrating audio analysis with the previously discussed textual and visual analyses, provides a richer understanding of how musical elements correlate with the narrative’s emotional flow and contributes to the overall multimodal sentiment analysis.

Table 7:

Auditory analysis of Show Yourself: musical features, sentiment analysis, and expert interpretation.

Segment Time frame Tempo Spectral centroid RMS (loudness) Sentiment Emotional peak Expert interpretation
Introduction/anticipation 0:00–0:42 0.91 0.71 0.0000 Tense/anxious Curiosity & uncertainty Elsa is beginning her journey, driven by curiosity but tinged with uncertainty.
First revelation 0:43–1:21 0.00 0.00 0.0606 Calm/sad Hopeful realization Elsa feels closer to the answers she’s seeking, hopeful about what she might discover.
Search intensification 1:22–1:58 1.00 0.06 0.3827 Neutral Rising determination Elsa is more determined, driven by a sense of destiny and the need for self-discovery.
Final confrontation 1:59–2:36 1.00 1.00 0.7487 Energetic/happy Moment of truth Elsa is on the verge of discovering her true self, emotionally charged and ready to face the truth.
Self-realization 2:37–3:05 1.00 0.74 0.6066 Energetic/happy Emotional climax Elsa reaches the emotional peak of the song as she realizes her true identity and power.
Resolution/acceptance 3:06–3:45 1.00 0.81 1.0000 Energetic/happy Peaceful acceptance Elsa has accepted her true self, the song concludes with a serene and peaceful tone.
Table 8:

Auditory analysis of Let It Go: musical features, sentiment analysis, and expert interpretation.

Segment Time frame Tempo Spectral centroid RMS (loudness) Sentiment Emotional peak Expert interpretation
Introduction/resignation 0:00–0:40 0.31 0.16 0.0000 Neutral Mild frustration & resignation Elsa is still bound by her fears and responsibilities, reflecting her inner struggle.
Decision to let go 0:41–1:02 0.00 0.00 0.1604 Calm/sad Rising determination Elsa begins to shed her old identity, embracing her powers and independence.
Empowerment 1:03–1:27 1.00 1.00 0.2739 Tense/anxious Empowerment Elsa fully embraces her powers, symbolizing a major emotional shift toward self-empowerment.
Creation of ice palace 1:28–2:05 1.00 0.64 0.5899 Energetic/happy Creative freedom Elsa channels her powers into something beautiful, symbolizing her control and creativity.
Transformation 2:06–2:37 1.00 0.92 0.9963 Energetic/happy Self-realization & acceptance Elsa accepts her true self, embracing her identity fully.
Final declaration 2:38–3:00 1.00 0.35 1.0000 Neutral Liberation Elsa’s final emotional peak, symbolizing complete liberation and independence from societal norms.
Conclusion 3:01–3:44 1.00 0.54 0.8368 Energetic/happy Calm confidence Elsa is now at peace, having fully embraced her powers and identity.
Figure 6: Normalized musical features per segment of Show Yourself.

Figure 7: Normalized musical features per segment of Let It Go.

The analysis of “Show Yourself,” presented in Table 7 and visualized in Figure 6, aligns the extracted musical features with the emotional narrative described by expert interpretation. Each segment’s sentiment is derived from the normalized values of key musical features such as tempo, spectral centroid, and RMS (loudness), reflecting the emotional shifts within the song. In the “Introduction/Anticipation” segment, the high tempo and spectral centroid suggest a feeling of tension and anticipation, fitting with Elsa’s curiosity and uncertainty as she begins her journey. The “First Revelation” segment is marked by low tempo and spectral centroid, with a slight increase in loudness, leading to a “Calm/Sad” sentiment. This aligns with Elsa’s hopeful yet reflective state as she inches closer to understanding her inner voice. By the “Search Intensification” segment, despite the high tempo, the low spectral centroid and moderate loudness lead to a neutral sentiment. This reflects Elsa’s rising determination and the growing intensity of her search, driven by a sense of destiny. In the “Final Confrontation” segment, with high values across all features, the segment conveys an “Energetic/Happy” sentiment, aligning with the emotional charge of Elsa facing her moment of truth. This segment represents a pivotal point where Elsa is ready to confront and embrace her true self. High tempo, spectral centroid, and RMS continue to drive an “Energetic/Happy” sentiment in the “Self-Realization” segment, fitting with the emotional climax of the song where Elsa fully realizes her identity and powers. The “Resolution/Acceptance” segment maintains high values across all features, conveying a sentiment of “Energetic/Happy,” which correlates with Elsa’s peaceful acceptance of herself. The conclusion of the song mirrors a serene and fulfilled emotional state.

Table 8 and Figure 7 illustrate a strong alignment between the musical features, sentiment analysis, and the narrative’s emotional arc in “Let It Go.” In the “Introduction/Resignation” segment, the low tempo, low spectral centroid, and minimal loudness result in a neutral sentiment, matching the scene where Elsa is still conflicted, constrained by fear and responsibility. The emotional interpretation of “Mild Frustration/Resignation” reflects this internal struggle. With low tempo and spectral centroid, and a slightly higher loudness, the “Decision to Let Go” segment’s sentiment is “Calm/Sad,” aligning with Elsa’s tentative step toward embracing her powers, characterized as “Rising Determination.” In the “Empowerment” segment, a high tempo, high spectral centroid, and moderate loudness contribute to a “Tense/Anxious” sentiment. This correlates with Elsa’s empowering moment where she fully embraces her abilities, mirroring the emotional peak described as “Empowerment.” By the “Creation of Ice Palace” phase, high values across tempo, spectral centroid, and RMS indicate an “Energetic/Happy” sentiment, fitting the scene of Elsa’s artistic expression with her ice powers, encapsulated by “Creative Freedom.” The “Transformation” segment maintains high tempo, spectral centroid, and loudness, leading to an “Energetic/Happy” sentiment. It aligns with Elsa’s acceptance of her true self, which the experts interpret as “Self-Realization/Acceptance.” Although the tempo remains high in the “Final Declaration” segment, the spectral centroid drops, and loudness reaches its peak, resulting in a neutral sentiment. This phase of the song represents Elsa’s liberation, which is echoed in the emotional peak “Liberation.” By the “Conclusion” segment, consistent high values in tempo and spectral centroid, along with slightly reduced loudness, evoke an “Energetic/Happy” sentiment, aligning with the peace and confidence Elsa has found, as described in “Calm Confidence.”

This combined analysis demonstrates how musical features can effectively map to specific emotional peaks, offering valuable insights into the emotional impact of a song within a narrative context (see Figures 6 and 7). The heuristic rules applied to derive sentiments from musical features provide a reliable method for understanding the emotional resonance of the song, demonstrating the power of music in enhancing storytelling. The findings emphasize the alignment between the audio analysis and the emotional journey depicted in both “Show Yourself” and “Let It Go,” supporting the narrative through carefully crafted musical elements. The application of these heuristic rules, grounded in music psychology and information retrieval research, confirms their validity and relevance in sentiment analysis of musical compositions.

6 Conclusions

This study offers a rigorous interdisciplinary investigation into the emotional and narrative functions of “Let It Go” (Frozen) and “Show Yourself” (Frozen II), addressing a gap in existing scholarship that often relies on qualitative interpretations. By integrating NLP, computer vision, and MIR, our multimodal sentiment analysis provides novel insights into the songs’ emotional arcs and thematic structures. Our findings reveal distinct emotional trajectories aligned with narrative shifts. In “Let It Go,” the transition from isolation to empowerment is marked by emotional peaks identified across modalities. The textual analysis reveals strong sentiments of joy and liberation. The facial expression recognition (FER) detects increased expressions of confidence and happiness. Conversely, “Show Yourself” presents a more gradual emotional build-up, culminating in a reflective climax. The auditory analysis corroborates these patterns, with musical features aligning closely with narrative developments.

The integration of textual, visual, and auditory modalities underscores their synergistic contribution to emotional engagement. FER models performed well in capturing emotional transitions, while NLP tools offered nuanced detection of complex emotions. Nonetheless, model refinement remains necessary. BERT occasionally misclassifies emotions (e.g., mistaking determination for anger), indicating the need for domain-specific fine-tuning. VADER’s tendency toward neutrality further suggests the value of alternative models like RoBERTa or DistilBERT and ensemble approaches combining lexicon-based and deep learning methods. Applying FER to animated characters also presents challenges. Validation through expert human annotation and comparison with models such as OpenFace or Affectiva could mitigate biases, especially those arising from discrepancies between animated and real human expressions. Similarly, while our MIR analysis effectively linked musical features to emotional arcs, future studies could improve precision by incorporating deeper musical elements such as harmonic complexity, key modulation, and phrase structure, potentially via deep learning-based MIR techniques.

Our selection of “Let It Go” and “Show Yourself” reflects their narrative significance and emotional complexity. However, future research should expand to a broader corpus of Disney songs or animated musicals to enhance generalizability. Multimodal sentiment analysis also holds potential beyond film and musicology, with applications in forensic linguistics (e.g., deception detection), media studies (e.g., modeling audience engagement), and cross-cultural emotion research. For instance, cross-cultural studies could illuminate how emotional interpretations of music and film vary across audiences and contexts. This research work also invites critical reflections on the ethical and representational dimensions of the analyzed texts. While “Let It Go” is celebrated as an anthem of self-liberation, Elsa’s visual transformation raises questions about the conflation of empowerment with conventional beauty standards, echoing postfeminist critiques in children’s media. Similarly, “Show Yourself” introduces the Northuldra people, prompting scrutiny of potential stereotyping and cultural appropriation. The portrayal of Indigenous-inspired cultures and their commodification in merchandise risk perpetuating simplified or distorted narratives, reinforcing colonial logics and eliciting discomfort or alienation among viewers from those communities. Furthermore, our use of AI-driven sentiment analysis warrants ethical consideration. FER systems often rely on limited emotion taxonomies and training datasets that underrepresent global emotional diversity, potentially resulting in biased or inaccurate emotion recognition. The reduction of complex affective states to basic categories (e.g., joy and sadness) risks oversimplifying nuanced emotional experiences.

Although this study provides valuable insights into the emotional dynamics of Frozen and Frozen II, its findings are context-specific and may not generalize across genres or narrative forms. Future research should explore more diverse musical and cinematic datasets to develop robust, cross-domain sentiment analysis models. In advancing multimodal sentiment analysis, future work should address limitations in current NLP tools by integrating ensemble models, combining architectures like RoBERTa and DistilBERT, and exploring cross-modal attention mechanisms with visual models such as ResNet. Fine-tuning on annotated musical corpora could enhance emotion classification in this domain. Similarly, MIR research should evolve to capture more sophisticated musical features that align with emotional narratives, enabling richer, more accurate modeling of affect in multimedia contexts.


Corresponding author: Nashwa Elyamany, Arab Academy for Science, Technology & Maritime Transport, Smart Village, Giza, Egypt, E-mail:

About the authors

Nashwa Elyamany

Nashwa Elyamany is an associate professor of applied linguistics with a proven record of academic achievements, professional development, and intercultural communication. She has been a certified IELTS speaking examiner for over 12 years, served as Head of the Languages Department and Associate Dean for Training and Community Service, and is currently Associate Dean of Graduate Studies and Scientific Research at the College of Language & Communication, Arab Academy for Science, Technology & Maritime Transport (AASTMT), Smart Village, Egypt. She is interested in a wide array of interdisciplinary research projects in light of her solid academic background and extensive coursework in areas of specialization. Recent publications span a multiplicity of genres, incorporating diverse theories of Cognitive Linguistics and Stylistics, Sociolinguistics, Social Semiotics, Forensic Linguistics, Digital Humanities, Computer Vision, and Natural Language Processing. She has published over 40 research works, guest edited two special issues, wrote two Cambridge Elements, and received over a dozen scientific publication awards for papers published in Visual Communication, Visual Studies, International Journal of Legal Discourse, AI & Society, Discourse & Society, Social Semiotics, Multimodal Communication, Language and Semiotic Studies, Southern African Linguistics and Applied Language Studies, Cogent Arts & Humanities, The Social Science Journal, Convergence, and Sociology, among others.

Yasser Omar Youssef

Yasser Omar Youssef is a seasoned academic and AI consultant with a rich background in computer science and biomedical engineering. He holds a Ph.D. in Biomedical Engineering & Systems from the University of Cairo. Currently a faculty member at Oklahoma University’s School of Library and Information Studies. He has extensive teaching experience, offering courses in data science, machine learning, and computer security, among others. His research interests include machine learning, deep learning, data analytics, and medical image processing. He has contributed to numerous publications and conferences, enhancing his field’s knowledge base.

Manar Mohamed Hafez

Manar Mohamed Hafez’s recent research focuses primarily on deep learning, AI, big data analytics, and software engineering. She completed her PhD in Information Technology and Communications at the University of Vigo, Spain, in 2021. She earned her master’s degree in Information Systems from the Arab Academy for Science, Technology, and Maritime Transport (AASTMT) in Cairo, Egypt, in 2017, and her bachelor’s degree in Software Engineering from the same institution in 2013.

Conflict of interest: The authors declare no conflict of interest.

Research funding: No funding was received for this work.

Received: 2025-04-09
Accepted: 2025-04-26
Published Online: 2025-08-21

© 2025 the author(s), published by De Gruyter on behalf of Soochow University

This work is licensed under the Creative Commons Attribution 4.0 International License.
