Modelovanje moralnih i emocionalnih aspekata jezika u klasifikaciji konverzacionih tekstova [Modeling Moral and Emotional Aspects of Language in the Classification of Conversational Texts]

dc.contributor.advisor	Graovac, Jelena
dc.contributor.author	Šošić, Milena
dc.date.accessioned	2025-10-10T14:36:06Z
dc.date.available	2025-10-10T14:36:06Z
dc.date.issued	2025-10
dc.identifier.uri	http://hdl.handle.net/123456789/5774
dc.description.abstract	Conversational text messages are an important form of digital communication in modern society. With the development of information technologies, various communication tools have emerged, such as email, social media, instant messaging tools, and automated response systems. Messages generated within these tools, unlike standard texts, have a specific structure that allows for the classification of individual messages or sets of messages that form a conversation. Classification labels are defined by the specific task being addressed, and the classification can be either single-label or multi-label, which enables the recognition of complex interrelationships between the categories. Introducing moral and emotional dimensions of language into research is crucial for understanding the complex patterns of human communication, particularly in the context of digital platforms and social media. Machine learning (ML) methods, such as deep neural networks (DNNs), facilitate the utilization and more precise recognition of these aspects while providing an efficient way to classify emotions and moral values expressed in texts. Human emotions and moral values are often conveyed implicitly and depend heavily on context, which makes their recognition particularly challenging. One of the major challenges is that resources for low-resource languages, including Serbian, are either missing or limited in size and diversity. The development of linguistic resources, such as annotated lexicons and corpora, plays a crucial role in this process by providing the knowledge sources needed for building new ML models and improving existing ones. Linguistic resources enable models to learn how different emotional expressions and moral values influence the tone and meaning of communication. 
To support this, a semantic lexicon for sentiment intensity, SentiWords.SR, containing approximately 15k words, was developed for the Serbian language, along with the associated tool SRPOL for measuring sentiment intensity in textual sequences in Serbian. Additionally, a semantic lexicon for emotional affect, EmoLex.SR, comprising around 9.8k words with assigned emotional intensity values, and a semantic lexicon for moral values, MFD.SR, consisting of approximately 4.3k words with associated moral value weights, were developed. Significant efforts were also made in annotating the first conversational corpora from social media with emotional and moral categories. In this regard, the Social-Emo.SR corpus (∼34.6k messages) was developed, consisting of the Twitter-Emo.SR subcorpus (∼16.7k messages) and the Reddit-Emo.SR subcorpus (∼17.9k messages), collected from Twitter and Reddit, respectively. Furthermore, by searching for key moral-related terms, a subset of messages expressing potential moral stances was extracted from Social-Emo.SR. This subset, named Social-Mor.SR (∼13.6k messages), was manually verified and annotated by human annotators and consists of the Twitter-Mor.SR subcorpus (∼6.1k Twitter messages) and the Reddit-Mor.SR subcorpus (∼7.5k Reddit messages). In the context of DNN architectures, models based on recurrent networks or transformers, trained on these resources, enable the recognition and utilization of emotional and moral aspects of language in various contexts. The combination of advanced algorithms, such as Bidirectional Long Short-Term Memory (BiLSTM) networks and the attention mechanism, with linguistically and culturally adapted resources (Meta) opens new possibilities for analyzing moral and emotional aspects of language. This has broad applications in classification tasks such as recognizing personal context, truthfulness of posts, or types of engagement in digital communication.
For personal context recognition, i.e., classifying corporate emails as either business-related or personal, experiments show that a carefully designed hybrid approach (BiLSTM-Att+Meta) applied across entire conversation branches yields the best results, comparable to published benchmarks on the same task. In experiments on rumor veracity classification and on identifying engagement types in response to rumors, moral and emotional attributes derived from semantic lexicons (EmoAttr, MorAttr ⊆ Meta) improved classification accuracy by 4.2% and 3.8%, respectively, compared to methods without these attributes. For emotion recognition in Serbian conversational texts, experiments revealed that transformer-based models fine-tuned on the task achieved F1-scores of approximately 53%, reaching performance levels reported for multi-label classification on the same emotional category set. Additionally, experiments showed that further data preprocessing and balancing improved model performance. In moral value and moral sentiment classification tasks, using the Social-Mor.SR corpus and its subcorpora, an F1-score of ∼46% was achieved for moral value recognition and ∼38% for moral sentiment recognition, indicating acceptable results but also the need for further model optimization. Fine-tuning LLaMA models yielded reasonable but slightly lower performance compared to BERT-based architectures. Since model performance depends directly on the training data, there is potential for further improvement by refining and balancing the initial annotations in the corpora used.	en_US
dc.description.provenance	Submitted by Slavisha Milisavljevic (slavisha) on 2025-10-10T14:36:06Z No. of bitstreams: 1 Doktorski_rad_Milena_Sosic.pdf: 6206459 bytes, checksum: da16ee1cd37d82e8034bde484d0039bf (MD5)	en
dc.description.provenance	Made available in DSpace on 2025-10-10T14:36:06Z (GMT). No. of bitstreams: 1 Doktorski_rad_Milena_Sosic.pdf: 6206459 bytes, checksum: da16ee1cd37d82e8034bde484d0039bf (MD5) Previous issue date: 2025-10	en
dc.language.iso	sr	en_US
dc.publisher	Beograd	en_US
dc.title	Modelovanje moralnih i emocionalnih aspekata jezika u klasifikaciji konverzacionih tekstova	en_US
mf.author.birth-date	1978-06-07
mf.author.birth-place	Požarevac	en_US
mf.author.birth-country	Serbia	en_US
mf.author.residence-state	Serbia	en_US
mf.author.citizenship	Serbian	en_US
mf.author.nationality	Serbian	en_US
mf.subject.area	Computer science	en_US
mf.subject.keywords	Morality, emotions, conversational texts, classification	en_US
mf.subject.subarea	Natural language processing	en_US
mf.contributor.committee	Mitić, Nenad
mf.contributor.committee	Nikolić, Mladen
mf.university.faculty	Faculty of Mathematics	en_US
mf.document.references	227	en_US
mf.document.pages	209	en_US
mf.document.location	Beograd	en_US
mf.document.genealogy-project	No	en_US
mf.university	University of Belgrade	en_US

Files in this item

Files	Size	Format	View
Doktorski_rad_Milena_Sosic.pdf	6.206Mb	PDF	View/Open

This item appears in the following Collection(s)

Computer Science
