On Text Datasets For Stress Detection: A Comprehensive Analysis And Future Ideas
Abstract
Early identification of psychological stress is important for human health, and stress can be detected by analyzing text that people write. Currently, only a limited number of text datasets exist for stress detection. This article presents a detailed analysis of the base papers of ten text datasets used for emotion and stress detection, to help researchers select the most suitable dataset for their work. The datasets chosen for this analysis are Dreaddit, TensiStrength, SMHD, VENT, GoEmotions, ISEAR, EmoInt, EmoBank, and TEC. Data annotation is an important task in the preparation of a text dataset. Four techniques for measuring inter-rater agreement are discussed: the Pearson correlation coefficient, the Spearman correlation coefficient, Krippendorff's alpha coefficient, and Fleiss' kappa. Based on the analysis of the existing datasets, Dreaddit appears to be the best option for research on stress detection, and the Pearson correlation coefficient gives the best results for measuring inter-rater agreement. The article concludes with a discussion of several unresolved challenges and potential research directions for text-based identification of stress.
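To make the agreement measures named in the abstract concrete, the following is a minimal sketch of three of them (Pearson, Spearman, and Fleiss' kappa) in plain Python. It is an illustration under stated assumptions, not the procedure used in any of the dataset papers: the function names, the 1-5 stress scale, and the toy ratings are all hypothetical. Krippendorff's alpha is left out to keep the sketch short.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two raters' numeric scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def fleiss_kappa(counts):
    """Fleiss' kappa for categorical labels.

    counts[i][j] is the number of raters who put item i in category j;
    every item must be labelled by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    total = n_items * n_raters
    # Share of all assignments that fall in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    # Per-item observed agreement.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items       # mean observed agreement
    p_e = sum(p * p for p in p_j)    # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical data: two annotators rate five posts on a 1-5 stress scale.
    rater_a = [1, 2, 4, 5, 3]
    rater_b = [2, 2, 5, 4, 3]
    print("Pearson :", round(pearson(rater_a, rater_b), 3))
    print("Spearman:", round(spearman(rater_a, rater_b), 3))

    # Hypothetical data: two annotators label four posts as
    # (stressed, not stressed); each row holds the label counts for one post.
    label_counts = [[2, 0], [1, 1], [0, 2], [2, 0]]
    print("Fleiss  :", round(fleiss_kappa(label_counts), 3))
```

The two correlation coefficients apply to numeric or ordinal scores from a pair of raters, while Fleiss' kappa applies to categorical labels from a fixed number of raters per item, so the appropriate measure depends on how a dataset was annotated.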