Sawita Yousukkee. Spam filtering algorithm for live chat. Doctoral Degree (Information Technology (International Program)). King Mongkut's University of Technology North Bangkok, 2020.
Abstract:
This research studied user behavior and chat message characteristics in the live chat of YouTube live streams. We analyzed YouTube live streaming comments to understand spammers' behavior. Seven user behavior and message characteristic features were comprehensively analyzed. In particular, we explored whether different potential markers in user comments, namely word count per chat message, relevance score computed using bag-of-words, similarity between comments, polarity of the comments, number of chat messages, interarrival time of chat messages, and the length of time a user spent in a live chat, can be used to effectively differentiate spammers from normal users, and we propose a development framework for spam detection in YouTube live chat. According to our findings, the features that performed best in terms of run time and classification efficiency are the relevance score (M2) together with the time spent in live chat (B2) and the similarity score (M3).
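As an illustration only, several of the per-user features listed above could be computed along the following lines. This is a minimal sketch, not the thesis implementation: the function name `user_features`, the `(user, timestamp, text)` input layout, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions, and the relevance and polarity scores are left out because the abstract does not say how they were computed.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from statistics import mean


def user_features(messages):
    """Compute simple per-user chat features.

    messages: iterable of (user, timestamp_seconds, text) tuples (assumed layout).
    Returns a dict mapping each user to a dict of feature values.
    """
    # Group messages by user in chronological order.
    by_user = defaultdict(list)
    for user, ts, text in sorted(messages, key=lambda m: m[1]):
        by_user[user].append((ts, text))

    features = {}
    for user, items in by_user.items():
        times = [ts for ts, _ in items]
        texts = [text for _, text in items]
        # Gaps between consecutive messages from the same user.
        gaps = [b - a for a, b in zip(times, times[1:])]
        # Similarity of each message to the user's previous message
        # (SequenceMatcher is an assumed stand-in for the thesis measure).
        sims = [SequenceMatcher(None, a, b).ratio()
                for a, b in zip(texts, texts[1:])]
        features[user] = {
            "message_count": len(items),
            "mean_word_count": mean(len(t.split()) for t in texts),
            "mean_interarrival": mean(gaps) if gaps else 0.0,
            "time_in_chat": times[-1] - times[0],
            "mean_similarity": mean(sims) if sims else 0.0,
        }
    return features
```

A user who repeats the same short message at regular intervals would show a mean similarity close to 1.0, which is the kind of signal the similarity feature is meant to capture.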
Moreover, decision tree classification was found to be suitable for spam filtering in live chat because it had the highest accuracy and the shortest processing time, which prevents delays caused by messages queuing for classification. This can lead to the creation of a simulator that works efficiently, with an accuracy of 98% and a short processing time, making it suitable for real-time systems.
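To make the idea of a fast tree-based classifier concrete, the sketch below trains a depth-one decision tree (a decision stump) by minimizing Gini impurity. It is a deliberately simplified stand-in, not the classifier evaluated in the thesis: the function names `gini`, `fit_stump`, and `predict` are hypothetical, labels are assumed to be 1 for spammer and 0 for normal user, and each row is assumed to hold the three best-performing features in the order (relevance score, time in chat, similarity score).

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def fit_stump(rows, labels):
    """Fit a depth-one decision tree.

    Returns (feature_index, threshold, left_label, right_label), where the
    left branch covers rows with feature value <= threshold.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= thr]
            right = [y for r, y in zip(rows, labels) if r[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                # Each branch predicts its majority label (ties go to 1).
                left_label = int(2 * sum(left) >= len(left))
                right_label = int(2 * sum(right) >= len(right))
                best = (score, j, thr, left_label, right_label)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best[1:]


def predict(stump, row):
    """Classify one feature row with a fitted stump."""
    j, thr, left_label, right_label = stump
    return left_label if row[j] <= thr else right_label
```

Prediction is a single comparison per message, which illustrates why tree-based classifiers tend to add little queuing delay in a real-time setting. A full decision tree applies the same split search recursively to each branch.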