Abstract:
Emotions are among the most fundamental forms of human expression and can be conveyed in a variety
of ways, such as through the voice, the face, or gestures. When developing systems that interact with
humans, it is extremely important to recognize how humans respond emotionally to the system. This article
presents the design and development of YOLOv4-tiny and YOLOv5s deep learning models for analyzing
emotions from human faces. The models run on a low-cost embedded device, the NVIDIA Jetson Nano,
equipped with a camera. Faces in the live video stream from the camera are detected and framed, and
the emotional analysis of each face is displayed in real time. The models classify seven emotions:
anger, disgust, fear, happiness, sadness, surprise, and neutral. The RAF-DB image dataset was used to
train and test the models. In our evaluation, the YOLOv5s model outperformed the YOLOv4-tiny model
in accuracy, achieving an F1 score of 0.806 compared to 0.774. In terms of processing speed, the
YOLOv5s model displays video at roughly 11 FPS, while the YOLOv4-tiny model runs at approximately
10.5 FPS.
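As a minimal sketch of the real-time pipeline summarized above (not the authors' exact implementation), the following Python snippet loads custom YOLOv5s weights through the Ultralytics torch.hub interface and overlays the predicted emotion label on each detected face. The weights path emotion_yolov5s.pt is a hypothetical placeholder for a checkpoint trained on RAF-DB; camera index 0 is assumed to be the Jetson Nano's attached camera.

    import cv2
    import torch

    # Load YOLOv5s with custom emotion-detection weights via the Ultralytics
    # torch.hub API. "emotion_yolov5s.pt" is a hypothetical checkpoint name.
    model = torch.hub.load("ultralytics/yolov5", "custom", path="emotion_yolov5s.pt")

    cap = cv2.VideoCapture(0)  # camera attached to the Jetson Nano (assumed index 0)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Convert BGR (OpenCV) to RGB before inference, as the YOLOv5 docs show.
        results = model(frame[:, :, ::-1])
        # results.xyxy[0] holds one (x1, y1, x2, y2, confidence, class) row per detection.
        for *box, conf, cls in results.xyxy[0].tolist():
            x1, y1, x2, y2 = map(int, box)
            label = f"{model.names[int(cls)]} {conf:.2f}"  # e.g. "happiness 0.91"
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, label, (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("emotion", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()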