Deep Learning for Facial AU, Depression and Personality Analysis
Project Aim:
The aims of this project are to:
- propose an AU relationship modelling approach that deep learns a unique graph to explicitly describe the relationship between each pair of AUs of the target facial display (see the first sketch after this list);
- open up a new avenue of research for predicting and recognizing socio-emotional phenomena (personality, affect, engagement, etc.) from simulations of person-specific cognitive processes;
- propose to represent the person-specific cognition of the target subject (defined as the listener) in the form of a person-specific CNN architecture with unique architectural parameters and depth, which takes the audio-visual non-verbal cues displayed by the conversational partner (defined as the speaker) as input and reproduces the target subject's facial reactions (see the second sketch after this list);
- propose a two-stage framework that models depression severity from multi-scale short-term and video-level facial behaviours (see the third sketch after this list);
- present the first reproducible audio-visual benchmarking framework that provides a fair and consistent evaluation of eight existing personality computing models (i.e., audio, visual and audio-visual) and seven standard deep learning models on both self-reported and apparent personality recognition tasks.
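
The first aim centres on learning a dedicated relational graph over AU-specific node features. Below is a minimal, hypothetical PyTorch sketch of that idea; the missing face backbone, layer sizes and edge aggregation are illustrative assumptions, not the exact design published in the IJCAI 2022 paper.

```python
import torch
import torch.nn as nn

class AURelationGraphSketch(nn.Module):
    """Hypothetical sketch: AU-specific node features plus a learned edge
    feature for every AU pair. Layer sizes, the aggregation and the omitted
    face backbone are illustrative assumptions, not the published design."""

    def __init__(self, num_aus: int = 12, feat_dim: int = 64):
        super().__init__()
        # One head per AU extracts that AU's node feature from a global face feature
        self.node_heads = nn.ModuleList(
            [nn.Linear(feat_dim, feat_dim) for _ in range(num_aus)]
        )
        # Edge head maps each ordered pair of node features to a pair-specific edge feature
        self.edge_head = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        # Per-AU occurrence classifier over the node feature and its aggregated edges
        self.classifier = nn.Linear(2 * feat_dim, 1)

    def forward(self, face_feat: torch.Tensor) -> torch.Tensor:
        # face_feat: (batch, feat_dim) global facial representation from some backbone
        nodes = torch.stack([head(face_feat) for head in self.node_heads], dim=1)  # (B, N, D)
        b, n, d = nodes.shape
        src = nodes.unsqueeze(2).expand(b, n, n, d)             # sender node of each pair
        dst = nodes.unsqueeze(1).expand(b, n, n, d)             # receiver node of each pair
        edges = self.edge_head(torch.cat([src, dst], dim=-1))   # (B, N, N, D) unique edge per pair
        agg = edges.mean(dim=2)                                  # aggregate incoming edges per AU
        return self.classifier(torch.cat([nodes, agg], dim=-1)).squeeze(-1)  # (B, N) AU logits

# Dummy usage: 8 face features of dimension 64 -> 12 AU occurrence logits each
logits = AURelationGraphSketch()(torch.randn(8, 64))
```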
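
For the third aim, the listener's cognition is represented as a CNN whose architectural parameters (e.g., depth) are specific to that person. The second sketch shows one plausible way such a speaker-to-listener reaction model could be wired; all names, feature dimensions and the reaction target are assumptions made here for illustration.

```python
import torch
import torch.nn as nn

class PersonSpecificReactionNet(nn.Module):
    """Hypothetical sketch of a listener-specific model: the depth (and other
    hyper-parameters) are chosen per target subject, and the network maps the
    speaker's audio-visual cues to the listener's facial reactions."""

    def __init__(self, in_dim: int = 152, hidden: int = 128, depth: int = 4, out_dim: int = 17):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(depth):  # person-specific depth
            layers += [nn.Conv1d(d, hidden, kernel_size=3, padding=1), nn.ReLU()]
            d = hidden
        self.encoder = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, out_dim)  # e.g. per-frame AU intensities of the listener

    def forward(self, speaker_cues: torch.Tensor) -> torch.Tensor:
        # speaker_cues: (batch, in_dim, time) concatenated audio-visual features of the speaker
        h = self.encoder(speaker_cues)          # (batch, hidden, time)
        return self.head(h.transpose(1, 2))     # (batch, time, out_dim) predicted listener reactions

# Dummy usage: two 100-frame speaker sequences -> listener reactions of the same length
reactions = PersonSpecificReactionNet(depth=5)(torch.randn(2, 152, 100))
```

Under the stated aim, personality would then be inferred from the fitted person-specific architecture (e.g., its depth and parameters) rather than from the network's outputs.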
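
The fourth aim (the two-stage depression framework) can likewise be illustrated with a compact sketch: a multi-scale short-term encoder followed by video-level aggregation and severity regression. All module names, scales and sizes below are assumptions, not the project's published components.

```python
import torch
import torch.nn as nn

class TwoStageDepressionSketch(nn.Module):
    """Minimal sketch of a two-stage idea: (1) encode short facial-behaviour
    segments at several temporal scales; (2) aggregate all segment features
    into one video-level representation used to regress depression severity."""

    def __init__(self, in_dim: int = 136, hidden: int = 64, scales=(1, 2, 4)):
        super().__init__()
        # Stage 1: one temporal conv branch per scale (multi-scale short-term modelling)
        self.branches = nn.ModuleList(
            [nn.Conv1d(in_dim, hidden, kernel_size=3, padding=d, dilation=d) for d in scales]
        )
        # Stage 2: video-level aggregation over segment features, then severity regression
        self.video_head = nn.Sequential(
            nn.Linear(hidden * len(scales), hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, segments: torch.Tensor) -> torch.Tensor:
        # segments: (num_segments, in_dim, time) per-frame facial features of one video
        feats = [branch(segments).mean(dim=-1) for branch in self.branches]  # each (S, hidden)
        seg_feats = torch.cat(feats, dim=-1)      # (S, hidden * num_scales)
        video_feat = seg_feats.mean(dim=0)        # simple video-level pooling
        return self.video_head(video_feat)        # predicted severity score, shape (1,)

# Dummy usage: 10 segments of 64 frames each -> one severity estimate for the video
severity = TwoStageDepressionSketch()(torch.randn(10, 136, 64))
```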
Studies Conducted:
A summary of studies with human participants (as of June 2023):
- Learning Multi-dimensional Edge Feature-based AU Relation Graph for Facial Action Unit Recognition [IJCAI 2022]
- Personality Recognition by Modelling Person-specific Cognitive Processes using Graph Representation [ACM MM 2021]
- Learning Person-specific Cognition from Facial Reactions for Automatic Personality Recognition (IEEE TAFFC) [arXiv 2021]
- Two-stage Temporal Modelling Framework for Video-based Depression Recognition using Graph Representation [arXiv 2021]
- An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition [arXiv 2022]
Major Findings:
Major findings (as of June 2023):
- Experimental results on the BP4D and DISFA datasets show that both the node and edge feature learning modules provide large performance improvements for CNN- and transformer-based backbones, with our best systems achieving state-of-the-art AU recognition results.
- The experimental results show that the CNN architectures are well associated with target subjects' personality traits and that the proposed approach clearly outperforms multiple existing approaches that predict personality directly from non-verbal behaviours.
- Experimental results not only show that the produced graph representations are well associated with target subjects' personality traits in both human-human and human-machine interaction scenarios and outperform existing approaches by a significant margin, but also demonstrate that the proposed strategies, such as the adaptive loss and the end-to-end vertex/edge feature learning, help the approach learn more reliable personality representations.
- The experimental results on the AVEC 2013 and AVEC 2014 datasets show that the proposed DFE module consistently enhanced depression severity estimation performance for various CNN models, while the SPG is superior to other video-level modelling methods.
- The experimental results conclude that: (i) apparent personality traits, inferred from facial behaviours by most benchmarked deep learning models, are predicted more reliably than self-reported ones; (ii) visual models frequently achieve better performance than audio models on personality recognition; and (iii) non-verbal behaviours contribute differently to the prediction of different personality traits.
Project Team:
- Prof Hatice Gunes (PI, Apr 2019-present)
- Dr Siyang Song (Postdoctoral RA, Aug 2021 – Jan 2023) – now a Lecturer at University of Leicester, UK
- Cheng Luo (Intern)