Demo · AI/ML

Multimodal Emotion Recognition

Cross-modal attention model that combines HuBERT audio and EfficientNet visual encoders through bidirectional fusion to classify 8 emotion classes.

PyTorch · HuBERT · EfficientNet · Multimodal AI

The Problem

Single-modality emotion detection misses context: tone of voice and facial expression together reveal more than either one alone.

Architecture & Approach

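A cross-modal attention model fuses features from a HuBERT audio encoder and an EfficientNet visual encoder. Attention runs in both directions (audio features attend to visual features, and visual features attend to audio), and learnable modality weights control how much each stream contributes to the fused representation.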
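The fusion step can be sketched in PyTorch roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The class name `BidirectionalCrossModalFusion`, the shared width `d_model`, the use of one `nn.MultiheadAttention` block per direction, the residual-plus-mean-pool step, and the input feature sizes are all hypothetical choices. The sizes assume 768 for HuBERT-base hidden states and 1280 for EfficientNet-B0 pooled features. Random tensors stand in for pre-extracted encoder outputs, so no pretrained backbones are loaded.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BidirectionalCrossModalFusion(nn.Module):
    """Hypothetical fusion head (an assumption, not the project's code):
    audio attends to video, video attends to audio, and a learnable
    softmax-normalised weight per modality mixes the two streams."""

    def __init__(self, audio_dim: int = 768, visual_dim: int = 1280,
                 d_model: int = 256, n_heads: int = 4, n_classes: int = 8):
        super().__init__()
        # Project each encoder's features into a shared space.
        # Assumed dims: 768 (HuBERT-base), 1280 (EfficientNet-B0).
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        # One attention block per direction.
        self.audio_to_visual = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.visual_to_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Learnable modality weights, stored as logits and softmaxed in forward().
        self.modality_logits = nn.Parameter(torch.zeros(2))
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # audio: (batch, T_audio, audio_dim); visual: (batch, T_visual, visual_dim)
        a = self.audio_proj(audio)
        v = self.visual_proj(visual)
        # Audio queries attend over visual keys/values ...
        a_ctx, _ = self.audio_to_visual(query=a, key=v, value=v)
        # ... and visual queries attend over audio keys/values.
        v_ctx, _ = self.visual_to_audio(query=v, key=a, value=a)
        # Residual connection, then mean-pool each stream over time.
        a_vec = (a + a_ctx).mean(dim=1)
        v_vec = (v + v_ctx).mean(dim=1)
        # Softmax keeps the two modality weights positive and summing to 1.
        w = F.softmax(self.modality_logits, dim=0)
        fused = w[0] * a_vec + w[1] * v_vec
        return self.classifier(fused)  # (batch, n_classes) logits


# Usage with random stand-ins for encoder outputs.
torch.manual_seed(0)
model = BidirectionalCrossModalFusion()
audio = torch.randn(2, 50, 768)    # e.g. 50 HuBERT frames per clip
visual = torch.randn(2, 16, 1280)  # e.g. 16 sampled video frames per clip
logits = model(audio, visual)
print(logits.shape)  # torch.Size([2, 8])
```

Storing the modality weights as logits and applying a softmax lets the model learn how far to trust audio versus video. It also keeps the fused vector on the same scale as each individual stream.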

Results

The model performs 8-class emotion classification and is deployed as an interactive demo on Hugging Face.
