Demo, AI/ML
Multimodal Emotion Recognition
Cross-modal attention model combining HuBERT audio and EfficientNet visual encoders with bidirectional fusion across 8 emotion classes.
PyTorch, HuBERT, EfficientNet, Multimodal AI
The Problem
Single-modality emotion detection misses context. Audio tone and facial expressions together reveal more than either alone.
Architecture & Approach
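The model encodes audio with HuBERT and visual input with EfficientNet, then fuses the two streams with bidirectional cross-modal attention: audio features attend to visual features, and visual features attend to audio features. Learnable modality weights set how much each stream contributes to the fused representation used for classification.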
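As a rough illustration of this kind of fusion, here is a minimal PyTorch sketch. It is not the project's actual code. The class name `CrossModalFusion`, the projection size, the head count, the mean pooling, and the softmax-normalized modality weights are all assumptions made for the example. The input widths (768 and 1280) match HuBERT Base and EfficientNet-B0, but the page does not say which variants are used. Random tensors stand in for the encoder outputs so the block runs without downloading any pretrained weights.

```python
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Hypothetical bidirectional cross-attention fusion head (illustrative only)."""

    def __init__(self, audio_dim=768, visual_dim=1280, d_model=256,
                 num_heads=4, num_classes=8):
        super().__init__()
        # Project both modalities into a shared embedding space.
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.visual_proj = nn.Linear(visual_dim, d_model)
        # One attention block per direction makes the fusion bidirectional.
        self.audio_to_visual = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.visual_to_audio = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Learnable modality weights (assumed here to be softmax-normalized).
        self.modality_logits = nn.Parameter(torch.zeros(2))
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, audio_feats, visual_feats):
        a = self.audio_proj(audio_feats)    # (batch, audio_steps, d_model)
        v = self.visual_proj(visual_feats)  # (batch, visual_steps, d_model)
        # Audio queries attend over visual keys/values, and vice versa.
        a_ctx, _ = self.audio_to_visual(query=a, key=v, value=v)
        v_ctx, _ = self.visual_to_audio(query=v, key=a, value=a)
        # Residual connection, then mean-pool over the sequence dimension.
        a_vec = (a + a_ctx).mean(dim=1)
        v_vec = (v + v_ctx).mean(dim=1)
        # Weighted sum of the two pooled modality vectors.
        w = torch.softmax(self.modality_logits, dim=0)
        fused = w[0] * a_vec + w[1] * v_vec
        return self.classifier(fused)       # (batch, num_classes) logits


# Random stand-ins for encoder outputs: 2 clips, 50 audio steps, 16 visual steps.
model = CrossModalFusion()
audio = torch.randn(2, 50, 768)
visual = torch.randn(2, 16, 1280)
logits = model(audio, visual)
print(logits.shape)  # torch.Size([2, 8])
```

Initializing the modality logits to zero starts both streams at equal weight (0.5 each), so training decides which modality to favor. A real system might use per-sample gating or concatenation instead. The page does not specify which scheme is used.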
Results
The model classifies input into 8 emotion classes, and a demo is deployed on Hugging Face.