Challenges in audio-visual speech recognition

Open questions:
- Which facial features are important for lipreading?
- How do geometric features (lip protrusion, mouth height & width) compare to non-geometric features (e.g. DCT) of the mouth image?
- What methods are most effective for fusing audio and visual information? No good comparison of fusion techniques has been done yet.
- How much can lipreading improve human-computer interaction, i.e. how much more effective does it make the user at a given task?
- How much can visual speech recognition improve the robustness and user-friendliness of audio-only speech recognition?
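
To make the fusion question concrete, here is a minimal sketch of the two broad options usually contrasted: feature-level (early) fusion, which concatenates audio and visual feature vectors before classification, and decision-level (late) fusion, which combines the per-class scores of two separate recognizers. The function names, the syllable labels, the probabilities and the weight `audio_weight` are all hypothetical, chosen for illustration only; they do not come from any particular system.

```python
import math


def early_fusion(audio_features, visual_features):
    """Feature-level fusion: concatenate the two feature vectors."""
    return list(audio_features) + list(visual_features)


def late_fusion(audio_log_probs, visual_log_probs, audio_weight):
    """Decision-level fusion: weighted sum of per-class log-probabilities.

    audio_weight is an assumed reliability weight in [0, 1];
    the visual stream receives 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return {
        label: audio_weight * audio_log_probs[label]
        + (1.0 - audio_weight) * visual_log_probs[label]
        for label in audio_log_probs
    }


def best_label(scores):
    """Return the label with the highest fused score."""
    return max(scores, key=scores.get)


# Hypothetical per-class probabilities from two separate recognizers.
audio_scores = {"ba": math.log(0.5), "ga": math.log(0.4), "da": math.log(0.1)}
visual_scores = {"ba": math.log(0.1), "ga": math.log(0.7), "da": math.log(0.2)}

# Trusting audio alone and weighting both streams equally can disagree.
audio_only = best_label(late_fusion(audio_scores, visual_scores, 1.0))
equal_weight = best_label(late_fusion(audio_scores, visual_scores, 0.5))
```

In this sketch the weight would typically be lowered as acoustic noise increases, so that the visual stream contributes more to the decision. How to choose that weight, and whether to fuse at the feature or the decision level, is exactly the comparison the question above asks for.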

Aristotle University of Thessaloniki
