Audio-visual interaction (2)
- speech-driven face animation,
- joint audio-video coding,
- bimodal person authentication.
Automated lip-reading systems combine audio and visual speech recognizers. Visual speech recognizers are based on hidden Markov models (HMMs) or time-delay neural networks (TDNNs). They employ:
- geometric features of binary mouth images, such as height, width, and perimeter,
along with their derivatives, or
- active shape models, or
- the aforementioned geometric parameters combined with the
wavelet transform of the mouth images.
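
As an illustration of the first feature type, the sketch below computes height, width, and perimeter from a sequence of binary mouth masks and appends their frame-to-frame derivatives. It is a minimal example under stated assumptions, not the method of any particular system: the function names `mouth_geometry` and `feature_sequence` are hypothetical, the perimeter is approximated by counting boundary pixels, and the derivative is a simple finite difference over frames (at least two frames are needed).

```python
import numpy as np


def mouth_geometry(mask):
    """Return [height, width, perimeter] in pixels for one binary mouth mask."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        # Assumption: an empty mask (closed or undetected mouth) maps to zeros.
        return np.zeros(3)

    # Height and width come from the bounding box of the foreground pixels.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1

    # Perimeter approximation: foreground pixels that have at least one
    # 4-connected background neighbour (pixels outside the image count
    # as background).
    padded = np.pad(mask, 1, constant_values=False)
    interior = (
        padded[:-2, 1:-1]    # neighbour above
        & padded[2:, 1:-1]   # neighbour below
        & padded[1:-1, :-2]  # neighbour to the left
        & padded[1:-1, 2:]   # neighbour to the right
    )
    perimeter = np.count_nonzero(mask & ~interior)

    return np.array([height, width, perimeter], dtype=float)


def feature_sequence(masks):
    """Stack static features and their temporal derivatives, one row per frame."""
    static = np.stack([mouth_geometry(m) for m in masks])
    # Finite-difference derivative along the time axis (requires >= 2 frames).
    delta = np.gradient(static, axis=0)
    return np.hstack([static, delta])
```

Each row of the result is a six-dimensional observation vector (three static features and three derivatives) of the kind that could be passed, frame by frame, to an HMM or TDNN recognizer.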