


Meta claims that its framework is 75% more accurate than even the best audiovisual speech recognition frameworks currently in use, and that it needed only 10% of the data to get these results. The results reported so far for AV-HuBERT appear positive. Tracking lip movement adds another form of input that could improve an AI's ability to understand people and put their words in context, letting it perform tasks more efficiently once it has been fully trained.

Previously, voice and speech recognition software has operated on audio alone. Meta wants to see whether anything can be gained by letting AI read lips as well as listen to audio recordings. Its new framework, AV-HuBERT, takes both into account, which could greatly improve speech recognition, although this is only a test at this point.

Lip reading can be crucial because it helps you parse the meaning of someone's words when you cannot hear them clearly, and Meta seems to be taking that into account with its AI. Many studies have found that it is much harder to understand what someone is saying if you cannot see how their mouth is moving.

Speech is the main channel of face-to-face communication, but understanding it involves much more than just listening to the words people say.
