Speech is a highly complicated signal: parts of it behave like random noise, while other parts are almost as simple as a deterministic periodic signal. We cannot imagine a day without having a speech conversation with someone. This signal conveys a great deal of meaning and understanding, and it is considered one of the most significant modes of information transfer.
Speech is so omnipresent that we naturally wish to use it to build speech recognition systems and make the systems (or machines) around us understand speech too. A speech recognizer consists of a receptor (an input device, similar to the ear) and a processor (where the speech recognition system runs, similar to the brain).
Speech signal processing here is a twofold problem: one part is the signal's generation and the other is its processing. Thus, the speech recognition problem is the study of two sides of the same coin - synthesis and analysis. The synthesis step enables us to determine the features that are useful in the analysis. Common features of speech are the formant frequencies, pitch, the vowel/consonant distinction, etc.
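To make one of these features concrete, here is a minimal sketch of estimating pitch (the fundamental frequency of a voiced sound) with the classic autocorrelation method. The function name, the search range of 50-400 Hz, and the synthetic test tone are illustrative assumptions, not part of any particular ASR system:

```python
import numpy as np

def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate the pitch of a voiced frame via autocorrelation:
    the lag of the strongest peak corresponds to one pitch period."""
    frame = frame - np.mean(frame)               # remove DC offset
    corr = np.correlate(frame, frame, mode="full")
    corr = corr[len(corr) // 2:]                 # keep non-negative lags
    lo = int(sample_rate / fmax)                 # smallest plausible lag
    hi = int(sample_rate / fmin)                 # largest plausible lag
    lag = lo + np.argmax(corr[lo:hi])
    return sample_rate / lag

# A synthetic 200 Hz "voiced" frame sampled at 8 kHz:
sr = 8000
t = np.arange(0, 0.04, 1 / sr)
frame = np.sin(2 * np.pi * 200 * t)
print(estimate_pitch(frame, sr))                 # close to 200.0 Hz
```

Real voiced speech is only quasi-periodic, so practical systems apply this per short frame (20-40 ms) and smooth the result across frames.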
An isolated word recognition system is a very realistic problem to start with. Suppose we are driving a vehicle fitted with an Automatic Speech Recognition (ASR) system; then, by uttering single words or short phrases, the driver can perform tasks like turning on the headlights, increasing the volume of the stereo, or turning on the windshield wipers when it is raining. The most common commands in this scenario would be "Lights on", "Lights off", "Mute Stereo", "Plus Volume", etc.
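Once the recognizer has picked one word from its small dictionary, acting on it is just a lookup. A minimal sketch of that dispatch step, assuming hypothetical handler names and using the example commands above:

```python
# Hypothetical vehicle actions; each returns a status string for illustration.
def lights_on():   return "headlights on"
def lights_off():  return "headlights off"
def mute_stereo(): return "stereo muted"
def plus_volume(): return "volume increased"

# The recognizer's dictionary: recognized word -> action.
COMMANDS = {
    "lights on":   lights_on,
    "lights off":  lights_off,
    "mute stereo": mute_stereo,
    "plus volume": plus_volume,
}

def dispatch(recognized_word):
    """Run the action for a recognized command word, if it is known."""
    action = COMMANDS.get(recognized_word.lower())
    return action() if action else "unknown command"

print(dispatch("Lights on"))     # headlights on
print(dispatch("Open sesame"))   # unknown command
```

The hard part, of course, is the recognition itself; with only ten or so well-separated words, the dictionary stays small enough for this simple mapping to suffice.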
It has been reported that an ASR with ten words in its dictionary can work with an error rate as low as 0.5%. This suggests that it is possible to treat the systems around us as human too - yes, they can understand speech.
