Speech recognition (in many contexts also known as automatic speech recognition, computer speech recognition or voice recognition) is the process of converting a speech signal to a sequence of words by means of an algorithm implemented as a computer program. Speech recognition applications that have emerged in recent years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), and preparation of structured documents (e.g., a radiology report).
Speech recognition systems can be characterized by many parameters, as shown in the table below.
Parameters        Range
Speaking Mode     Isolated words to continuous speech
Speaking Style    Read speech to spontaneous speech
Enrollment        Speaker-dependent to speaker-independent
Vocabulary        Small (< 20 words) to large (> 20,000 words)
Language Model    Finite-state to context-sensitive
Perplexity        Small (< 10) to large (> 100)
SNR               High (> 30 dB) to low (< 10 dB)
Transducer        Voice-cancelling microphone to telephone
An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies and is much more difficult to recognize than speech read from a script. Some systems require speaker enrollment (a user must provide samples of his or her speech before using the system), whereas other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or contain many similar-sounding words. When speech is produced as a sequence of words, language models or artificial grammars are used to restrict the permissible combinations of words. The simplest language model can be specified as a finite-state network, in which the permissible words following each word are given explicitly. More general language models approximating natural language are specified in terms of a context-sensitive grammar.
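The finite-state network described above can be illustrated with a minimal sketch. The vocabulary, the start and end markers "<s>" and "</s>", and the names FOLLOWERS and is_permitted are hypothetical and chosen only for illustration; a real recognizer would use such a network to prune hypotheses during decoding rather than to check finished sentences.

```python
# Hypothetical toy network: each word maps to the set of words
# that are allowed to follow it. "<s>" and "</s>" are assumed
# start-of-utterance and end-of-utterance markers.
FOLLOWERS = {
    "<s>": {"call", "dial"},
    "call": {"home", "office"},
    "dial": {"home", "office"},
    "home": {"</s>"},
    "office": {"</s>"},
}


def is_permitted(words):
    """Return True if the word sequence is a complete path through the network."""
    state = "<s>"
    for word in list(words) + ["</s>"]:
        # A word is allowed only if it is listed as a follower of the previous word.
        if word not in FOLLOWERS.get(state, set()):
            return False
        state = word
    return True


if __name__ == "__main__":
    print(is_permitted(["call", "home"]))
    print(is_permitted(["home", "call"]))
```

With this toy network, "call home" is accepted, while "home call" is rejected because "home" is not listed as a permissible first word.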
One popular measure of the difficulty of the task, combining the vocabulary size and the language model, is perplexity, loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied. In addition, there are some external parameters that can affect speech recognition system performance, including the characteristics of the environmental noise and the type and the placement of the microphone.
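The loose definition of perplexity given above, the geometric mean of the number of words that can follow a word, can be sketched as follows. The function name and the example counts are hypothetical. The sketch assumes every permissible next word is equally likely; under that assumption the result coincides with the usual probabilistic definition of perplexity.

```python
import math


def geometric_mean_branching(branch_counts):
    """Geometric mean of the number of permissible next words at each position.

    branch_counts: one positive integer per word position, giving how many
    words the language model allows at that point (an illustrative input).
    """
    if not branch_counts or any(c <= 0 for c in branch_counts):
        raise ValueError("branch_counts must be a non-empty list of positive numbers")
    # Averaging the logarithms and exponentiating avoids overflow on long sequences.
    mean_log = sum(math.log(c) for c in branch_counts) / len(branch_counts)
    return math.exp(mean_log)


if __name__ == "__main__":
    # Two positions: 2 choices at the first, 8 at the second.
    print(geometric_mean_branching([2, 8]))
```

For example, a task in which 2 words may start an utterance and 8 words may follow gives a geometric mean of 4, which falls in the "small" perplexity range of the table.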
CVoiceControl - CVoiceControl is a speech recognition system that allows the user to connect spoken commands to Unix commands.
FreeSpeech - Free Speech Recognition for Linux - Openmind (Freespeech) is a free speech recognition project for Linux. It will be designed so that it can be easily integrated into any application or window manager, as well as the KDE and GNOME desktop environments.
IBM ViaVoice SDK for Linux - The ViaVoice kit provides the necessary tools to develop applications that incorporate speech recognition on Linux.
The Festival Speech Synthesis System - Festival is a general multi-lingual speech synthesis system developed at CSTR. It offers a full text-to-speech system with various APIs, as well as an environment for development and research of speech synthesis techniques.
The MBROLA PROJECT - Multi-lingual text to speech synthesis. Free software download for research purposes.