Windows Speech Recognition

Windows Speech Recognition (WSR) is a speech recognition component developed by Microsoft for Windows Vista that enables voice commands to control the desktop user interface, dictate text in electronic documents and email, navigate websites, perform keyboard shortcuts, and operate the mouse cursor.

It provides a personal dictionary that allows users to include or exclude words or expressions from dictation and to record pronunciations to increase recognition accuracy.

[7] At WinHEC 2002, Microsoft announced that Windows Vista (codenamed "Longhorn") would include advances in speech recognition and features such as microphone array support[8] as part of an effort to "provide a consistent quality audio infrastructure for natural (continuous) speech recognition and (discrete) command and control."

[15][16] Microsoft later emphasized accessibility, new mobility scenarios, support for additional languages, and improvements to the speech user experience at WinHEC 2005.

[19] To incentivize company employees to analyze WSR for software glitches and to provide feedback, Microsoft offered an opportunity for its testers to win a Premium model of the Xbox 360.

[20] During a Microsoft demonstration on July 27, 2006, before Windows Vista's release to manufacturing (RTM), several attempts to dictate text led to consecutive recognition errors and produced the unintended output "Dear aunt, let's set so double the killer delete select all".[21][22] The incident drew significant derision from analysts and journalists in the audience,[23][24] even though a separate demonstration of application management and navigation was successful.

[28] Microsoft stated that although such an attack is theoretically possible, a number of mitigating factors and prerequisites would limit its effectiveness or prevent it altogether: a target would need the recognizer to be active and configured to properly interpret such commands; microphones and speakers would both need to be enabled and at sufficient volume levels; and an attack would require the computer to perform visible operations and produce audible feedback without users noticing.

[29] WSR was updated to use Microsoft UI Automation and its engine now uses the WASAPI audio stack, substantially enhancing its performance and enabling support for echo cancellation, respectively.

Sleep mode has also seen performance improvements and, to address security issues, the recognizer is turned off by default after users speak "stop listening" instead of being suspended.

Windows 7 also introduces an option to submit speech training data to Microsoft to improve future recognizer versions.

[32][33] WSR is featured in the Settings application starting with the Windows 10 April 2018 Update (Version 1803); the change first appeared in Insider Preview Build 17083.

[36][37] In December 2023 Microsoft announced that WSR is deprecated in favor of Voice Access and may be removed in a future build or release of Windows.

[41][43] Custom language models for the specific contexts, phonetics, and terminologies of users in particular occupational fields such as legal or medical are also supported.

[44] With Windows Search,[45] the recognizer can also optionally harvest text from documents and email, as well as handwritten tablet PC input, to contextualize and disambiguate terms and improve accuracy; no information is sent to Microsoft.

An ExactMatchOverPartialMatch entry in the Windows Registry can limit commands to items with exact names if there is more than one instance included in results.
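The effect of such a preference can be illustrated with a small sketch. The function name `resolve_command_targets`, its parameters, and the substring-matching rule are assumptions made for illustration only; they are not WSR's actual matching logic and do not describe the registry value's exact semantics.

```python
def resolve_command_targets(spoken, item_names, exact_match_over_partial=False):
    """Return the item names a spoken phrase could refer to (hypothetical model)."""
    phrase = spoken.strip().lower()
    # Partial matches: any item whose name contains the spoken phrase.
    partial = [name for name in item_names if phrase in name.lower()]
    if exact_match_over_partial and len(partial) > 1:
        # When several items match, prefer those whose full name equals the phrase.
        exact = [name for name in partial if name.lower() == phrase]
        if exact:
            return exact
    return partial


items = ["Documents", "Documents Archive", "Notepad"]
print(resolve_command_targets("documents", items))
print(resolve_command_targets("documents", items, exact_match_over_partial=True))
```

In this model, the first call returns both "Documents" and "Documents Archive", which would require the user to disambiguate, while the second call returns only the exactly named item.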

WSR supports custom macros through a supplementary application by Microsoft that enables additional natural language commands.

[56] Microsoft has also released sample macros for the speech dictionary,[57] for Windows Media Player,[58] for Microsoft PowerPoint,[59] for speech synthesis,[60] to switch between multiple microphones,[61] to customize various aspects of audio device configuration such as volume levels,[62] and for general natural language queries such as "What is the weather forecast?"

Users and developers can create their own macros based on text transcription and substitution; application execution (with support for command-line arguments); keyboard shortcuts; emulation of existing voice commands; or a combination of these items.
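How these building blocks might combine can be sketched as a simple data model. Everything here is hypothetical: the class names (`InsertText`, `RunApplication`, `SendKeys`, `EmulateCommand`, `Macro`) and the `describe` helper are invented for illustration and do not reproduce the macro tool's real file format or API. The sketch only lists what a macro would do; it executes nothing.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class InsertText:            # text transcription and substitution
    text: str


@dataclass
class RunApplication:        # application execution with command-line arguments
    path: str
    arguments: List[str] = field(default_factory=list)


@dataclass
class SendKeys:              # keyboard shortcut
    keys: str


@dataclass
class EmulateCommand:        # emulation of an existing voice command
    phrase: str


Action = Union[InsertText, RunApplication, SendKeys, EmulateCommand]


@dataclass
class Macro:
    listen_for: str          # phrase that would trigger the macro
    actions: List[Action]    # steps carried out in order


def describe(macro: Macro) -> List[str]:
    """Return a human-readable plan for the macro; nothing is executed."""
    steps = []
    for action in macro.actions:
        if isinstance(action, InsertText):
            steps.append(f"insert text: {action.text}")
        elif isinstance(action, RunApplication):
            steps.append("run: " + " ".join([action.path, *action.arguments]))
        elif isinstance(action, SendKeys):
            steps.append(f"send keys: {action.keys}")
        elif isinstance(action, EmulateCommand):
            steps.append(f"emulate voice command: {action.phrase}")
    return steps


macro = Macro(
    listen_for="open my notes",
    actions=[
        RunApplication("notepad.exe", ["notes.txt"]),
        SendKeys("Ctrl+End"),
        InsertText("Dictated on: "),
    ],
)
for step in describe(macro):
    print(step)
```

Modeling each action kind as its own type keeps a combined macro an ordered list of steps, which mirrors the "combination of these items" described above.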