In general, all versions of the API have been designed such that a software developer can write an application to perform speech recognition and synthesis by using a standard set of interfaces, accessible from a variety of programming languages.
The recognition and synthesis engines also generate events while processing (for example, to indicate an utterance has been recognized or to indicate word boundaries in the synthesized speech).
These pass in the reverse direction, from the engines, through the runtime DLL, and on to an event sink in the application.
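The event path described above can be illustrated with a small sketch. It is not SAPI code: the `SpeechEvent`, `Runtime`, and `ToyEngine` names are hypothetical stand-ins, and the only thing it aims to show is the direction of flow, from an engine, through a runtime layer, to an event sink registered by the application.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeechEvent:
    kind: str    # hypothetical event kind, e.g. "word_boundary"
    detail: str  # payload; here, the word that was reached


class Runtime:
    """Hypothetical stand-in for the runtime layer between app and engine."""

    def __init__(self) -> None:
        self._sinks: List[Callable[[SpeechEvent], None]] = []

    def add_event_sink(self, sink: Callable[[SpeechEvent], None]) -> None:
        # The application registers a callable that will receive events.
        self._sinks.append(sink)

    def forward(self, event: SpeechEvent) -> None:
        # Events raised by an engine are passed on to every registered sink.
        for sink in self._sinks:
            sink(event)


class ToyEngine:
    """Toy synthesis engine that only reports word boundaries."""

    def __init__(self, runtime: Runtime) -> None:
        self._runtime = runtime

    def synthesize(self, text: str) -> None:
        for word in text.split():
            self._runtime.forward(SpeechEvent("word_boundary", word))


def demo() -> List[SpeechEvent]:
    received: List[SpeechEvent] = []
    runtime = Runtime()
    runtime.add_event_sink(received.append)  # the application's event sink
    ToyEngine(runtime).synthesize("hello speech world")
    return received
```

Calling `demo()` returns the events in the order the toy engine raised them, one per word.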
In addition to the actual API definition and runtime DLL, other components are shipped with all versions of SAPI to make a complete Speech Software Development Kit.
The following components are among those included in most versions of the Speech SDK:
Xuedong Huang was a key person who led Microsoft's early SAPI efforts.
This version of SAPI included the core COM API, together with C++ wrapper classes to make programming from C++ easier, and ActiveX controls to allow drag-and-drop Visual Basic development.
The design of the new API included the concept of strictly separating the application and engine so all calls were routed through the runtime sapi.dll.
This change was intended to make the API more 'engine-independent', preventing applications from inadvertently depending on features of a specific engine.
In addition, this change was aimed at making it much easier to incorporate speech technology into an application by moving some management and initialization code into the runtime.
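The separation of application and engine can be sketched as follows. This is an illustration under assumptions, not the actual sapi.dll design: `TtsEngine`, `EngineA`, `EngineB`, `Runtime`, and `application` are all hypothetical names. The application talks only to the runtime, and the runtime takes over engine initialization, so swapping the engine does not change application code.

```python
from typing import Protocol


class TtsEngine(Protocol):
    """Hypothetical interface every engine exposes to the runtime."""

    def initialize(self) -> None: ...
    def render(self, text: str) -> bytes: ...


class EngineA:
    def __init__(self) -> None:
        self.ready = False

    def initialize(self) -> None:
        self.ready = True

    def render(self, text: str) -> bytes:
        if not self.ready:
            raise RuntimeError("engine used before initialization")
        return ("A:" + text).encode("utf-8")


class EngineB(EngineA):
    def render(self, text: str) -> bytes:
        if not self.ready:
            raise RuntimeError("engine used before initialization")
        return ("B:" + text).encode("utf-8")


class Runtime:
    """Hypothetical runtime: the only object the application ever calls."""

    def __init__(self, engine: TtsEngine) -> None:
        engine.initialize()  # initialization lives here, not in the app
        self._engine = engine

    def speak(self, text: str) -> bytes:
        # Every call is routed through the runtime to the engine.
        return self._engine.render(text)


def application(runtime: Runtime) -> bytes:
    # Application code has no reference to a concrete engine class.
    return runtime.speak("hello")
```

Passing `Runtime(EngineA())` or `Runtime(EngineB())` to `application` changes the output but not the application code, which is the kind of engine independence the paragraph describes.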
The recognition engines supported continuous dictation and command & control and were released in U.S. English, Japanese and Simplified Chinese versions.
Automation-compliant interfaces were added to the API to allow use from Visual Basic, scripting languages such as JScript, and managed code.
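As an illustration of what Automation access allows, the sketch below creates the `SAPI.SpVoice` Automation object from a scripting language and calls its `Speak` method. It assumes a Windows machine with SAPI 5 installed and the third-party pywin32 package; the `speak` helper and its `voice` parameter are illustrative additions, not part of SAPI.

```python
def default_voice():
    """Create the SAPI voice object through COM Automation.

    Assumption: Windows with SAPI 5 and pywin32 installed. The import is
    done lazily so the module can still be loaded on other platforms.
    """
    import win32com.client  # provided by pywin32, Windows only

    return win32com.client.Dispatch("SAPI.SpVoice")


def speak(text, voice=None):
    """Speak `text` using `voice`, or the default SAPI voice if omitted."""
    if voice is None:
        voice = default_voice()
    # Speak is a method of the SpVoice Automation object.
    return voice.Speak(text)
```

On a suitable Windows system, `speak("Hello from SAPI")` would be expected to read the text aloud with the default voice. Passing a `voice` object explicitly makes the helper usable with any object that offers a compatible `Speak` method.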
It added support for the SRGS and SSML markup languages, as well as additional server features and performance improvements.
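For readers unfamiliar with SSML, the following sketch builds a minimal SSML 1.0 document using only the Python standard library. The `build_ssml` helper is hypothetical, and whether a particular SAPI version accepts this exact document is an assumption; the element names (`speak`, `break`, `emphasis`) and the namespace come from the W3C SSML specification.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C SSML 1.0 recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, emphasized, lang="en-US"):
    """Return an SSML string: plain text, a pause, then emphasized text."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f'{escape(text)} '              # escape &, < and > in the text
        '<break time="500ms"/>'         # half-second pause
        f'<emphasis>{escape(emphasized)}</emphasis>'
        '</speak>'
    )
```

The resulting string is well-formed XML, so it can be checked with any XML parser before being handed to a synthesizer.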
Microsoft Sam (reportedly short for Speech Articulation Module) is a commonly shipped SAPI 5 voice.
This works well in some scenarios; however, the new API should provide a more seamless experience, equivalent to using any other managed code library.