Fwd: [Accessibility] TTS API document + introduction
Bill Haneman
Bill.Haneman at Sun.COM
Wed Mar 8 10:18:17 PST 2006
On Wed, 2006-03-08 at 18:13, Peter Korn wrote:
...
> > Does this mean we're moving this to March 15?
> I think we should. I believe Willie is at an appointment and may not be
> back in time for our meeting slot today.
>
> I also wonder about whether we should meet on the 15th, or the 22nd.
> The CSUN conference is March 21-25, and at least one of us will be at
> the "What's new in GNOME & Java Accessibility" talk from 10:40 to
> 11:40am PT on the 22nd (that'd be me, who is co-presenting that talk).
> And the week before I know some of us will still be busy in conference
> preparations...
I appreciate that. I am also aware, though, that this would mean the
at-spi discussions related to Mozilla/gecko/Firefox would be delayed
even further, as they will already have been postponed two weeks.
Bill
>
>
> Peter
> >
> > Bill
> >
> >> Olaf Jan Schmidt writes:
> >>
> >>
> >>> Hi!
> >>>
> >>> For those who are not subscribed to accessibility at freedesktop.org I
> >>> am forwarding the latest draft for the joint TTS API that we need
> >>> for reworking kttsd and SpeechDispatcher.
> >>>
> >>> Hynek has written an introduction that summarises our approach. I
> >>> hope it helps our discussion on Wednesday.
> >>>
> >>> Please cc the freedesktop.org list in your comments, because I want
> >>> to make sure that there is at least one place where all the email
> >>> discussion goes.
> >>>
> >>> Olaf
> >>>
> >>> --
> >>> Olaf Jan Schmidt, KDE Accessibility co-maintainer, open standards
> >>> accessibility networker, Protestant theology student and webmaster
> >>> of http://accessibility.kde.org/ and http://www.amen-online.de/
> >>>
> >>
> >> Content-Description: Hynek Hanke <hanke at brailcom.org>:
> >> [Accessibility] TTS API document + introduction
> >>
> >>
> >>> From: Hynek Hanke <hanke at brailcom.org>
> >>> To: "Accessibility, Freedesktop" <accessibility at freedesktop.org>
> >>>
> >>>
> >>> Hello,
> >>>
> >>> here is the latest version of the TTS API document with a new
> >>> introduction section trying to summarize the previous private and
> >>> public discussions on this topic. Comments are welcome.
> >>>
> >>> With regards,
> >>> Hynek Hanke
> >>>
> >>> Changes
> >>> =======
> >>>
> >>> * Introduction was written (clarification of intent, scope)
> >>>
> >>> * Clarification of the meaning of MUST HAVE, SHOULD HAVE
> >>>
> >>> * Point (4.11) was removed as not directly important for accessibility
> >>> (after discussions with Willie Walker who requested the point)
> >>>
> >>> * Point (4.13) was removed because its purpose is not clear.
> >>> Even if this functionality is needed, the 's' SSML element is
> >>> not a good way to do it.
> >>>
> >>> * Reformulation of (1.4), added 'temporarily' to (3.2), 'software
> >>> synthesizers' in (4.4), terminology in (4.13),
> >>> clarification in (B.1.3/2) and (B.1.3/3)
> >>>
> >>>
> >>> Common TTS Driver Interface
> >>> ============================
> >>> Document version: 2006-03-06
> >>>
> >>> The purpose of this document is to define a common low-level interface
> >>> to access the various speech synthesizers on Free Software and Open
> >>> Source platforms. It is designed to be used by applications that
> >>> do not need advanced functionality such as message management, and
> >>> by applications providing high-level interfaces (such as Speech
> >>> Dispatcher, Gnome Speech, KTTSD, etc.). The purpose of this
> >>> document is
> >>> not to define and force an API on the speech synthesizers. The
> >>> synthesizers might use different interfaces that will be handled by
> >>> their drivers.
> >>>
> >>> This interface will be implemented by a simple layer integrating
> >>> available speech synthesis drivers and in some cases emulating some of
> >>> the functionality missing in the synthesizers themselves.
> >>>
> >>> Advanced capabilities not directly related to speech, like message
> >>> management, priorities, synchronization etc. are left out of scope for
> >>> this low-level interface. They will be dealt with by higher-level
> >>> interfaces. (It is desirable to be able to agree on a common
> >>> higher-level interface too, but agreeing first on a low-level
> >>> interface is an easier task to accomplish.) Such a high-level
> >>> interface (not necessarily limited to speech) will make good use
> >>> of the already existing low-level interface.
> >>>
> >>> It is desirable that simple applications can use this API in a simple
> >>> way. However, the API must also be rich enough that it does not
> >>> limit more advanced applications in their use of the synthesizers.
> >>>
> >>> The first part (A) of this document describes the requirements
> >>> gathered from projects such as Gnome Speech, Speech Dispatcher,
> >>> KTTSD, Emacspeak and SpeakUp about what they might reasonably
> >>> expect from speech synthesis on a system. These requirements are
> >>> not meant to be the
> >>> requirements on the synthesizers, although they might be a guide to
> >>> synthesizer authors as they plan future features and capabilities for
> >>> their products. Parts (B) and (C) describe the XML/SSML markup in use
> >>> and part (D) defines the interface.
> >>>
> >>> Temporary note: The goal of this interface is real implementation
> >>> in the foreseeable future. The next step will be merging the
> >>> available engine drivers in the various accessibility projects
> >>> under this interface and using this interface. For this reason, we
> >>> need all accessibility projects who want to participate in this
> >>> common effort to make sure all their requirements on a low-level
> >>> speech output interface are met and that the interface is defined
> >>> so that it is suitable for their needs.
> >>>
> >>> Temporary note: Any comments about this draft are welcome and
> >>> useful. But since the goal of these requirements is real
> >>> implementation, we need to avoid endless discussions and keep the
> >>> comments focused and to the point.
> >>>
> >>> A. Requirements
> >>>
> >>> This section defines a set of requirements on the interface and on
> >>> speech synthesizer drivers that need to support assistive
> >>> technologies on free software platforms.
> >>>
> >>> 1. Design Criteria
> >>>
> >>> The Common TTS Driver Interface requirements will be developed
> >>> within the following broad design criteria:
> >>>
> >>> 1.1. Focus on supporting assistive technologies first. These
> >>> assistive technologies can be written in any programming language
> >>> and may provide specific support for particular environments such
> >>> as KDE or GNOME.
> >>>
> >>> 1.2. Simple and specific requirements win out over complex and
> >>> general requirements.
> >>>
> >>> 1.3. Use existing APIs and specs when possible.
> >>>
> >>>
> >>> 1.4. All language-dependent functionality with respect to text
> >>> processing for speech synthesis should be covered in the
> >>> synthesizers or synthesis drivers, not in applications.
> >>>
> >>> 1.5. Requirements will be categorized in the following priority
> >>> order: MUST HAVE, SHOULD HAVE, and NICE TO HAVE.
> >>>
> >>> The priorities have the following meanings with respect
> >>> to the drivers available under this API:
> >>> MUST HAVE: All drivers must satisfy this requirement.
> >>>
> >>> SHOULD HAVE: The driver will be usable without this feature, but
> >>> it is expected the feature is implemented in all drivers
> >>> intended for serious use.
> >>>
> >>> NICE TO HAVE: Optional features.
> >>>
> >>> Regardless of the priority, the full interface will be provided
> >>> by the API, even when the given functionality is actually not
> >>> implemented behind the interface.
> >>>
> >>> 1.6. Requirements outside the scope of this document will be
> >>> labelled as OUTSIDE SCOPE.
> >>>
> >>> 1.7. An application must be able to determine if SHOULD HAVE
> >>> and NICE TO HAVE features are supported for a given driver.
> >>>
> >>>
> >>> 2. Synthesizer Discovery Requirements
> >>>
> >>> 2.1. MUST HAVE: An application will be able to discover all speech
> >>> synthesizer drivers available to the machine.
> >>>
> >>> 2.2. MUST HAVE: An application will be able to discover all possible
> >>> voices available for a particular speech synthesizer driver.
> >>>
> >>> 2.3. MUST HAVE: An application will be able to determine the
> >>> supported languages, possibly including also a dialect or a
> >>> country, for each voice available for a particular speech
> >>> synthesizer driver.
> >>>
> >>> Rationale: Knowledge about available voices and languages is
> >>> necessary to select the proper driver and to be able to select a
> >>> supported language or different voices in an application.
> >>>
> >>> 2.4. MUST HAVE: Applications may assume their interaction with the
> >>> speech synthesizer driver doesn't affect other operating system
> >>> components in any unexpected way.
> >>>
> >>> 2.5. OUTSIDE SCOPE: Higher level communication interfaces to
> >>> the speech synthesizer drivers. The exact form of the
> >>> communication protocol (text protocol, IPC, etc.).
> >>>
> >>> Note: It is expected they will be implemented by particular
> >>> projects (Gnome Speech, KTTSD, Speech Dispatcher) as wrappers
> >>> around the low-level communication interface defined below.
> >>>
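As an illustration of what requirements 2.1-2.3 imply for an implementation, here is a minimal sketch in Python. All names (`Driver`, `Voice`, `list_drivers`) and the stubbed driver data are illustrative assumptions, not part of this draft:

```python
# Hypothetical sketch of the discovery model implied by requirements
# 2.1-2.3; names and example drivers are illustrative, not specified.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Voice:
    name: str
    language: str            # e.g. "en"
    dialect: str = ""        # optional country/dialect, e.g. "US"

@dataclass
class Driver:
    name: str
    voices: List[Voice] = field(default_factory=list)

def list_drivers() -> List[Driver]:
    """2.1: discover all drivers available to the machine (stubbed)."""
    return [
        Driver("festival", [Voice("kal_diphone", "en", "US")]),
        Driver("espeak", [Voice("default", "en"), Voice("czech", "cs")]),
    ]

def voices_for_language(drivers: List[Driver], lang: str) -> List[str]:
    """2.2/2.3: select the voices supporting a given language."""
    return [v.name for d in drivers for v in d.voices if v.language == lang]
```

An application would use such a query to pick a driver and voice for the user's language before issuing any `speak' request.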
> >>>
> >>> 3. Synthesizer Configuration Requirements
> >>>
> >>> 3.1. MUST HAVE: An application will be able to specify the default
> >>> voice to use for a particular synthesizer, and will be able to
> >>> change the default voice in between `speak' requests.
> >>>
> >>> 3.2. SHOULD HAVE: An application will be able to specify the default
> >>> prosody and style elements for a voice. These elements will match
> >>> those defined in the SSML specification, and the synthesizer may
> >>> choose which attributes it wishes to support. Note that prosody,
> >>> voice and style elements specified in SSML sent as a `speak'
> >>> request will temporarily override the default values.
> >>>
> >>> 3.3. SHOULD HAVE: An application should be able to provide the
> >>> synthesizer with an application-specific pronunciation lexicon
> >>> addenda. Note that using `phoneme' element in SSML is another way
> >>> to accomplish this on a very localized basis, and will override
> >>> any pronunciation lexicon data for the synthesizer.
> >>>
> >>> Rationale: This feature is necessary so that the application is
> >>> able to speak artificial words or words with explicitly modified
> >>> pronunciation (e.g. "the word ... is often mispronounced as ...
> >>> by foreign speakers").
> >>>
> >>> 3.4. MUST HAVE: Applications may assume they have their own local
> >>> copy of a synthesizer and voice. That is, one application's
> >>> configuration of a synthesizer or voice should not conflict with
> >>> another application's configuration settings.
> >>>
> >>> 3.5. MUST HAVE: Changing the default voice or voice/prosody element
> >>> attributes does not affect a `speak' in progress.
> >>>
> >>> 4. Synthesis Process Requirements
> >>>
> >>> 4.1. MUST HAVE: The speech synthesizer driver is able to process
> >>> plain text (i.e. text that is not marked up via SSML) encoded in
> >>> the UTF-8 character encoding.
> >>>
> >>> 4.2. MUST HAVE: The speech synthesizer driver is able to process
> >>> text formatted using extended SSML markup defined in part B of
> >>> this document and encoded in UTF-8. The synthesizer may choose
> >>> to ignore markup it cannot handle or even to ignore all markup
> >>> as long as it is able to process the text inside the markup.
> >>>
> >>> 4.3. SHOULD HAVE: The speech synthesizer driver is able to properly
> >>> process the extended SSML markup defined in part B of this
> >>> document as SHOULD HAVE. The same applies analogously to NICE
> >>> TO HAVE.
> >>>
> >>> 4.4. MUST HAVE: An application must be able to cancel a synthesis
> >>> operation in progress. In case of hardware synthesizers, or
> >>> synthesizers that produce their own audio, this means cancelling
> >>> the audio output as well.
> >>>
> >>> 4.5. MUST HAVE: The speech synthesizer driver must be able to
> >>> process long input texts in such a way that the audio output
> >>> starts to be available for playing as soon as possible. An
> >>> application is not required to split long texts into smaller
> >>> pieces.
> >>>
> >>> 4.6. SHOULD HAVE: The speech synthesizer driver should honor the
> >>> Performance Guidelines described below.
> >>>
> >>> 4.7. NICE TO HAVE: It would be nice if a synthesizer were able to
> >>> support "rewind" and "repeat" functionality for an utterance (see
> >>> related descriptions in the MRCP specification).
> >>>
> >>> Rationale: This allows moving over long texts without the need to
> >>> synthesize the whole text and without losing context.
> >>>
> >>> 4.8. NICE TO HAVE: It would be nice if a synthesizer were able to
> >>> support multilingual utterances.
> >>>
> >>> 4.9. SHOULD HAVE: A synthesizer should support notification of
> >>> `mark' elements, and the application should be able to align
> >>> these events with the synthesized audio.
> >>>
> >>> 4.10. NICE TO HAVE: It would be nice if a synthesizer supported
> >>> "word started" and "word ended" events and allowed alignment of
> >>> the events similar to that in 4.9.
> >>>
> >>> Rationale: This is useful to update cursor position as a displayed
> >>> text is spoken.
> >>>
> >>> 4.11. REMOVED (not directly important for accessibility)
> >>>
> >>> The former version: It would be nice if a synthesizer supported
> >>> timing information at the phoneme level and allowed alignment of
> >>> the events similar to that in 4.9. Rationale: This is useful
> >>> for talking heads.
> >>>
> >>>
> >>> 4.12. SHOULD HAVE: The application should be able to pause and resume
> >>> a synthesis operation in progress while still being able to handle
> >>> other synthesis requests in the meantime. In case of hardware
> >>> synthesizers, this means pausing and if possible resuming the
> >>> audio output as well.
> >>>
> >>> 4.13. REMOVED (not clear purpose, the SSML specs do not require
> >>> the 's' element to work this way)
> >>>
> >>> The synthesizer should not try to split the
> >>> contents of the `s' SSML element into several independent pieces,
> >>> unless required by a markup inside.
> >>>
> >>> Rationale: An application may have better information about the
> >>> synthesized text and perform its own splitting of sentences.
> >>>
> >>> 4.14. OUTSIDE SCOPE: Message management (queueing, ordering,
> >>> interleaving, etc.).
> >>>
> >>> 4.15. OUTSIDE SCOPE: Interfacing software synthesis with audio
> >>> output.
> >>>
> >>> 4.16. OUTSIDE SCOPE: Specifying the audio format to be used by a
> >>> synthesizer.
> >>>
> >>> 5. Performance Guidelines
> >>>
> >>> In order to make the speech synthesizer driver actually usable with
> >>> assistive technologies, it must satisfy certain performance
> >>> expectations. The following text gives driver implementors a
> >>> rough idea of what is needed in practice.
> >>>
> >>> Typical scenarios when working with a speech enabled text editor:
> >>>
> >>> 5.1. Typed characters are spoken (echoed).
> >>>
> >>> Reading of the characters and cancelling the synthesis must be
> >>> very fast, to catch up with a fast typist or even with
> >>> autorepeat. Consider a typical autorepeat rate of 25 characters
> >>> per second: ideally, within each of the 40 ms intervals synthesis
> >>> should begin, produce some audio output and stop. Performing
> >>> all these actions within 100 ms (considering a fast typist and
> >>> some overhead of the application and the audio output) on
> >>> common hardware is very desirable.
> >>>
> >>> Appropriate character reading performance may be difficult to
> >>> achieve with contemporary software speech synthesizers, so it may
> >>> be necessary to use techniques like caching of the synthesized
> >>> characters. Also, it is necessary to ensure there is no initial
> >>> pause ("breathing in") within the synthesized character.
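The caching technique mentioned above can be sketched as follows. The synthesizer call and the silence-trimming step are stand-in assumptions, not prescribed by the draft; only the idea of caching per-character audio and stripping the initial pause comes from the text:

```python
# Minimal sketch of caching synthesized characters (5.1). The synth
# backend and the silence trimming are illustrative stubs.

_cache: dict = {}

def synthesize(char: str) -> bytes:
    """Stand-in for a real (slow) synthesizer call returning audio."""
    return ("audio:" + char).encode()

def trim_leading_silence(audio: bytes) -> bytes:
    """Remove the initial pause ('breathing in'); stubbed as identity."""
    return audio

def speak_char(char: str) -> bytes:
    """Return audio for a character, synthesizing only on first use so
    repeated echoes (fast typing, autorepeat) stay within budget."""
    if char not in _cache:
        _cache[char] = trim_leading_silence(synthesize(char))
    return _cache[char]
```

After the first keystroke of a given character, subsequent echoes are served from the cache and only the (fast) playback and cancellation remain on the critical path.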
> >>>
> >>> 5.2. Moving over words or lines, each of them is spoken.
> >>>
> >>> The sound sample needn't be available as quickly as in the case
> >>> of typed characters, but it should still be available without a
> >>> clearly noticeable delay. As the user moves over words or lines, he
> >>> must hear the text immediately. Cancelling the synthesis of the
> >>> previous word or line must be instant.
> >>>
> >>> 5.3. Reading a large text file.
> >>>
> >>> In such a case, it is not necessary to start speaking instantly,
> >>> because reading a large text is not a very frequent operation.
> >>> A one-second delay at the start is acceptable, although not
> >>> comfortable. Cancelling the speech must still be instant.
> >>>
> >>>
> >>> B. XML (extended SSML) Markup in Use
> >>>
> >>> This section defines the set of XML markup and special
> >>> attribute values for use in input texts for the drivers.
> >>> The markup consists of two namespaces: 'SSML' (default)
> >>> and 'tts', where 'tts' introduces several new attributes
> >>> to be used with the 'say-as' element and a new element
> >>> 'style'.
> >>>
> >>> If an SSML element is supported, all its mandatory attributes
> >>> by the definition of SSML 1.0 must be supported even if they
> >>> are not explicitly mentioned in this document.
> >>>
> >>> This section also defines which functions the API
> >>> needs to provide for default prosody, voice and style settings,
> >>> according to (3.2).
> >>>
> >>> Note: According to available information, SSML is not known
> >>> to suffer from any IP issues.
> >>>
> >>>
> >>> B.1. SHOULD HAVE: The following elements are supported
> >>> speak
> >>> voice
> >>> prosody
> >>> say-as
> >>>
> >>> B.1.1. These SPEAK attributes are supported
> >>> 1 (SHOULD HAVE): xml:lang
> >>>
> >>> B.1.1. These VOICE attributes are supported
> >>> 1 (SHOULD HAVE): xml:lang
> >>> 2 (SHOULD HAVE): name
> >>> 3 (NICE TO HAVE): gender
> >>> 4 (NICE TO HAVE): age
> >>> 5 (NICE TO HAVE): variant
> >>>
> >>> B.1.2. These PROSODY attributes are supported
> >>> 1 (SHOULD HAVE): pitch (with +/- %, "default")
> >>> 2 (SHOULD HAVE): rate (with +/- %, "default")
> >>> 3 (SHOULD HAVE): volume (with +/- %, "default")
> >>> 4 (NICE TO HAVE): range (with +/- %, "default")
> >>> 5 (NICE TO HAVE): 'pitch', 'rate', 'range'
> >>> with absolute value parameters
> >>>
> >>> Note: The corresponding global relative prosody setting
> >>> commands (not markup) in the TTS API represent the percentage
> >>> value as a percentage change with respect to the default
> >>> value for the given voice and parameter, not with respect
> >>> to previous settings.
> >>>
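The note about relative prosody settings can be made concrete with a small sketch. The default value here is a hypothetical figure for some voice; the point, taken from the note, is that a "+20%" setting is always computed against the voice's default, never against the current value:

```python
# Sketch of the relative prosody rule: a +/- percentage is applied
# to the voice's default value, not to the previous setting.
# DEFAULT_RATE is an illustrative assumption.

DEFAULT_RATE = 100.0  # hypothetical default rate for some voice

def apply_relative(percent: float, default: float = DEFAULT_RATE) -> float:
    """Return the new absolute value for a +/- percent change."""
    return default * (1.0 + percent / 100.0)

# Two successive "+20%" requests both yield 120, not 144, because
# each is relative to the default:
first = apply_relative(+20.0)
second = apply_relative(+20.0)
```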
> >>>
> >>> B.1.3. The SAY-AS attribute 'interpret-as'
> >>> is supported with the following values
> >>>
> >>> 1 (SHOULD HAVE) characters
> >>> The format 'glyphs' is supported.
> >>>
> >>> Rationale: This provides capability for spelling.
> >>>
> >>> 2 (SHOULD HAVE) tts:char
> >>> Indicates the content of the element is a single
> >>> character and it should be pronounced as a character.
> >>> The element's contents (CDATA) should only contain
> >>> a single character.
> >>>
> >>> This is different from the interpret-as value "characters"
> >>> described in B.1.3.1. While "characters" is intended
> >>> for spelling words and sentences, "tts:char" means
> >>> pronouncing the given character (which might be subject
> >>> to different settings, as for example using sound icons to
> >>> represent symbols).
> >>>
> >>> If more than one character is present as the contents
> >>> of the element, this is considered an error.
> >>>
> >>> Example:
> >>> <speak>
> >>> <say-as interpret-as="tts:char">@</say-as>
> >>> </speak>
> >>>
> >>> Rationale: It is useful to have a separate attribute
> >>> for "single characters", as this can be used in TTS
> >>> configuration to distinguish the situation where the
> >>> user is moving the cursor over characters from spelling,
> >>> as well as other situations where the concept of a
> >>> "single character" has some logical meaning.
> >>>
> >>> 3 (SHOULD HAVE) tts:key
> >>> The content of the element should be interpreted
> >>> as the name of a keyboard key or combination of keys. See
> >>> section (C) for possible string values of content of this
> >>> element. If a string is given which is not defined in section
> >>> (C), the behavior of the synthesizer is undefined.
> >>>
> >>> Example:
> >>> <speak>
> >>> <say-as interpret-as="tts:key">shift_a</say-as>
> >>> </speak>
> >>>
> >>> 4 (NICE TO HAVE) tts:digits
> >>> Indicates the content of the element is a number.
> >>> The attribute "detail" is supported and can take a numerical
> >>> value specifying how many digits the synthesizer should group
> >>> for reading. The value 0 means the number should be
> >>> pronounced as a whole, as appropriate for the language, while any
> >>> non-zero value means that groups of that many digits should be
> >>> formed for reading, starting from the left.
> >>>
> >>> Example: The string "5431721838" would normally be read
> >>> as "five billion four hundred thirty-one million ..." but
> >>> when enclosed in the above say-as with detail set to 3, it
> >>> would be read as "five hundred forty-three, one hundred
> >>> seventy-two etc.", or "five, four, three, one etc." with
> >>> detail 1.
> >>>
> >>> Note: This is an extension to SSML not defined in the
> >>> format itself, introduced under the namespace 'tts' (as
> >>> allowed in SSML 'say-as' specifications).
> >>>
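The grouping rule for tts:digits sketched above is simple to state in code. The helper name is hypothetical; only the left-to-right grouping behaviour comes from the text:

```python
def group_digits(number: str, detail: int) -> list:
    """Split a digit string into groups of `detail` digits from the
    left, per the tts:digits 'detail' attribute described above;
    detail == 0 means the number is read as a whole."""
    if detail == 0:
        return [number]
    return [number[i:i + detail] for i in range(0, len(number), detail)]
```

For "5431721838" this yields the groups 543, 172, 183, 8 with detail 3, matching the example in the text.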
> >>>
> >>> B.2. NICE TO HAVE: The following elements are supported
> >>> mark
> >>> s
> >>> p
> >>> phoneme
> >>> sub
> >>>
> >>> B.2.1. NICE TO HAVE: These P attributes are supported:
> >>> 1 xml:lang
> >>>
> >>> B.2.2. NICE TO HAVE: These S attributes are supported:
> >>> 1 xml:lang
> >>>
> >>> B.3. SHOULD HAVE: An element `tts:style' (not defined in SSML 1.0)
> >>> is supported.
> >>>
> >>> This element can occur anywhere inside the SSML document.
> >>> It may contain all SSML elements except the element 'speak'
> >>> and it may also contain the element 'tts:style'.
> >>>
> >>> It has two mandatory attributes, 'field'
> >>> and 'mode', and an optional string attribute 'detail'. The
> >>> attribute 'field' can take the following values, defined below:
> >>> 1) punctuation
> >>> 2) capital_letters
> >>>
> >>> If the 'field' attribute is set to 'punctuation',
> >>> the 'mode' attribute can take the following values
> >>> 1) none
> >>> 2) all
> >>> 3) (NICE TO HAVE) some
> >>> When set to 'none', no punctuation characters are explicitly
> >>> indicated. When it is set to 'all', all punctuation characters
> >>> in the text should be indicated by the synthesizer. When
> >>> set to 'some', the synthesizer will pronounce those
> >>> punctuation characters enumerated in the additional attribute
> >>> 'detail' or will only speak those characters according to its
> >>> settings if no 'detail' attribute is specified.
> >>>
> >>> The attribute detail takes the form of a string containing
> >>> the punctuation characters to read.
> >>>
> >>> Example:
> >>> <tts:style field="punctuation" mode="some" detail=".?!">
> >>>
> >>> If the 'field' attribute is set to 'capital_letters',
> >>> the 'mode' attribute can take the following values
> >>> 1) no
> >>> 2) spelling
> >>> 3) (NICE TO HAVE) icon
> >>> 4) (NICE TO HAVE) pitch
> >>>
> >>> When set to 'no', capital letters are not explicitly
> >>> indicated. When set to 'spelling', capital letters are
> >>> spelled (e.g. "capital a"). When set to 'icon', a sound
> >>> is inserted before the capital letter, possibly leaving
> >>> the letter/word/sentence intact. When set to 'pitch',
> >>> the capital letter is pronounced with a higher pitch,
> >>> possibly leaving the letter/word/sentence intact.
> >>>
> >>>
> >>> Rationale: These are basic capabilities well established
> >>> in accessibility. However, SSML does not support them.
> >>> Introducing this additional element does not prevent
> >>> outside applications from sending valid SSML to the
> >>> TTS API.
> >>>
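To illustrate how the element composes with SSML text, here is a small helper that wraps text in tts:style markup. The helper itself is a hypothetical convenience, not part of the interface; the attribute names and values are those defined above:

```python
# Sketch: build a tts:style element as described in B.3.
# The `style` helper is illustrative, not part of the draft API.
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

def style(text: str, field: str, mode: str,
          detail: Optional[str] = None) -> str:
    """Wrap text in a tts:style element with the given attributes."""
    attrs = f'field={quoteattr(field)} mode={quoteattr(mode)}'
    if detail is not None:
        attrs += f' detail={quoteattr(detail)}'
    return f'<tts:style {attrs}>{escape(text)}</tts:style>'
```

For example, `style("Hi?", "punctuation", "some", ".?!")` reproduces the punctuation example given in the text above.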
> >>> B.4. NICE TO HAVE: Support for the rest of elements and attributes
> >>> defined in SSML 1.0. However, this is of lower priority than
> >>> the enumerated subset above.
> >>>
> >>> Open Issue: In many situations, it will be desirable to
> >>> preserve whitespace characters in the incoming document.
> >>> Should we require the application to use the 'xml:space'
> >>> attribute on the speak element, or should we state that
> >>> 'preserve' is the default value for 'xml:space' in the root
> >>> 'speak' element in this case?
> >>>
> >>> C. Key names
> >>>
> >>> A key name may contain any character excluding control characters
> >>> (characters in the range 0 to 31 of the ASCII table and other
> >>> ``invisible'' characters), spaces, dashes and underscores.
> >>>
> >>> C.1 The recognized key names are:
> >>> 1) Any single UTF-8 character, excluding the exceptions defined
> >>> above.
> >>>
> >>> 2) Any of the symbolic key names defined below.
> >>>
> >>> 3) A combination of key names defined below using the
> >>> '_' (underscore) character for concatenation.
> >>>
> >>> Examples of valid key names:
> >>> A
> >>> shift_a
> >>> shift_A
> >>> $
> >>> enter
> >>> shift_kp-enter
> >>> control
> >>> control_alt_delete
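Parsing the combination syntax above is straightforward: split on the underscore separator, then resolve the escaped names from C.2.1 so that the separator never collides with the characters it escapes. The function name is an assumption for illustration:

```python
# Sketch of key-name parsing per section C; `parse_key_name` is a
# hypothetical helper, not part of the interface.
ESCAPED = {"space": " ", "underscore": "_", "dash": "-"}

def parse_key_name(name: str) -> list:
    """Split a combined key name on '_' and resolve the escaped
    names (space, underscore, dash) from C.2.1."""
    return [ESCAPED.get(part, part) for part in name.split("_")]
```

So "shift_kp-enter" parses to the keys "shift" and "kp-enter", while the escaped name "underscore" resolves to the literal "_" character.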
> >>>
> >>> C.2 List of symbolic key names
> >>>
> >>> C.2.1 Escaped keys
> >>> space
> >>> underscore
> >>> dash
> >>>
> >>> C.2.2 Auxiliary Keys
> >>> alt
> >>> control
> >>> hyper
> >>> meta
> >>> shift
> >>> super
> >>>
> >>> C.2.3 Control Character Keys
> >>> backspace
> >>> break
> >>> delete
> >>> down
> >>> end
> >>> enter
> >>> escape
> >>> f1
> >>> f2 ... f24
> >>> home
> >>> insert
> >>> kp-*
> >>> kp-+
> >>> kp--
> >>> kp-.
> >>> kp-/
> >>> kp-0 kp-1 ... kp-9
> >>> kp-enter
> >>> left
> >>> menu
> >>> next
> >>> num-lock
> >>> pause
> >>> print
> >>> prior
> >>> return
> >>> right
> >>> scroll-lock
> >>> space
> >>> tab
> >>> up
> >>> window
> >>>
> >>> D. Interface Description
> >>>
> >>> This section defines the low-level TTS driver interface for use by
> >>> all assistive technologies on free software platforms.
> >>>
> >>> 1. Speech Synthesis Driver Discovery
> >>> ...
> >>>
> >>> 2. Speech Synthesis Driver Interface
> >>>
> >>> ...
> >>>
> >>> Open Issue: Still not clear consensus on how to return the
> >>> synthesized audio data (if at all). The main issue here is
> >>> mostly with how to align marker and other time-related events
> >>> with the audio being played on the audio output device.
> >>>
> >>> Proposal: There will be 2 possible ways to do it. The synthesized
> >>> data can be returned to the application (case A) or the
> >>> application can ask for them being played on the audio (which
> >>> will not be the task of TTS API, but will be handled by
> >>> another API) (case B).
> >>>
> >>> In (case A), each time the application gets a piece of audio
> >>> data, it also gets a time-table of index marks and events
> >>> in that piece of data. This will be done on a separate socket
> >>> in asynchronous mode. (This is possible for software
> >>> synthesizers only, however.)
> >>>
> >>> In (case B), the application will get asynchronous callbacks
> >>> (they might be realized by sending a defined string over
> >>> a socket, by calling a callback function or in some other
> >>> way -- the particular way of doing it is considered an
> >>> implementation detail).
> >>>
> >>> Rationale: Both approaches are useful in different situations
> >>> and each of them provides some capability that the other one
> >>> doesn't.
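Case A's per-chunk time-table can be sketched as pairing each audio chunk with the events that fall inside it; the application then flattens these to utterance-absolute timestamps to align marks with the audio it plays. The data shapes and names below are assumptions for illustration only:

```python
# Sketch of case A: each audio chunk arrives with a time-table of
# index marks/events inside it. All shapes and names are illustrative.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Chunk:
    start_ms: int                  # offset of this chunk in the utterance
    audio: bytes                   # the synthesized audio data
    events: List[Tuple[int, str]]  # (offset within chunk, mark name)

def absolute_events(chunks: List[Chunk]) -> List[Tuple[int, str]]:
    """Flatten per-chunk events to utterance-absolute timestamps,
    which is what an application needs to fire mark notifications
    in sync with the audio it is playing."""
    out = []
    for c in chunks:
        for off, mark in c.events:
            out.append((c.start_ms + off, mark))
    return out
```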
> >>>
> >>> Open Issue: Will the interaction with the driver be synchronous
> >>> or asynchronous? For example, will a call to `speak'
> >>> wait to return until all the audio has been processed? If
> >>> not, what happens when a call to "speak" is made while the
> >>> synthesizer is still processing a prior call to "speak?"
> >>>
> >>> Proposal: With the exception of events and index marks signalling,
> >>> the communication will be synchronous. When a speak request
> >>> is issued while the driver is still processing a prior call to
> >>> speak and the application didn't call pause before, this is
> >>> considered an error.
> >>>
> >>> E. Related Specifications
> >>>
> >>> SSML: http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/
> >>> (see requirements at the following URL:
> >>>
> >>> http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/#ref-reqs)
> >>>
> >>> SSML 'say-as' element attribute values:
> >>> http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/
> >>>
> >>> MRCP: http://www.ietf.org/html.charters/speechsc-charter.html
> >>>
> >>> F. Copying This Document
> >>>
> >>> Copyright (C) 2006 ...
> >>> This specification is made available under a BSD-style license ...
> >>>
> >>> _______________________________________________
> >>> accessibility mailing list
> >>> accessibility at lists.freedesktop.org
> >>> http://lists.freedesktop.org/mailman/listinfo/accessibility
> >>>
> >>
> >>
> >>
> >>> _______________________________________________
> >>> Accessibility mailing list
> >>> Accessibility at lists.freestandards.org
> >>> http://lists.freestandards.org/cgi-bin/mailman/listinfo/accessibility
> >>>
> >>
> >>
> >>
> >>