Fwd: [Accessibility] TTS API document + introduction

Bill Haneman Bill.Haneman at Sun.COM
Wed Mar 8 10:18:17 PST 2006


On Wed, 2006-03-08 at 18:13, Peter Korn wrote:
...
> > Does this mean we're moving this to March 15?
> I think we should.  I believe Willie is at an appointment and may not be 
> back in time for our meeting slot today.
> 
> I also wonder about whether we should meet on the 15th, or the 22nd.  
> The CSUN conference is March 21-25, and at least one of us will be at 
> the "What's new in GNOME & Java Accessibility" talk from 10:40 to 
> 11:40am PT on the 22nd (that'd be me, who is co-presenting that talk).  
> And the week before I know some of us will still be busy in conference 
> preparations...

I appreciate that.  I am also aware, though, that this would mean the
at-spi discussions related to Mozilla/gecko/Firefox would be delayed for
longer as well, and they will already have been postponed two weeks.  

Bill

> 
> 
> Peter
> >
> > Bill
> >
> >> Olaf Jan Schmidt writes:
> >>  
> >>
> >>> Hi!
> >>>
> >>> For those who are not subscribed to accessibility at freedesktop.org I 
> >>> am forwarding the latest draft for the joint TTS API that we need 
> >>> for reworking kttsd and SpeechDispatcher.
> >>>
> >>> Hynek has written an introduction that summarises our approach. I 
> >>> hope it helps our discussion on Wednesday.
> >>>
> >>> Please cc the freedesktop.org list in your comments, because I want 
> >>> to make sure that there is at least one place where all the email 
> >>> discussion goes.
> >>>
> >>> Olaf
> >>>
> >>> -- 
> >>> Olaf Jan Schmidt, KDE Accessibility co-maintainer, open standards 
> >>> accessibility networker, Protestant theology student and webmaster 
> >>> of http://accessibility.kde.org/ and http://www.amen-online.de/
> >>>   
> >>
> >> Content-Description: Hynek Hanke <hanke at brailcom.org>: 
> >> [Accessibility] TTS API document + introduction
> >>  
> >>
> >>> From: Hynek Hanke <hanke at brailcom.org>
> >>> To: "Accessibility, Freedesktop" <accessibility at freedesktop.org>
> >>>
> >>>
> >>> Hello,
> >>>
> >>> here is the latest version of the TTS API document with a new
> >>> introduction section trying to summarize the previous private and
> >>> public discussions on this topic. Comments are welcomed.
> >>>
> >>> With regards,
> >>> Hynek Hanke
> >>>
> >>> Changes
> >>> =======
> >>>
> >>> * Introduction was written (clarification of intent, scope)
> >>>
> >>> * Clarification of the meaning of MUST HAVE, SHOULD HAVE
> >>>
> >>> * Point (4.11) was removed as not directly important for accessibility
> >>> (after discussions with Willie Walker who requested the point)
> >>>
> >>> * Point (4.13) was removed because its purpose is not clear.
> >>>  Even if this functionality is needed, the 's' SSML element is
> >>>  not a good way to do it.
> >>>
> >>> * Reformulation of (1.4), added 'temporarily' to (3.2), 'software
> >>>  synthesizers' in (4.4), terminology in (4.13),
> >>>  clarification in (B.1.3/2) and (B.1.3/3)
> >>>  
> >>>
> >>> Common TTS Driver Interface
> >>> ============================
> >>> Document version: 2006-03-06
> >>>
> >>> The purpose of this document is to define a common low-level interface
> >>> for accessing the various speech synthesizers on Free Software and Open
> >>> Source platforms. It is designed to be used both by applications that
> >>> do not need advanced functionality such as message management and by
> >>> applications providing high-level interfaces (such as Speech
> >>> Dispatcher, Gnome Speech, KTTSD, etc.).  The purpose of this document
> >>> is not to define and force an API onto the speech synthesizers
> >>> themselves; the synthesizers may use different interfaces, which will
> >>> be handled by their drivers.
> >>>
> >>> This interface will be implemented by a simple layer integrating
> >>> available speech synthesis drivers and in some cases emulating some of
> >>> the functionality missing in the synthesizers themselves.
> >>>
> >>> Advanced capabilities not directly related to speech, such as message
> >>> management, priorities, synchronization, etc., are out of scope for
> >>> this low-level interface. They will be dealt with by higher-level
> >>> interfaces. (It is desirable to agree on a common higher-level
> >>> interface too, but agreeing first on a low-level interface is an
> >>> easier task to accomplish.) Such a higher-level interface (not
> >>> necessarily limited to speech) will make good use of the already
> >>> existing low-level interface.
> >>>
> >>> It is desirable that simple applications can use this API in a simple
> >>> way. However, the API must also be rich enough that it does not limit
> >>> more advanced applications in their use of the synthesizers.
> >>>
> >>> The first part (A) of this document describes the requirements,
> >>> gathered from projects such as Gnome Speech, Speech Dispatcher, KTTSD,
> >>> Emacspeak and SpeakUp, of what they might reasonably expect from
> >>> speech synthesis on a system. These requirements are not meant to be
> >>> requirements on the synthesizers themselves, although they might serve
> >>> as a guide to synthesizer authors as they plan future features and
> >>> capabilities for their products. Parts (B) and (C) describe the
> >>> XML/SSML markup in use, and part (D) defines the interface.
> >>>
> >>> Temporary note: The goal of this interface is a real implementation in
> >>> the foreseeable future.  The next step will be merging the engine
> >>> drivers available in the various accessibility projects under this
> >>> interface and using this interface. For this reason, we need all
> >>> accessibility projects that want to participate in this common effort
> >>> to make sure that all their requirements on a low-level speech output
> >>> interface are met and that the interface is defined in a way that is
> >>> suitable for their needs.
> >>>
> >>> Temporary note: Any comments about this draft are welcome and
> >>> useful. But since the goal of these requirements is real
> >>> implementation, we need to avoid endless discussions and keep the
> >>> comments focused and to the point.
> >>>
> >>> A. Requirements
> >>>
> >>>  This section defines a set of requirements on the interface and on
> >>>  the speech synthesizer drivers needed to support assistive
> >>>  technologies on free software platforms.
> >>>
> >>>  1. Design Criteria
> >>>
> >>>    The Common TTS Driver Interface requirements will be developed
> >>>    within the following broad design criteria:
> >>>
> >>>    1.1. Focus on supporting assistive technologies first.  These
> >>>      assistive technologies can be written in any programming language
> >>>      and may provide specific support for particular environments such
> >>>      as KDE or GNOME.
> >>>
> >>>    1.2. Simple and specific requirements win out over complex and
> >>>      general requirements.
> >>>
> >>>    1.3. Use existing APIs and specs when possible.
> >>>
> >>>
> >>>    1.4. All language-dependent functionality with respect to text
> >>>     processing for speech synthesis should be covered in the
> >>>     synthesizers or synthesis drivers, not in applications.
> >>>
> >>>    1.5. Requirements will be categorized in the following priority
> >>>      order: MUST HAVE, SHOULD HAVE, and NICE TO HAVE.
> >>>
> >>>      The priorities have the following meanings with respect
> >>>      to the drivers available under this API:
> >>>
> >>>      MUST HAVE: All drivers must satisfy this requirement.
> >>>
> >>>      SHOULD HAVE: The driver will be usable without this feature, but
> >>>        it is expected that the feature is implemented in all drivers
> >>>        intended for serious use.
> >>>
> >>>      NICE TO HAVE: Optional features.
> >>>
> >>>      Regardless of the priority, the full interface will be provided
> >>>      by the API, even when the given functionality is not actually
> >>>      implemented behind the interface.
> >>>
> >>>    1.6. Requirements outside the scope of this document will be
> >>>      labelled as OUTSIDE SCOPE.
> >>>
> >>>    1.7. An application must be able to determine if SHOULD HAVE
> >>>      and NICE TO HAVE features are supported for a given driver.
> >>>
> >>>
> >>>  2. Synthesizer Discovery Requirements
> >>>
> >>>    2.1. MUST HAVE: An application will be able to discover all speech
> >>>      synthesizer drivers available to the machine.
> >>>
> >>>    2.2. MUST HAVE: An application will be able to discover all possible
> >>>      voices available for a particular speech synthesizer driver.
> >>>
> >>>    2.3. MUST HAVE: An application will be able to determine the
> >>>      supported languages, possibly including also a dialect or a
> >>>      country, for each voice available for a particular speech
> >>>      synthesizer driver.
> >>>
> >>>      Rationale: Knowledge about the available voices and languages is
> >>>      necessary to select a proper driver and to be able to select a
> >>>      supported language or different voices in an application.
> >>>
> >>>    2.4. MUST HAVE: Applications may assume their interaction with the
> >>>      speech synthesizer driver doesn't affect other operating system
> >>>      components in any unexpected way.
> >>>
> >>>    2.5. OUTSIDE SCOPE: Higher-level communication interfaces to the
> >>>      speech synthesizer drivers; the exact form of the communication
> >>>      protocol (text protocol, IPC, etc.).
> >>>
> >>>      Note: It is expected they will be implemented by particular
> >>>      projects (Gnome Speech, KTTSD, Speech Dispatcher) as wrappers
> >>>      around the low-level communication interface defined below.
> >>>
> >>>
> >>>  3. Synthesizer Configuration Requirements
> >>>
> >>>    3.1. MUST HAVE: An application will be able to specify the default
> >>>      voice to use for a particular synthesizer, and will be able to
> >>>      change the default voice in between `speak' requests.
> >>>
> >>>    3.2. SHOULD HAVE: An application will be able to specify the default
> >>>      prosody and style elements for a voice.  These elements will match
> >>>      those defined in the SSML specification, and the synthesizer may
> >>>      choose which attributes it wishes to support.  Note that prosody,
> >>>      voice and style elements specified in SSML sent as a `speak'
> >>>      request will temporarily override the default values.
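> >>>
> >>>      Illustrative (non-normative) example: an SSML `speak' request
> >>>      that temporarily overrides the configured defaults, using the
> >>>      relative prosody values defined in part B, might look like:
> >>>
> >>>      <speak xml:lang="en">
> >>>      <prosody rate="+20%" volume="-10%">This text is spoken faster
> >>>      and more quietly than the application defaults.</prosody>
> >>>      </speak>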
> >>>
> >>>    3.3. SHOULD HAVE: An application should be able to provide the
> >>>      synthesizer with application-specific pronunciation lexicon
> >>>      addenda.  Note that using the `phoneme' element in SSML is another
> >>>      way to accomplish this on a very localized basis, and it will
> >>>      override any pronunciation lexicon data for the synthesizer.
> >>>
> >>>      Rationale: This feature is necessary so that the application is
> >>>      able to speak artificial words or words with explicitly modified
> >>>      pronunciation (e.g. "the word ... is often mispronounced as ...
> >>>      by foreign speakers").
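> >>>
> >>>      Illustrative (non-normative) example using the SSML `phoneme'
> >>>      element mentioned above; the IPA transcription shown is only an
> >>>      illustration:
> >>>
> >>>      <speak xml:lang="en">
> >>>      You say <phoneme alphabet="ipa" ph="təˈmɑːtəʊ">tomato</phoneme>.
> >>>      </speak>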
> >>>
> >>>    3.4. MUST HAVE: Applications may assume they have their own local
> >>>      copy of a synthesizer and voice.  That is, one application's
> >>>      configuration of a synthesizer or voice should not conflict with
> >>>      another application's configuration settings.
> >>>
> >>>    3.5. MUST HAVE: Changing the default voice or voice/prosody element
> >>>      attributes does not affect a `speak' in progress.
> >>>
> >>>
> >>>  4. Synthesis Process Requirements
> >>>
> >>>    4.1. MUST HAVE: The speech synthesizer driver is able to process
> >>>      plain text (i.e. text that is not marked up via SSML) encoded in
> >>>      the UTF-8 character encoding.
> >>>
> >>>    4.2. MUST HAVE: The speech synthesizer driver is able to process
> >>>      text formatted using extended SSML markup defined in part B of
> >>>      this document and encoded in UTF-8.  The synthesizer may choose
> >>>      to ignore markup it cannot handle or even to ignore all markup
> >>>      as long as it is able to process the text inside the markup.
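> >>>
> >>>      Illustrative (non-normative) example: a driver that does not
> >>>      implement `say-as' may drop the markup and still speak the
> >>>      contained text:
> >>>
> >>>      <speak>
> >>>      Press <say-as interpret-as="tts:key">control_s</say-as> to save.
> >>>      </speak>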
> >>>
> >>>    4.3. SHOULD HAVE: The speech synthesizer driver is able to properly
> >>>      process the extended SSML markup defined in part B of this
> >>>      document as SHOULD HAVE; the same applies analogously to the
> >>>      NICE TO HAVE markup.
> >>>
> >>>    4.4. MUST HAVE: An application must be able to cancel a synthesis
> >>>      operation in progress.  In case of hardware synthesizers, or
> >>>      synthesizers that produce their own audio, this means cancelling
> >>>      the audio output as well.
> >>>
> >>>    4.5. MUST HAVE: The speech synthesizer driver must be able to
> >>>      process long input texts in such a way that the audio output
> >>>      starts to be available for playing as soon as possible.  An
> >>>      application is not required to split long texts into smaller
> >>>      pieces.
> >>>
> >>>    4.6. SHOULD HAVE: The speech synthesizer driver should honor the
> >>>      Performance Guidelines described below.
> >>>
> >>>    4.7. NICE TO HAVE: It would be nice if a synthesizer were able to
> >>>      support "rewind" and "repeat" functionality for an utterance (see
> >>>      related descriptions in the MRCP specification).
> >>>
> >>>      Rationale: This allows moving over long texts without the need to
> >>>      synthesize the whole text and without losing context.
> >>>
> >>>    4.8. NICE TO HAVE: It would be nice if a synthesizer were able to
> >>>      support multilingual utterances.
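> >>>
> >>>      Illustrative (non-normative) example of a multilingual utterance
> >>>      using the `voice' element with `xml:lang':
> >>>
> >>>      <speak xml:lang="en">
> >>>      The German word <voice xml:lang="de">Rechner</voice> means computer.
> >>>      </speak>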
> >>>
> >>>    4.9. SHOULD HAVE: A synthesizer should support notification of
> >>>      `mark' elements, and the application should be able to align
> >>>      these events with the synthesized audio.
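> >>>
> >>>      Illustrative (non-normative) example: a driver supporting this
> >>>      requirement would emit a notification for the mark name when the
> >>>      audio output reaches that point:
> >>>
> >>>      <speak>
> >>>      First paragraph. <mark name="para2"/> Second paragraph.
> >>>      </speak>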
> >>>
> >>>    4.10. NICE TO HAVE: It would be nice if a synthesizer supported
> >>>      "word started" and "word ended" events and allowed alignment of
> >>>      the events similar to that in 4.9.
> >>>
> >>>      Rationale: This is useful to update cursor position as a displayed
> >>>      text is spoken.
> >>>
> >>>    4.11. REMOVED (not directly important for accessibility)
> >>>
> >>>      The former version: It would be nice if a synthesizer supported
> >>>      timing information at the phoneme level and allowed alignment of
> >>>      the events similar to that in 4.9.  Rationale: This is useful
> >>>      for talking heads.
> >>>
> >>>
> >>>    4.12. SHOULD HAVE: The application must be able to pause and resume
> >>>      a synthesis operation in progress while still being able to handle
> >>>      other synthesis requests in the meantime.  In case of hardware
> >>>      synthesizers, this means pausing and if possible resuming the
> >>>      audio output as well.
> >>>
> >>>    4.13. REMOVED (not clear purpose, the SSML specs do not require
> >>>      the 's' element to work this way)
> >>>
> >>>      The synthesizer should not try to split the
> >>>      contents of the `s' SSML element into several independent pieces,
> >>>      unless required by a markup inside.
> >>>
> >>>      Rationale: An application may have better information about the
> >>>      synthesized text and perform its own splitting of sentences.
> >>>
> >>>    4.14. OUTSIDE SCOPE: Message management (queueing, ordering,
> >>>      interleaving, etc.).
> >>>
> >>>    4.15. OUTSIDE SCOPE: Interfacing software synthesis with audio
> >>>      output.
> >>>
> >>>    4.16. OUTSIDE SCOPE: Specifying the audio format to be used by a
> >>>      synthesizer.
> >>>
> >>>   5. Performance Guidelines
> >>>
> >>>     In order to make the speech synthesizer driver actually usable with
> >>>     assistive technologies, it must satisfy certain performance
> >>>     expectations.  The following text gives driver implementors a
> >>>     rough idea of what is needed in practice.
> >>>
> >>>     Typical scenarios when working with a speech enabled text editor:
> >>>
> >>>     5.1. Typed characters are spoken (echoed).
> >>>
> >>>       Reading the characters and cancelling the synthesis must be
> >>>       very fast, to keep up with a fast typist or even with
> >>>       autorepeat.  Consider a typical autorepeat rate of 25 characters
> >>>       per second.  Ideally, within each of these 40 ms intervals,
> >>>       synthesis should begin, produce some audio output and stop.
> >>>       Performing all these actions within 100 ms (considering a fast
> >>>       typist and some overhead of the application and the audio
> >>>       output) on common hardware is very desirable.
> >>>
> >>>       Appropriate character reading performance may be difficult to
> >>>       achieve with contemporary software speech synthesizers, so it may
> >>>       be necessary to use techniques like caching of the synthesized
> >>>       characters.  Also, it is necessary to ensure there is no initial
> >>>       pause ("breathing in") within the synthesized character.
> >>>
> >>>    5.2. Moving over words or lines, each of them is spoken.
> >>>
> >>>      The sound sample needn't be available as quickly as in the case of
> >>>      typed characters, but it still should be available without clearly
> >>>      noticeable delay.  As the user moves over the words or lines, he
> >>>      must hear the text immediately.  Cancelling the synthesis of the
> >>>      previous word or line must be instant.
> >>>
> >>>    5.3. Reading a large text file.
> >>>
> >>>      In such a case, it is not necessary to start speaking instantly,
> >>>      because reading a large text is not a very frequent operation.
> >>>      A one-second delay at the start is acceptable, although not
> >>>      comfortable.  Cancelling the speech must still be instant.
> >>>
> >>>
> >>> B. XML (extended SSML) Markup in Use
> >>>
> >>>  This section defines the set of XML markup and special
> >>>  attribute values for use in input texts for the drivers.
> >>>  The markup consists of two namespaces: 'SSML' (default)
> >>>  and 'tts', where 'tts' introduces several new attributes
> >>>  to be used with the 'say-as' element and a new element
> >>>  'style'.
> >>>
> >>>  If an SSML element is supported, all of its attributes that are
> >>>  mandatory according to the SSML 1.0 definition must be supported,
> >>>  even if they are not explicitly mentioned in this document.
> >>>
> >>>  This section also defines which functions the API
> >>>  needs to provide for default prosody, voice and style settings,
> >>>  according to (3.2).
> >>>
> >>>  Note: According to available information, SSML is not known
> >>>  to suffer from any IP issues.
> >>>
> >>>
> >>>  B.1. SHOULD HAVE: The following elements are supported
> >>>     speak
> >>>     voice
> >>>     prosody
> >>>     say-as
> >>>
> >>>  B.1.1. These SPEAK attributes are supported
> >>>     1 (SHOULD HAVE): xml:lang
> >>>
> >>>  B.1.2. These VOICE attributes are supported
> >>>     1 (SHOULD HAVE):  xml:lang
> >>>     2 (SHOULD HAVE):  name
> >>>     3 (NICE TO HAVE): gender
> >>>     4 (NICE TO HAVE): age
> >>>     5 (NICE TO HAVE): variant
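> >>>
> >>>     Illustrative (non-normative) example; the voice name "anna" is a
> >>>     made-up placeholder, since voice names are synthesizer-specific:
> >>>
> >>>     <speak xml:lang="en">
> >>>     <!-- "anna" is a placeholder voice name -->
> >>>     <voice name="anna" gender="female">Text spoken by a specific
> >>>     voice.</voice>
> >>>     </speak>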
> >>>
> >>>  B.1.3. These PROSODY attributes are supported
> >>>     1 (SHOULD HAVE): pitch  (with +/- %, "default")
> >>>     2 (SHOULD HAVE): rate   (with +/- %, "default")
> >>>     3 (SHOULD HAVE): volume (with +/- %, "default")
> >>>     4 (NICE TO HAVE): range  (with +/- %, "default")
> >>>     5 (NICE TO HAVE): 'pitch', 'rate', 'range'
> >>>              with absolute value parameters
> >>>        
> >>>   Note: The corresponding global relative prosody settings
> >>>   commands (not markup) in TTS API represent the percentage
> >>>   value as a percentage change with respect to the default
> >>>   value for the given voice and parameter, not with respect
> >>>   to previous settings.
> >>>
> >>>
> >>>  B.1.4. The SAY-AS attribute 'interpret-as'
> >>>     is supported with the following values
> >>>
> >>>     1 (SHOULD HAVE) characters
> >>>         The format 'glyphs' is supported.
> >>>
> >>>     Rationale: This provides capability for spelling.
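> >>>
> >>>     Illustrative (non-normative) example of spelling with the
> >>>     'glyphs' format:
> >>>
> >>>     <speak>
> >>>     <say-as interpret-as="characters" format="glyphs">KDE</say-as>
> >>>     </speak>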
> >>>
> >>>     2 (SHOULD HAVE) tts:char
> >>>         Indicates the content of the element is a single
> >>>     character and it should be pronounced as a character.
> >>>     The element's contents (CDATA) should only contain
> >>>     a single character.
> >>>
> >>>     This is different from the interpret-as value "characters"
> >>>     described in B.1.4.1. While "characters" is intended
> >>>     for spelling words and sentences, "tts:char" means
> >>>     pronouncing the given character (which might be subject
> >>>     to different settings, as for example using sound icons to
> >>>     represent symbols).   
> >>>
> >>>     If more than one character is present as the contents
> >>>     of the element, this is considered an error.
> >>>
> >>>     Example:
> >>>     <speak>
> >>>     <say-as interpret-as="tts:char">@</say-as>
> >>>     </speak>       
> >>>
> >>>     Rationale: It is useful to have a separate attribute
> >>>     for "single characters", as this can be used in the TTS
> >>>     configuration to distinguish the situation where
> >>>     the user is moving the cursor over characters
> >>>     from the situation of spelling, as well as other
> >>>     situations where the concept of a "single character"
> >>>     has some logical meaning.
> >>>        
> >>>     3 (SHOULD HAVE) tts:key
> >>>         The content of the element should be interpreted
> >>>     as the name of a keyboard key or combination of keys. See
> >>>     section (C) for possible string values of content of this
> >>>     element. If a string is given which is not defined in section
> >>>     (C), the behavior of the synthesizer is undefined.
> >>>
> >>>     Example:
> >>>     <speak>
> >>>     <say-as interpret-as="tts:key">shift_a</say-as>
> >>>     </speak>
> >>>
> >>>     4 (NICE TO HAVE) tts:digits
> >>>         Indicates the content of the element is a number.
> >>>     The attribute "detail" is supported and can take a numerical
> >>>     value indicating how many digits the synthesizer should group
> >>>     together for reading. The value 0 means the number should be
> >>>     pronounced as a whole, as appropriate for the language, while any
> >>>     non-zero value means that groups of that many digits should be
> >>>     formed for reading, starting from the left.
> >>>
> >>>     Example: The string "5431721838" would normally be read
> >>>     as "five billion four hundred thirty-one million ..." but
> >>>     when enclosed in the above say-as with detail set to 3, it
> >>>     would be read as "five hundred forty-three, one hundred
> >>>     seventy-two, etc." or as "five, four, three, one, seven, etc."
> >>>     with detail 1.
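> >>>
> >>>     The corresponding markup (illustrative) could be written as:
> >>>
> >>>     <speak>
> >>>     <say-as interpret-as="tts:digits" detail="3">5431721838</say-as>
> >>>     </speak>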
> >>>
> >>>     Note: This is an extension to SSML not defined in the
> >>>     format itself, introduced under the namespace 'tts' (as
> >>>     allowed in SSML 'say-as' specifications).
> >>>
> >>>
> >>>  B.2. NICE TO HAVE: The following elements are supported
> >>>     mark
> >>>     s
> >>>     p
> >>>     phoneme
> >>>     sub
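> >>>
> >>>  Illustrative (non-normative) example using the SSML `sub' element
> >>>  to substitute spoken text for an abbreviation:
> >>>
> >>>     <speak>
> >>>     <sub alias="Speech Synthesis Markup Language">SSML</sub>
> >>>     </speak>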
> >>>
> >>>  B.2.1. NICE TO HAVE: These P attributes are supported:
> >>>     1 xml:lang
> >>>
> >>>  B.2.2. NICE TO HAVE: These S attributes are supported:
> >>>     1 xml:lang
> >>>
> >>>  B.3. SHOULD HAVE: An element `tts:style' (not defined in SSML 1.0)
> >>>     is supported.
> >>>
> >>>     This element can occur anywhere inside the SSML document.
> >>>     It may contain all SSML elements except the element 'speak'
> >>>     and it may also contain the element 'tts:style'.
> >>>
> >>>     It has two mandatory attributes 'field'
> >>>     and 'mode' and an optional string attribute 'detail'. The
> >>>     attribute 'field' can take the following values
> >>>         1) punctuation
> >>>         2) capital_letters
> >>>     defined below.
> >>>
> >>>     If the attribute 'field' is set to 'punctuation',
> >>>     the 'mode' attribute can take the following values
> >>>         1) none
> >>>         2) all
> >>>         3) (NICE TO HAVE) some
> >>>     When set to 'none', no punctuation characters are explicitly
> >>>     indicated. When set to 'all', all punctuation characters
> >>>     in the text should be indicated by the synthesizer.  When
> >>>     set to 'some', the synthesizer will pronounce those
> >>>     punctuation characters enumerated in the additional attribute
> >>>     'detail', or will only speak those characters according to its
> >>>     own settings if no 'detail' attribute is specified.
> >>>
> >>>     The attribute detail takes the form of a string containing
> >>>     the punctuation characters to read.
> >>>
> >>>     Example:
> >>>     <tts:style field="punctuation" mode="some" detail=".?!">
> >>>
> >>>     If the attribute 'field' is set to 'capital_letters',
> >>>     the 'mode' attribute can take the following values
> >>>         1) no
> >>>         2) spelling
> >>>         3) (NICE TO HAVE) icon
> >>>         4) (NICE TO HAVE) pitch
> >>>
> >>>     When set to 'no', capital letters are not explicitly
> >>>     indicated. When set to 'spelling', capital letters are
> >>>     spelled out (e.g. "capital a"). When set to 'icon', a sound
> >>>     is inserted before the capital letter, possibly leaving
> >>>     the letter/word/sentence intact. When set to 'pitch',
> >>>     the capital letter is pronounced with a higher pitch,
> >>>     possibly leaving the letter/word/sentence intact.
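> >>>
> >>>     Analogous to the punctuation example above, an illustrative
> >>>     (non-normative) example:
> >>>
> >>>     <tts:style field="capital_letters" mode="spelling">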
> >>>
> >>>
> >>>     Rationale: These are basic capabilities well established
> >>>     in accessibility, but SSML does not support them.
> >>>     Introducing this additional element does not prevent
> >>>     outside applications from sending valid SSML
> >>>     to the TTS API.
> >>>
> >>>  B.4. NICE TO HAVE: Support for the rest of the elements and attributes
> >>>     defined in SSML 1.0. However, this is of lower priority than
> >>>     the enumerated subset above.
> >>>
> >>>  Open Issue: In many situations, it will be desirable to
> >>>   preserve whitespace characters in the incoming document.
> >>>   Should we require the application to use the 'xml:space'
> >>>   attribute for the speak element or should we state 'preserve'
> >>>   is the default value for 'xml:space' in the root 'speak'
> >>>   element in this case?
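> >>>
> >>>   For illustration only, the first option mentioned above would have
> >>>   the application request whitespace preservation explicitly:
> >>>
> >>>   <speak xml:space="preserve">
> >>>   <!-- illustrative only; behaviour is an open issue -->
> >>>   line one
> >>>   line two
> >>>   </speak>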
> >>>
> >>> C. Key names
> >>>
> >>> A key name may contain any character excluding control characters (the
> >>> characters in the range 0 to 31 of the ASCII table and other
> >>> ``invisible'' characters), spaces, dashes and underscores.
> >>>
> >>>  C.1 The recognized key names are:
> >>>   1) Any single UTF-8 character, excluding the exceptions defined
> >>>      above.
> >>>
> >>>   2) Any of the symbolic key names defined below.
> >>>
> >>>   3) A combination of key names defined below, using the
> >>>     '_' (underscore) character for concatenation.
> >>>
> >>>   Examples of valid key names:
> >>>     A
> >>>     shift_a
> >>>     shift_A
> >>>     $
> >>>     enter
> >>>     shift_kp-enter
> >>>     control
> >>>     control_alt_delete
> >>>  
> >>>  C.2 List of symbolic key names
> >>>
> >>>  C.2.1 Escaped keys
> >>>     space
> >>>     underscore
> >>>     dash
> >>>
> >>>  C.2.2 Auxiliary Keys
> >>>     alt
> >>>     control
> >>>     hyper
> >>>     meta
> >>>     shift
> >>>     super
> >>>
> >>>  C.2.3 Control Character Keys
> >>>     backspace
> >>>     break
> >>>     delete
> >>>     down
> >>>     end
> >>>     enter
> >>>     escape
> >>>     f1
> >>>     f2 ... f24
> >>>     home
> >>>     insert
> >>>     kp-*
> >>>     kp-+
> >>>     kp--
> >>>     kp-.
> >>>     kp-/
> >>>     kp-0
> >>>     kp-1 ... kp-9
> >>>     kp-enter
> >>>     left
> >>>     menu
> >>>     next
> >>>     num-lock
> >>>     pause
> >>>     print
> >>>     prior
> >>>     return
> >>>     right
> >>>     scroll-lock
> >>>     space
> >>>     tab
> >>>     up
> >>>     window
> >>>
> >>> D. Interface Description
> >>>
> >>>  This section defines the low-level TTS driver interface for use by
> >>>  all assistive technologies on free software platforms.
> >>>
> >>>  1. Speech Synthesis Driver Discovery
> >>>    ...
> >>>
> >>>  2. Speech Synthesis Driver Interface
> >>>
> >>>  ...
> >>>
> >>>  Open Issue: Still not clear consensus on how to return the
> >>>     synthesized audio data (if at all).  The main issue here is
> >>>     mostly with how to align marker and other time-related events
> >>>     with the audio being played on the audio output device.
> >>>
> >>>  Proposal: There will be two possible ways to do it. The synthesized
> >>>     data can be returned to the application (case A), or the
> >>>     application can ask for it to be played on the audio output
> >>>     (which will not be the task of the TTS API, but will be handled
> >>>     by another API) (case B).
> >>>
> >>>     In (case A), each time the application gets a piece of audio
> >>>     data, it also gets a time-table of index marks and events
> >>>     in that piece of data. This will be done on a separate socket
> >>>     in asynchronous mode. (This is possible for software
> >>>     synthesizers only, however.)
> >>>
> >>>     In (case B), the application will get asynchronous callbacks
> >>>     (they might be realized by sending a defined string over
> >>>     a socket, by calling a callback function or in some other
> >>>     way -- the particular way of doing it is considered an
> >>>     implementation detail).
> >>>
> >>>     Rationale: Both approaches are useful in different situations
> >>>     and each of them provides some capability that the other one
> >>>     doesn't.
> >>>
> >>>  Open Issue: Will the interaction with the driver be synchronous
> >>>     or asynchronous?  For example, will a call to `speak'
> >>>     wait to return until all the audio has been processed?  If
> >>>     not, what happens when a call to "speak" is made while the
> >>>     synthesizer is still processing a prior call to "speak?"
> >>>
> >>>  Proposal: With the exception of event and index mark signalling,
> >>>     the communication will be synchronous. When a speak request
> >>>     is issued while the driver is still processing a prior call to
> >>>     speak and the application didn't call pause before, this is
> >>>     considered an error.
> >>>
> >>> E. Related Specifications
> >>>
> >>>    SSML: http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/
> >>>          (see requirements at the following URL:
> >>>
> >>> http://www.w3.org/TR/2004/REC-speech-synthesis-20040907/#ref-reqs)
> >>>     
> >>>    SSML 'say-as' element attribute values:
> >>>       http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/
> >>>
> >>>    MRCP: http://www.ietf.org/html.charters/speechsc-charter.html
> >>>
> >>> F. Copying This Document
> >>>
> >>>  Copyright (C) 2006 ...
> >>>  This specification is made available under a BSD-style license ...
> >>>
> >>> _______________________________________________
> >>> accessibility mailing list
> >>> accessibility at lists.freedesktop.org
> >>> http://lists.freedesktop.org/mailman/listinfo/accessibility
> >>>   
> 
> _______________________________________________
> Accessibility mailing list
> Accessibility at lists.freestandards.org
> http://lists.freestandards.org/cgi-bin/mailman/listinfo/accessibility



