_________________________________________________________________
Unified Use Cases for Expert Handlers, version 1.0 (UUC1)
_________________________________________________________________
revision date: 2008-02-04
document status: public working draft
current version:
http://accessibility.linux-foundation.org/a11yspecs/handlers/uuc1.html
previous versions: Internal UUC Editor's Drafts
Authors: Pete Brunet, Vladimir Bulatov, Gregory J. Rosmaita, Janina
Sajka and Neil Soiffer (chair, Expert Handlers SIG)
Edited and annotated by: Gregory J. Rosmaita
_________________________________________________________________
Table of Contents for UUC1
1. Introduction: What Are Expert Handlers?
2. Speech Output Use Cases
3. Alternative Input Use Cases
4. Navigability Use Cases
5. Magnification Use Cases
6. Braille Display, Embossing and Tactile Conversion Use Cases
7. Universal Use Cases
7.1 Universal Use Case 1: Where Am I?
7.2 Universal Use Case 2: Document Summary
8. Putting It All Together: Expert Handlers & the Flow of Control
9. Footnotes
_________________________________________________________________
Please provide feedback on this draft to the publicly archived Open
Accessibility Request for Comments mailing list,
accessibility-rfc at a11y.org. Posted comments will be appended by the
editor to the Scratch Pad for the Unified Use Cases, which serves as
a collection point for issues and ideas related to expert handlers
and their possible implementations.
_________________________________________________________________
Introduction: What Are Expert Handlers?
The purpose and responsibility of accessibility interfaces, such as
Microsoft Active Accessibility (MSAA) and IAccessible2 (IA2), is to
provide assistive technology (AT) with the ability to access and
interact with the information contained in an application. This allows
an AT to access the information in the application's DOM.
Interpreting, displaying, and navigating the information is the
responsibility of the AT.
The success of the web and the increasing use of XML in documents have
led AT to develop support for these markup-based applications.
Therefore, we must distinguish between two classes of markup in order
to explain the need for expert handler technology. The type of markup
most often used on the web is HTML 4.01/XHTML 1.0. HTML is an example
of generalized markup that is well handled by AT and is not addressed
further in this document. However, it is significant to note that even
generalized markup must sometimes be complemented by markup
specifications, such as ARIA, that facilitate more semantically
precise content handling where none is present.
Generalized content markup is complemented by markup specifications
that facilitate more semantically precise content markup. Examples of
specialized, semantically precise markup include MathML and MusicXML.
In order for users of AT to access specialized markup effectively, AT
needs guidance to communicate the content of the specialized markup
language to the user.
The Expert Handlers SIG of the Open Accessibility (A11y) Workgroup at
The Linux Foundation is exploring a standardized plug-in mechanism for
AT software. The goal of this plug-in standard is to allow AT
software to take advantage of expert software that understands
specialized markup. This plug-in standard will allow the expert
software to provide enhanced, semantically rich access to specialized
markup, so that the AT can properly render the markup visually,
aurally, and/or tactilely. The plug-in would also help users navigate
the semantic meaning encoded in the specialized markup.
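To make the shape of such a plug-in concrete, the following is a
minimal sketch (in Python, purely for illustration) of what a handler
interface might look like. Every name in it is hypothetical; the SIG
has not defined any such API.

    from abc import ABC, abstractmethod

    class ExpertHandler(ABC):
        """Hypothetical contract between an AT and expert software."""

        @abstractmethod
        def can_handle(self, namespace_uri: str) -> bool:
            """True if this handler understands the markup vocabulary,
            e.g. the MathML or MusicXML namespace."""

        @abstractmethod
        def get_spoken_text(self, markup: str) -> str:
            """Return a speakable rendering of the specialized markup."""

        @abstractmethod
        def get_braille(self, markup: str, table: str) -> str:
            """Return a braille rendering using the requested braille
            code (braille codes vary by country and agency)."""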
To provide some background as to what needs to be supported by an
expert handler interface standard, the following sections discuss a
number of use cases for an expert handler. The use cases are divided
into various functionalities such as speech output, alternative input,
navigation, and braille generation. The last section discusses the
options for how an expert handler might fit into the sequence of
events that eventually results in a response to a user action.
_________________________________________________________________
Speech Output Use Cases for Expert Handlers
Computer users who are blind or severely visually impaired often use
assistive technology (AT) built around synthetic text to speech (TTS).
These AT applications are commonly called screen readers. Screen
reader users listen to a synthetic voice rendering of on screen
content because they are physically unable to see this content on a
computer display monitor.
Because synthetic voice rendering is intrinsically temporal, whereas
screen displays are -- or can easily be made -- static, various
strategies are provided by screen readers to allow users to tightly
control the alternative TTS rendering. Screen reader users often find
it useful, for instance, to skim through content until a particular
portion is located and then examine that portion in a more controlled
manner, perhaps word by word or even character by rendered character.
It is almost never useful to wait for a synthetic voice rendering that
begins at the upper left of the screen and proceeds left to right, row
by row, until it reaches the bottom because such a procedure is
temporally inefficient, requiring the user to strain to hear just the
portion desired in the midst of unsought content. Thus, screen readers
provide mechanisms that allow the user to focus anywhere in the
content and examine only that content which is of interest.
Screen readers have proven highly effective at providing their users
access to content which is intrinsically textual and linear in nature.
It is not hard to provide mechanisms to focus synthetic voice
rendering paragraph by paragraph, sentence by sentence, word by word,
or character by character.
Access to on-screen widgets has also proven effective by rendering
that static content in list form, where the user can pick from a menu
of options using the up and down arrows plus the Enter key to indicate
a selection, in lieu of picking an icon on screen using a mouse.
Access to content arrayed in a table can also succeed by allowing the
AT to simulate the process a sighted user employs to consider tables.
In other words, mechanisms are provided to hear the contents of a cell
and also the row and column labels for that cell (which define the
cell's meaning).
Similar smart content rendering and navigation strategies are required
by screen reader users in more complex, nonlinear content such as
mathematical (chemical, biological, etc.) expressions, music, and
graphical renderings. Because such content is generally the province
of knowledge domain experts and students, and not the domain of most
computer users, screen readers do not invest the significant resources
necessary to serve only a small portion of their customer base with
specialized routines for such content. Furthermore, the general
rendering and navigation strategies provided for linear (textual),
menu, and tabular content are woefully insufficient to allow users to
examine specific portions of such domain specific expressions
effectively. On the other hand, domain specific markup often does
provide sufficient specificity so that the focus and rendering needs
of the screen reader can be well supported.
In order to gain effective access to such domain specific content,
screen reader users require technology that can:
* Synthetically voice the expression in a logical order
* Allow the user to focus on particular, logical portions of
expressions, possibly at several layers of granularity
* Appropriately voice specialized symbols and symbolic expressions
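As a concrete illustration of the first two requirements, a handler
that understands MathML could walk the markup's structure and emit
text in a logical speaking order. The sketch below is deliberately
naive; real handlers apply much richer rules, and the speaking
strings are invented.

    import xml.etree.ElementTree as ET

    def speak(node):
        """Very naive spoken rendering of a MathML fragment."""
        tag = node.tag.split("}")[-1]            # drop any XML namespace
        if tag in ("mi", "mn", "mo"):            # identifier, number, operator
            return node.text or ""
        if tag == "mfrac":                       # fraction: numerator, denominator
            num, den = list(node)
            return f"the fraction {speak(num)} over {speak(den)}, end fraction"
        return " ".join(speak(child) for child in node)

    mathml = ("<math><mfrac><mi>x</mi>"
              "<mrow><mi>y</mi><mo>+</mo><mn>1</mn></mrow>"
              "</mfrac></math>")
    print(speak(ET.fromstring(mathml)))
    # prints: the fraction x over y + 1, end fraction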
_________________________________________________________________
Alternative Input Use Cases
There are users with disabilities who do not require accommodation in
order to read domain specific markup. Rather, these users require
assistive technologies to facilitate their scrolling and/or editing of
content. Highly effective assistive technologies exist to accommodate
alternative input strategies, including:
* speech input technology, such as DragonDictate or
NaturallySpeaking;
* mouse and keyboard alternative systems, such as the GNOME
On-Screen Keyboard (GOK), Jambu, which provides improved web
accessibility for switch and alternative pointer users, and
OpenEyes, an open-source open-hardware toolkit for real-time
eye-tracking;
* context-aware word-prediction technologies, such as Dasher.
Users of alternative input assistive technologies require two specific
accommodations for scrolling and editing domain specific content:
1. Context-aware expedited scrolling and navigation. The Navigability
Use Cases outlined in this document will serve this requirement.
2. Knowledge-domain-aware command and content vocabulary for
speech-based navigation systems and for word-prediction systems (a
sketch of such a vocabulary follows below).
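To illustrate the second accommodation, an expert handler could
expose a domain vocabulary for the input AT to add to its command
grammar or prediction dictionary. The names in this sketch are
invented throughout.

    # Hypothetical: vocabulary an expert handler might hand to a
    # speech-input or word-prediction AT for its content type.
    DOMAIN_VOCABULARY = {
        "math":  ["numerator", "denominator", "superscript", "subscript"],
        "music": ["movement", "measure", "bar", "stanza", "note"],
    }

    def commands_for(content_type: str) -> list:
        """Navigation terms the input AT should recognize."""
        return DOMAIN_VOCABULARY.get(content_type, [])

    print(commands_for("math"))
    # ['numerator', 'denominator', 'superscript', 'subscript']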
_________________________________________________________________
Navigability Use Cases
AT users need to be able to navigate within sub-components of
documents containing specialized content, such as math, music or
chemical markup. Typically these specialized components have content
which needs to receive focus at different levels of granularity, e.g.
a numerator within a numerator, an expression, a term, a bar of music,
etc.
Within each level, functions are needed in response to AT commands to
inspect and navigate to and from "items" (e.g., by word, bar,
expression, clause, term, etc., depending upon the type of content
being expressed) for a particular level of granularity:
1. contextual query/inspection of object/glyph with current focus
("Where Am I?")
2. character-by-character
3. previous/current/next item
4. all items with user-defined characteristics
5. all items in an author-defined category ^[footnote 1]
6. first/last item on a line
7. first/last item within next higher or lower level of granularity
8. first/last item in the document
There are two scenarios to consider: a read-only scenario and a
scenario where the user is editing the document.
There are three system components that need to interact: the user
agent (e.g., a browser), the AT, and the expert handler.
In the read-only case, the AT responds to some sort of Point of Regard
change event and, depending on the role of the object which received
focus, fetches accessibility information pertinent to that role
and then formats/outputs a response tailored to an AT user, e.g.
TTS/braille. In the case of specialized content, an expert handler
needs to be used by the AT because the AT doesn't know how to deal
with such specialized content directly.
In order to meaningfully interact with the specialized content, the
user needs to be able to execute the following actions:
* change level of granularity up/down
* read all from top
* read all from Point of Regard (POR)
* goto and read first/last item on the current line
* goto and read first/last item within the next less/more granular
item
* goto and read first/last item in the document
* goto and read previous/current/next item
In the case of editable content there may also be a desire to have
separate cursors, e.g. one to remain at the POR (the caret, if
editing), and one to move around for review purposes.
The AT will already have UI input commands for most of the above
functions, but probably not for changing to higher/lower levels of
granularity. If the user requests a different level of granularity,
the AT would call the handler to change the granularity mode. The AT
will handle the UI commands and in turn call the handler to return an
item at the current level of granularity. The AT would also have told
the handler about the output mode, e.g. braille or TTS. Armed with
those three things (level of granularity, mode of output, and which
item: first, last, previous, current, or next), the handler knows
what to do.
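Expressed as a sketch, that three-part request might look like the
following; the enum values and the get_item method are invented to
mirror the prose, not drawn from any existing AT API.

    from enum import Enum, auto

    class Granularity(Enum):
        CHARACTER = auto(); TERM = auto(); EXPRESSION = auto()

    class Item(Enum):
        FIRST = auto(); LAST = auto(); PREVIOUS = auto()
        CURRENT = auto(); NEXT = auto()

    class Output(Enum):
        TTS = auto(); BRAILLE = auto()

    class StubMathHandler:
        """Stands in for a real expert handler."""
        def get_item(self, granularity, item, output):
            return f"({item.name} {granularity.name} rendered for {output.name})"

    def on_user_command(handler, granularity, item, output):
        """The AT translates one user command into one handler call."""
        return handler.get_item(granularity, item, output)

    print(on_user_command(StubMathHandler(),
                          Granularity.TERM, Item.NEXT, Output.TTS))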
In the case of editable content, the UA provides the input UI for the
user. This editing capability would most likely be provided via a
plugin. Specific accessibility features needed for editing
specialized markup have yet to be explored.
_________________________________________________________________
Magnification Use Cases
A common use of magnification is to proportionately enlarge content.
For text-based (or more generally, font-based) applications, this
means that AT software should be able to request rendering with larger
sized fonts or a certain amount of magnification relative to some
baseline magnification. Applications beyond standard text-based ones
include math, music, and labeled plots/graphics. For non text-based
applications such as graphics and chemical structures, magnification
could be based on a certain percentage of the normal size or given by
"fill this area". These two ideas can always be mapped onto each
other. In all of these cases, the magnification may be due to having
the entire document magnified or it may be due to a request to
magnify an individual instance (such as an equation). There are two
other uses for magnification:
1. While navigating or speaking, it might be desirable to magnify the
part being navigated/spoken to make it easier to see. For example,
while playing some music, the current measure and next measure
might be magnified to ease reading while leaving the rest
unmagnified so that the amount of screen space used is minimized.
There also needs to be a method to reset the magnification.
2. Math and chemical notations shrink fonts for superscripts and
subscripts. In math, these are further reduced for nested scripts.
One common feature for math renderers is to set a minimum font
size. Typically, this is 50% of the base font size and corresponds
to the size used for doubly nested scripts. It is potentially
useful to allow the AT to control the maximum percent shrinkage
used by renderers. Another possibility is to have a feature that
says "don't shrink at all". Although the rendering would not be
consider high quality typesetting, it does make scripts more
readable to those with some vision impairment.
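A worked example of the shrinking rule just described: if each
nesting level multiplies the font size by roughly 0.71, a doubly
nested script lands near the typical 50% floor. The function below is
only a sketch; the parameter names and defaults are ours.

    def script_font_size(base, depth, shrink=0.71, minimum=0.5):
        """Font size for a script nested `depth` levels deep, never
        smaller than `minimum` times the base size."""
        return max(base * shrink ** depth, base * minimum)

    for depth in range(4):
        print(depth, round(script_font_size(12.0, depth), 2))
    # 0 12.0   1 8.52   2 6.05   3 6.0  (depth 3 hits the 50% floor)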
_________________________________________________________________
Braille Display, Embossing and Tactile Conversion Use Cases
An expert handler should be able to provide braille data for braille
display output by generic AT. Custom braille output is needed, because
generic AT has no knowledge about how specific specialized data can
and should be represented via braille. An example is mathematics:
there are many different braille codes used to represent mathematics
that vary from country to country and agency to agency.
Simple ASCII strings are normally used to communicate braille to
braille devices. However, there are many different ASCII-to-dots
pattern-encoding tables used to generate braille that conforms to a
natural language's braille conventions. Therefore, the AT and the
expert handler have to negotiate the most appropriate braille table to
be used. A more universal approach would be to use the dedicated
braille Unicode symbols, which range from U+2800 to U+28FF.
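The Unicode approach is easy to demonstrate: each braille cell lives
at U+2800 plus an offset whose bits 0-7 correspond to dots 1-8. The
helper name below is invented; the encoding itself is part of
Unicode.

    def dots_to_unicode(*dots):
        """Map raised dot numbers (1-8) to one Unicode braille cell."""
        offset = 0
        for d in dots:
            offset |= 1 << (d - 1)     # dot n sets bit n-1
        return chr(0x2800 + offset)

    # Dots 1, 3, 4 and 5 form the letter "n" in English literary braille.
    print(dots_to_unicode(1, 3, 4, 5))    # U+281D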
There is also a need to have braille output tailored to various levels
of granularity. For example, at a low level of granularity, the user
would receive an overall description of the mathematical expression or
image, while at the highest level of granularity, the user would
receive a complete braille translation of the whole math expression or
a list of all labeled components of the image.
Some data may need to be expressed in a more advanced tactile output
format than refreshable braille. For example, graphical data would
greatly benefit from being embossed on paper or a 2D braille display.
Input devices, such as a touchpad or camera, allow a user to
communicate to the computer which parts of the graphic are of
interest and need to be tactilely displayed. Such interactive
functionality should be left exclusively to the expert handler. This
means that an expert handler must have an interactive mode and a way
for an AT to trigger/toggle this mode on. In such a mode, an AT should
also provide a way for the expert handler to produce more than one
output stream -- such as simultaneous speech and braille output --
directly via an AT device which uses the same TTS engine and/or
braille display.
_________________________________________________________________
Universal Use Cases
Universal Use Case 1: Where Am I?
The user must have a means of obtaining all available information
about the object/character with focus, beginning with the repetition
of the character or the programmatic binding which describes the
object with focus. The ability to query the AT to determine one's
point of regard within a document and within containers in the
document is essential. The user must be able to obtain information
about the current point of regard from the most generic level -- what
percentage of the document or section has been read, how much of the
document or section remains to be read -- to the most atomic.
Therefore, an AT must create a user interface where successive "Where
Am I?" queries by the user generate more verbose or more terse
responses. ^[footnote 2]
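One way to realize such escalating queries is for the AT (or handler)
to keep a verbosity counter that resets whenever the point of regard
moves. The responses below are invented examples for a math context;
nothing here is prescribed by the draft.

    class WhereAmI:
        """Successive queries yield progressively broader context."""

        RESPONSES = [
            "x",                                      # character at the POR
            "numerator of a fraction",                # local structure
            "equation 3 of 7, section 2, 40% read",   # document context
        ]

        def __init__(self):
            self.level = 0

        def query(self):
            text = self.RESPONSES[min(self.level, len(self.RESPONSES) - 1)]
            self.level += 1                # the next query says more
            return text

        def reset(self):                   # call when the POR moves
            self.level = 0

    w = WhereAmI()
    print(w.query()); print(w.query()); print(w.query())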
Universal Use Case 2: Document Summary
A user may find it necessary to consult a Document Summary, containing
a list of the types of elements and containers in the document. The
user needs to know the document title and language as well as the
number of tables, links, headings, frames, forms, controls, items,
images, and pages. The application may implement a document summary
feature natively through its own UI instead of an accessibility API,
but in the case of specialized markup, may need the assistance of an
expert handler in order to present an appropriate document summary for
the content being summarized.
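A document summary contributed by an expert handler might be as
simple as a set of counts merged into the application's own totals.
The field names below are illustrative only.

    from dataclasses import dataclass, field

    @dataclass
    class DocumentSummary:
        title: str
        language: str
        counts: dict = field(default_factory=dict)   # element type -> count

    # The application counts generic elements; a math handler could
    # contribute counts for its own specialized containers.
    summary = DocumentSummary("Quarterly Report", "en",
                              {"table": 4, "heading": 12, "link": 31})
    summary.counts["equation"] = 9       # merged in by the expert handler
    print(summary.counts)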
_________________________________________________________________
Putting It All Together: Expert Handlers and the Flow of Control
The goal of the Expert Handlers working group is to define a standard
so that AT software can call on expert software to interpret
specialized markup. One issue that needs to be addressed is how and
where (in the flow of control of reading a page) the expert handler
should be invoked. Here are three possibilities:
1. During installation, the expert handler registers itself with the
rendering application (e.g., the web browser, PDF viewer, etc.).
When the page is loaded, the handler is invoked by the renderer to
convert the DOM or some proxy for the DOM node to make it appear
to have non-expert content. For example, it might convert the
specialized markup to text or some generalized markup that AT can
typically handle.
2. The AT traverses the DOM and when it gets to some node it doesn't
understand, it consults some resource that associates a particular
handler with the node name. It gets the node's content (which
might include other nodes) from the DOM and passes that content to
the expert handler. It then issues requests to the handler (e.g.,
"give me text to speak for the content").
3. The AT traverses the DOM and when it gets to some node it doesn't
understand, it consults some resource that associates a particular
handler with the node name. It then points the expert handler to
that node and issues requests to the handler. In this case, the
handler is directly interacting with the DOM.
Although similar, the latter two cases probably have implications for
the difficulty of implementation and the capabilities of the
interface. Some of these are:
* If the expert handler directly reads the DOM, then the expert
handler must understand MSAA, IA2, or whatever is appropriate for
the level of functionality it needs. This also implies that the
rendering application must support those standards. If not, the
expert handler would need to know how to access application
specific DOMs.
* If the expert handler is given a copy of what resides in the DOM,
then interacting with the content (e.g., filling in a text field)
would complicate any standard that is developed because support
for passing info about input would need to be part of the
standard.
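To make the second possibility (the AT hands node content to the
handler) more concrete, here is a minimal sketch; the registry and
function names are invented, and get_spoken_text refers back to the
hypothetical interface sketched in the introduction.

    # Hypothetical registry: markup namespace -> expert handler.
    HANDLERS = {}

    def register(namespace, handler):
        HANDLERS[namespace] = handler

    def speak_unknown_node(namespace, content):
        """Called when the AT's DOM walk reaches markup it cannot
        interpret; falls back to generic behaviour if nothing matches."""
        handler = HANDLERS.get(namespace)
        if handler is None:
            return ""                              # generic fallback
        return handler.get_spoken_text(content)    # expert rendering

Registration could happen when the handler is installed or when the
AT starts; the draft leaves that choice open.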
_________________________________________________________________
Footnotes
Note 1. For example, the FIELDSET, LEGEND, and LABEL grouping and
labelling mechanisms for FORM controls, the headers/id relationship
defined for TABLE in HTML 4.01/XHTML 1.0, or the ARIA attributes
aria-labelledby and aria-describedby.
Note 2. For specialized markup languages, the following list of points
of regard needs to be broadened and abstracted into a context
meaningful to the content and structure achieved through the use of a
particular specialized markup language. For example, a musical score
marked up in an XML-derived dialect would frame its points of
reference in a manner conformant with the structure of the content
being accessed: by movement, by measure, by stanza, by bar, by note,
and so on.
interaction between the user of an AT and a specific markup language
is highly dependent upon the type of specialized content being
described, as well as the parameters and structures inherent to the
specialized knowledge domain for which the specialized markup language
has been designed.
For each potential point of regard possible in a specific Generalized
Markup Language, the AT requires, and can usually obtain from the
document's structure and semantics, as reflected in the DOM, the
following element characteristics, if they exist, depending on the
type of elements in the item at the current POR:
* For all locations:
1. Number of items in the document
2. Relative item number (n of total) within the document
3. Document title
* Table information if in a table:
1. Caption and table summary
2. Content for row and column headers
3. Relative number (n of total number) for the table in the
document
4. Relative row and column number (x of total, y of total)
within parent table
5. Table type/role (data, spreadsheet, calendar)
* Section information if in a section:
1. Section type (page, frame, heading)
2. Section title
3. Relative number (n of total) for the section type in the
document
4. Level if in a section with a heading
5. Relative item number (n of total) within the section
* Form control information if on a form control:
1. Group label for a control (such as LEGEND or OPTGROUP in
HTML) if in a group
2. Label or alternative text (such as title or alt in HTML or
title and desc in SVG)
3. Type of form control (role)
4. State
5. Relative number (n of total) of the parent form in the
document
6. Relative form control number (n of total) within the parent
form
* Map information if in a map:
1. Relative area number (n of total) within the areas of a map
2. Title attribute for map
* List or menu information if within a menu or list:
1. Type (role) - menu, simple list, definition list, ordered
list, folder, navigation bar, and so on
2. Title from parent menu or list
3. Relative number (n of total) of parent list or menu in the
document
4. Relative list item number (n of total) within the list
* Link information if on a link:
1. Relative link number (n of total) within the document
2. Link state: visited, unvisited, active, focused, external or
internal
3. Extended information, such as that provided by the title
attribute
_________________________________________________________________