[Accessibility-ia2] media a11y
pete at a11ysoft.com
Tue Jun 21 19:04:40 PDT 2011
On 6/21/2011 9:47 AM, Janina Sajka wrote:
> Hi, All:
> Three brief comments ...
> 1.) I'm inclined to suggest that we should move this discussion to the main
> a11y list, as we want both our APIs to support all the new content
> coming with HTML 5, ATK/AT-SPI as well as IAccessible2.
My concern is that there are several developers, especially AT developers,
who are on this list but not on the main list.
> 2.) There's a "User Requirements" document that we created in our
> W3C work on the accessibility of HTML 5 media that people should know
> about. If I may be so bold, you may want to bookmark:
> We intend this document as an introduction to the full
> range of user requirements for people of all kinds of
> disabilities. I think we're pretty close to covering
> that landscape, and we will try to add to this document
> as remaining issues are clarified. It is intended that this
> document will become a non-normative W3C publication,
> probably as a "W3C Note" published by the Protocols and
> Formats Working Group (PF) of the W3C's Web Accessibility
> Initiative (WAI).
This is a very good document.
There is a sentence that seems at odds with something Silvia said, i.e.
"The current solution is audio descriptions, and they are much harder to
produce than text descriptions." The document says, "The technology
needed to deliver and render basic video descriptions is in fact
relatively straightforward, being an extension of common..."
But, nonetheless, I can see some advantages to using VTD (video text
descriptions):
- no need to find (and pay) a talented (pleasing to listen to) speaker
- no need to find a speaker whose voice is a good match for the audio
track (easily distinguishable from the other speakers)
- ability for the screen reader user to adjust the playback speed, pitch,
etc.
In the section on extended video it says, "Extended descriptions work by
pausing the video and program audio at key moments, playing a longer
description than would normally be permitted, and then resuming playback
when the description is finished playing." There must have been some
thought about how this would be done, i.e. what mechanisms are proposed
for this? The AT user could use a context menu via standard GUI
accessibility or, failing that, the AT could provide access via
IAccessibleAction (or ATK's equivalent) on whatever control will be
provided for this. (This same issue is covered in Enhanced
Captions/Subtitles, especially requirements ECC-3 and ECC-5.)
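For example (just a sketch of what I have in mind; the control, its action
name, and the header name below are guesses on my part, and only the
IAccessibleAction calls themselves are from the IA2 spec), the AT side
could look something like this:

#include <windows.h>
#include <wchar.h>
#include "AccessibleAction.h"  // IA2-generated header; exact name varies by SDK

// Hypothetical: trigger an "extended description" control exposed by the
// page/UA by finding and invoking a matching IAccessibleAction action.
HRESULT TriggerExtendedDescription(IUnknown *control)
{
    IAccessibleAction *action = NULL;
    HRESULT hr = control->QueryInterface(IID_IAccessibleAction,
                                         (void **)&action);
    if (FAILED(hr) || !action)
        return hr;

    long count = 0;
    action->nActions(&count);
    for (long i = 0; i < count; ++i) {
        BSTR name = NULL;
        if (SUCCEEDED(action->get_name(i, &name)) && name) {
            // "playExtendedDescription" is an invented name; the real one
            // depends on what the UA/author exposes.
            if (wcscmp(name, L"playExtendedDescription") == 0)
                hr = action->doAction(i);
            SysFreeString(name);
        }
    }
    action->Release();
    return hr;
}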
That document points to this blog entry:
where it says, "...subtitles attached to the video can be sent to an
online translation tool and converted to whatever language you want on..."
I'd like to better understand the syncing mechanism.
> 3.) Let's be sure to think in terms of rich text handling. Our media
> work at the W3C has forced us to recognize that the text we will be
> passing to a11y APIs will sometimes contain markup, and we'd like to see
> assistive technologies dealing with the markup appropriately. We're
> still working on how best to clarify this in the ARIA support
> documentation that is being produced by PF, but it's not too soon to put
> this consideration on the table here.
I think the UA rather than the AT should provide a rendering of the
marked-up text. That rendering would be a simple text string plus text
attributes. Please see the IA2 text attributes at:
This would include support for portions of text that are in different
languages.
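Something along these lines is what I'm picturing on the AT side (a sketch
only; get_attributes and the "language" text attribute are from the IA2
spec, while the header name and the surrounding plumbing are assumptions):

#include <windows.h>
#include "AccessibleText.h"  // IA2-generated header; exact name varies by SDK

// Walk the attribute runs of the rendered cue text. Each run comes back as
// a semicolon-separated list such as "language:fr;font-style:italic", which
// the AT can use to switch synthesizer language, add emphasis, etc.
void DumpAttributeRuns(IAccessibleText *text)
{
    long length = 0;
    if (FAILED(text->get_nCharacters(&length)))
        return;

    long offset = 0;
    while (offset < length) {
        long start = 0, end = 0;
        BSTR attrs = NULL;
        if (SUCCEEDED(text->get_attributes(offset, &start, &end, &attrs))) {
            // ... hand the run [start, end) plus attrs to the TTS/Braille
            // back end here ...
            SysFreeString(attrs);
            offset = (end > offset) ? end : offset + 1;
        } else {
            ++offset;
        }
    }
}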
> Do I mean that the text strings in Silvia's example below might
> be rich text, perhaps including ML? We haven't specifically
> discussed that in our Media work with respect to extended text
> video descriptions, but I suspect this will come up. After all,
> why should we dumb-down our tools? Much better to use available
> mechanisms to best support users going forward. Extended text
> video descriptions are most likely to be used in educational
> settings, so there's significant reason to expect markup would
> enhance that communication.
> One specific example that's a favorite of mine is foreign
> language support. We have the ability to tag in line language
> changes, so that a phrase of the day might be properly tagged
> as: "today's slogan <lang=fr>de jeur</lang>. It's time our AT
> was able to pronounce such things using appropriate rules. We've
> not done a good job of that in the past. I suspect it's not
> unreasonable to think we can now do much better, but we need to
> think through the solution end to end.
> Silvia Pfeiffer writes:
>> On 21/06/2011, at 2:51 PM, Pete Brunet <pete at a11ysoft.com> wrote:
>>> On 6/20/2011 10:20 PM, Silvia Pfeiffer wrote:
>>>> Hi Pete,
>>>> Before I address any of this, I think there is a confusion still and
>>>> I'd like to make this very clear so we don't talk past each other: The
>>>> "descriptions" that Alex and I have been talking about for the
>>>> purposes of this thread are not in audio format. Instead, they are
>>>> text and provided to the browser in exactly the same way as captions.
>>> Thanks Silvia, yes, I understood that.
>>>> Here is an example of such a description file:
>>>> . It has cues like the following:
>>>> 00:00:00,000 --> 00:00:05,000
>>>> The orange open movie project presents
>>>> 00:00:05,010 --> 00:00:12,000
>>>> Introductory titles are showing on the background of a water pool with
>>>> fishes swimming and mechanical objects lying on a stone floor.
>>>> 00:00:12,010 --> 00:00:14,800
>>>> elephants dream
>>>> 00:00:26,100 --> 00:00:28,206
>>>> Two people stand on a small bridge.
>>>> They aren't actually useful unless voiced in parallel to the video
>>>> that is playing and rendered during the time that the cue is active.
>>>> On Tue, Jun 21, 2011 at 4:52 AM, Pete Brunet <pete at a11ysoft.com> wrote:
>>>>> Hi Silvia, we probably have more to learn from you than you from us :-)
>>>> Well, we have to work together to solve this problem. :-)
>>>>> I think even in the case of HTML5 nothing has changed for those who are
>>>>> either deaf or blind but not both, i.e. an additional mode can be used to
>>>>> compensate for the sense that is impaired, e.g. captions for the deaf and
>>>>> audio descriptions for the blind. However, in the case of those who are
>>>>> deaf/blind then a tactile mode is needed. One solution is to make captions
>>>>> available to the screen reader (with its Braille support) and the audio
>>>>> descriptions available as text to the screen reader.
>>>>> Are there other scenarios besides use by those who are deaf/blind where text
>>>>> descriptions are needed?
>>>> In the HTML spec, there is mention of hands-free applications that
>>>> could make use of it, too. But I suspect that would also require use
>>>> of a screen reader type additional application.
>>> I think hands-free implies the need for voice recognition, so I'm not seeing that as a scenario that would require text descriptions.
>> This is hands-free for watching the video without looking and without user interaction, just plain playback. Thus, it can be regarded as an identical situation to being blind. But I don't want to get hung up on this because we don't really care about such a need on this list. So, let's just focus on blind users.
>>>>> If the only need for text descriptions is to provide access for those who
>>>>> are deaf/blind, what are the current solutions? Transcripts?
>>>> For deaf-blind people I believe transcripts are the solution and will
>>>> remain so, even though a voicing of both captions and text
>>>> descriptions will provide a solution for deaf-blind people, too.
>>>> I regard that as a minor use case though.
>>>>> Are the
>>>>> existing solutions so insufficient as to justify the engineering effort
>>>>> associated with text descriptions and stream control?
>>>> Text descriptions are for blind people in general, not just for
>>>> deaf-blind people.
>>>> The current solution is audio descriptions, and they are much harder
>>>> to produce than text descriptions. So, in the interest of gaining more
>>> Thanks. That's good information and an important justification. Now that I see the usefulness in supporting text descriptions I'll review the proposal and the issues that have been raised so far and post another response tomorrow.
>> Thanks very much indeed! Apologies for not explaining earlier.
>>>> accessibility to video content, text descriptions were created to help
>>>> achieve that. Both mechanisms: audio descriptions and text
>>>> descriptions, are supported in HTML5.
>>>>> From the business
>>>>> perspective that development managers are bound by, the cost of design and
>>>>> implementation would not be justifiable for such a small user base, unless
>>>>> there is a legal requirement. Is there (or will there soon be) a legal
>>>>> requirement to provide text descriptions?
>>>> I assume you are talking about development managers of accessibility
>>>> software? I believe the implementation of such a feature is indeed a
>>>> business decision and screen readers are free to compete on the
>>>> grounds of one having more a11y features than another. However, we are
>>>> here only indirectly talking about software - we are instead talking
>>>> about a general, standardised means of making an HTML5 feature
>>>> available to AT. I believe that this standardisation effort is well
>>>> worth it so we don't get different screen readers implementing
>>>> support for audio descriptions in a different manner when they do
>>>> decide to implement it.
>>>>> Regarding the descriptions keyword of the kind attribute, the document says
>>>>> that it's meant for use when visual capability is unavailable and gives the
>>>>> example of driving or blind users and also mentions that the text is meant
>>>>> for synthesis. However, for those who are blind (and not deaf/blind) audio
>>>>> can still be heard and thus there is no need for a text version of an audio
>>>>> description. And for deaf/blind users synthesis is not needed - tactile
>>>>> output (Braille) is needed.
>>>> I hope my above description explains why text descriptions are
>>>> different to audio descriptions and that support for both is required.
>>>> Audio descriptions will indeed already work with the current
>>>> specification of HTML5. But we want to make the simpler authoring task
>>>> of creating text descriptions a more effective means of delivering
>>>> accessibility to videos for blind users. Note that when there are text
>>>> descriptions available, we would *not* expect there to also be audio
>>>> descriptions available.
>>>>>>> At least at this point I'm not in favor of the media control methods.
>>>>>>> Developers should provide accessible GUI controls. The developer would have
>>>>>>> to implement the access in any case and having access through the GUI would
>>>>>>> eliminate adding the code for these new methods on both sides of the
>>>>>>> interface. If the app developer does a correct implementation of the GUI
>>>>>>> there would be no extra coding required in ATs.
>>>>>> I guess the idea here was that there may be situations where AT needs
>>>>>> to overrule what is happening in the UI, for example when there are
>>>>>> audio and video resources that start autoplaying on a newly opened
>>>>>> page. However, I am not quite clear on this point either.
>>>>> I believe the AT user would be in the same situation as a non-AT user, i.e.
>>>>> all users would use the same means to stop autoplaying (if such means
>>>>> were provided).
>>>> That is probably true: a browser setting to generally disallow
>>>> autoplaying or a shortcut key in the browser to stop any and all media
>>>> elements that are autoplaying would be a nice browser feature for any user.
>>>> Just to clarify: I cannot explain why we need the API in 2.7.1
>>>> https://wiki.mozilla.org/Accessibility/IA2_1.3#Control_video.2Faudio .
>>>> I do think, however, that we need the interface in 2.7.2
>>>> https://wiki.mozilla.org/Accessibility/IA2_1.3#Text_cues .
>>>> Note that I created the second interface in that section, because I
>>>> believe that AT needs to know the start time, end time, and exact text
>>>> of the to-be-read cue. I included "id" so we can keep an identifier,
>>>> but that may not be necessary. Also, I included both a function to
>>>> grab the HTML version as well as the plain text version of the cue so
>>>> we have the ability to render markup differently, e.g. "em" can
>>>> create emphasis in the voicing, or navigation markers can be used to
>>>> jump over earlier details in the cue to later ones. I am just
>>>> guessing, though, what kinds of information may be useful for the
>>>> screenreader to receive.
>>>> Also, note the need to listen to the "cuechange" event on the video's
>>>> description track and for access to setting/unsetting the
>>>> "pauseOnExit" IDL attribute of the cue from the screenreader.
>>>> I hope I've been able to clarify a few things...
>>> Pete Brunet
>>> a11ysoft - Accessibility Architecture and Development
>>> (512) 689-4155 (cell)
>>> Skype: pete.brunet
>>> IM: ptbrunet (AOL, Google), ptbrunet at live.com (MSN)
>>> Ionosphere: WS4G
a11ysoft - Accessibility Architecture and Development
(512) 689-4155 (cell)
IM: ptbrunet (AOL, Google), ptbrunet at live.com (MSN)