Using HTML5 and JavaScript to Deliver Text-Based Audio Descriptions

IBM Research-Tokyo recently partnered with NCAM to research ways to deliver online audio descriptions via text-to-speech (TTS) methods, rather than using human recordings. IBM and NCAM explored two approaches that use the new HTML5 media elements (<video>, <audio> and <track>) as well as JavaScript and TTML:

  1. Writing and time-stamping a description script, then delivering the descriptions as hidden text in real time so that a user’s screen reader reads them aloud. The descriptions remain invisible and inaudible to users who are not running a screen reader.
  2. Writing and time-stamping descriptions, then recording them using TTS technology. At the time of playback, each description is individually retrieved and played aloud at intervals corresponding to the time-stamped script.
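
Both approaches depend on the same step: matching the video's playback position against the time-stamped script. The sketch below is one way this could look in JavaScript; it is not the IBM/NCAM code. The script format, the clip file names, and the function names (`dueDescriptions`, `attachDescriptions`, `speakViaScreenReader`, `playRecordedClip`) are all hypothetical. For approach 1 it assumes the page contains a visually hidden element marked `aria-live="polite"`, which is a common way to make a screen reader announce text but may not be the technique the project used.

```javascript
// Hypothetical time-stamped description script (start times in seconds).
// The text and clip names are invented for illustration.
const descriptions = [
  { start: 2.0, text: "A woman walks into a lab.", clip: "desc-01.mp3" },
  { start: 9.5, text: "She picks up a tablet.", clip: "desc-02.mp3" },
  { start: 15.0, text: "The screen shows a waveform.", clip: "desc-03.mp3" },
];

// Return the descriptions whose start time falls in the interval
// (previousTime, currentTime], i.e. those that became due since the last check.
function dueDescriptions(script, previousTime, currentTime) {
  return script.filter((d) => d.start > previousTime && d.start <= currentTime);
}

// Call deliver(description) for each description as playback reaches it.
// `video` is expected to behave like an HTML5 <video> element.
function attachDescriptions(video, script, deliver) {
  let previousTime = -1;
  video.addEventListener("timeupdate", () => {
    const now = video.currentTime;
    // While the user is seeking, resynchronise without announcing
    // everything that was skipped over.
    if (video.seeking || now < previousTime) {
      previousTime = now;
      return;
    }
    dueDescriptions(script, previousTime, now).forEach(deliver);
    previousTime = now;
  });
}

// Approach 1 (assumed technique): write the text into a hidden live region
// so that a running screen reader announces it.
function speakViaScreenReader(liveRegion) {
  return (d) => {
    liveRegion.textContent = d.text;
  };
}

// Approach 2: retrieve and play the pre-recorded TTS clip for this description.
// Browser-only, because it relies on the Audio constructor.
function playRecordedClip(baseUrl) {
  return (d) => {
    const audio = new Audio(baseUrl + d.clip);
    // play() returns a promise that can reject (e.g. autoplay policy); ignore it here.
    audio.play().catch(() => {});
  };
}
```

In a page, this might be wired up with something like `attachDescriptions(document.querySelector("video"), descriptions, speakViaScreenReader(document.getElementById("descriptions")))`, where the element id is again an assumption. The `timeupdate` event fires only a few times per second, so descriptions are delivered slightly after their nominal start time; a production player would probably also need to pause or duck the programme audio while a description plays.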

Read a full description of the project, see the demonstration files and, if you want, download the code to see how it all works; then come back and let us know what you think.


One thought on "Using HTML5 and JavaScript to Deliver Text-Based Audio Descriptions"

  1. Two troubleshooting tips if you’re using VoiceOver:

    1. If VoiceOver reads the TTS descriptions more than once, shut down the screen reader and restart it (you should not need to quit the browser or reload the page).
    2. If you find that you can’t use the keyboard shortcuts for the players while VoiceOver is running, first press Control+Option+Tab to activate VoiceOver’s pass-through command, then press the keyboard shortcut that you want to use (e.g., Control+1 to play/pause the TTS player).

