Learning Through Multimedia: Speech Recognition Enhancing Accessibility and Interaction
Mike Wald, University of Southampton, United Kingdom
Journal of Educational Multimedia and Hypermedia, Volume 17, Number 2, ISSN 1055-8896. Publisher: Association for the Advancement of Computing in Education (AACE), Waynesville, NC USA
Invited as a Paper From ED-MEDIA 2006
Lectures can present barriers to learning for many students. Although online multimedia materials have become technically easier to create and offer many benefits for learning and teaching, they can be difficult to access, manage, and exploit. This article considers how research on interacting with multimedia can inform developments in using automatically synchronised speech recognition (SR) captioning to:
1. facilitate the manipulation of digital multimedia resources;
2. support preferred learning and teaching styles and assist those who, for cognitive, physical, or sensory reasons, find note-taking difficult; and
3. caption speech for deaf learners or for any learner when speech is not available or suitable.
Wald, M. (2008). Learning Through Multimedia: Speech Recognition Enhancing Accessibility and Interaction. Journal of Educational Multimedia and Hypermedia, 17(2), 215-233. Waynesville, NC USA: Association for the Advancement of Computing in Education (AACE). Retrieved March 26, 2019 from https://www.learntechlib.org/primary/p/23588/.
© 2008 Association for the Advancement of Computing in Education (AACE)