RESNA 27th International Annual Conference

Technology & Disability: Research, Design, Practice & Policy

June 18 to June 22, 2004
Orlando, Florida


Speech Recognition vs. Non-speech Input Methods: Usage Patterns and Satisfaction

Heidi Horstmann Koester
Rehabilitation Engineering Research Center on Ergonomics
University of Michigan

Abstract

Twenty-four people who use automatic speech recognition (ASR) for computer access were surveyed regarding their ASR usage patterns and satisfaction. Twenty-one of these individuals also used a non-speech input method, and usage and satisfaction for non-speech input were measured for these subjects. Forty-eight percent of participants reported using ASR for 25% or less of their computer tasks, while 37% reported using ASR for more than half of their computer tasks. The most appreciated benefit of ASR is a reduction in the fatigue and pain associated with manual input methods. Users' main concerns with ASR are inconsistent recognition accuracy and the need to fix recognition mistakes.

Keywords

speech recognition, alternative computer input, computer access, user performance, outcomes

Background

This study is part of a larger project that addresses the question of how well ASR systems are meeting the needs of users who have physical disabilities. Results for speed and accuracy performance measures with ASR have been reported previously [1]. This paper focuses on more qualitative aspects of using ASR, such as usage patterns, likes/dislikes, and satisfaction, and compares these to outcomes with non-speech input methods.

Two studies suggest that a majority of ASR users use their system on a regular basis [2,3]. However, a significant minority use ASR seldom or not at all. Most users in these studies are reasonably satisfied with their ASR system, but there is room for improvement, particularly with respect to recognition accuracy, technical problems, and training [2,3]. Since most of these data reflect users of earlier discrete-utterance systems [2], and the remaining data represent 10 individuals [3], there is a need for an updated and broader understanding of users of more current ASR systems. Additionally, neither study reports on the relative usage of ASR as compared to non-speech methods, or on how users who have a choice of input methods exercise that choice.

Research Questions

The specific goals of this study are to: (1) determine usage patterns (how often, and for what kinds of tasks) and satisfaction (e.g., likes and dislikes) for automatic speech recognition; (2) determine usage patterns and satisfaction for non-speech input methods that are employed by ASR users; and (3) compare usage patterns and satisfaction for ASR and non-speech input methods.

Methods

Subjects

Twenty-four subjects participated. All have physical disabilities that affect their ability to use the standard keyboard and mouse, and all had at least 6 months of ASR experience. Nineteen subjects had some form of non-speech keyboard access. Eighteen of these typed directly on the standard keyboard, and one used an on-screen keyboard. Twenty-one subjects had some form of non-speech mouse/pointer access. Seven could use the standard mouse with no adaptations; thirteen used a trackball, and one used a head-controlled mouse.

Procedure

Sessions occurred in the subject's home or office. Subjects completed a 53-item survey in an interview format with a researcher. Items covered the following topics:

Data Collection and Analysis

Mean responses to quantitative items were calculated across subjects. To compare responses to different survey items, paired statistics were used: paired t-tests for items coded as ordinal variables, and chi-square tests for items coded as categorical variables. Open-ended comments during the survey were also recorded to provide further insight into subject responses.
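As a rough illustration of the paired comparison described above, the following Python sketch computes a paired t statistic using only the standard library. The function name `paired_t` and the sample ratings are hypothetical and are not taken from this study; the sketch stops at the test statistic and degrees of freedom, and assumes a statistics package or t table would be used to obtain the p-value.

```python
import math
import statistics


def paired_t(a, b):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Each subject contributes one difference between the two conditions.
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical 1-7 ratings from five subjects (NOT data from this study).
asr_ratings = [5, 6, 4, 7, 5]
kbd_ratings = [4, 6, 3, 5, 5]
t_stat, df = paired_t(asr_ratings, kbd_ratings)
print(f"t = {t_stat:.2f}, df = {df}")
```

Pairing matters here because each subject rated both input methods, so the test operates on within-subject differences rather than on two independent groups.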

Results

Web browsing was the most frequent computer task for these subjects, followed by email and word processing. All participants used all three of these applications at least some of the time. All other applications, such as finance, games, or graphics, were used much less often than the “big three.”

Table 1. Relative usage of speech recognition (ASR), keyboard (KBD), and pointer (PTR) for three categories of computer tasks. Each cell shows the percentage of subjects who reported using an input device for the specified portion of time.

               All Computer Tasks       Text Input Tasks    Command Input Tasks
Usage Time      ASR    KBD    PTR      ASR    KBD    PTR      ASR    KBD    PTR
Don't Use       0.0    0.0    0.0      0.0    0.0   85.7     38.1   31.6    4.8
0 – 25%        47.6   42.1   33.3     23.8   36.8   14.3     28.6   52.6    9.5
26 – 50%       14.3   36.8   33.3     23.8   21.1    0.0     19.0   10.5   23.8
51 – 75%       23.8   15.8   23.8     14.3   15.8    0.0      9.5    5.3   23.8
76 – 100%      14.3    5.3    9.5     38.1   26.3    0.0      4.8    0.0   38.1

Table 2. Reasons for choosing non-speech methods over ASR. An asterisk (*) marks a mean importance rating significantly greater than the neutral rating of 4.0.

Reasons for choosing non-speech input methods    Mean
They're easier                                   6.3*
They're faster                                   6.1*
Less set-up involved                             4.6
Frustration with speech                          4.4
To rest my voice                                 2.0
Just for variety                                 1.9

Table 1 shows the share of computer time for which subjects reported using each of their input methods. Across all computer tasks, no single input method was the dominant choice for these subjects (p = 0.68). About half of the subjects reported using their ASR system for 25% or less of their computer time. Subjects chose to use ASR primarily for text input tasks, in which keyboard and ASR input were used with roughly the same frequency (p = 0.79), rather than for command input tasks, in which manual pointing was the dominant method (p < 0.001).
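The band percentages in Table 1 can be tallied to see how usage splits between light and heavy use. The short Python sketch below copies the "All Computer Tasks" columns from Table 1 and sums selected bands; the names `BANDS`, `ALL_TASKS`, and `share` are illustrative choices, not part of the original analysis.

```python
# Percent of subjects in each usage band, copied from the
# "All Computer Tasks" columns of Table 1.
BANDS = ["Don't Use", "0-25%", "26-50%", "51-75%", "76-100%"]
ALL_TASKS = {
    "ASR": [0.0, 47.6, 14.3, 23.8, 14.3],
    "KBD": [0.0, 42.1, 36.8, 15.8, 5.3],
    "PTR": [0.0, 33.3, 33.3, 23.8, 9.5],
}


def share(method, bands):
    """Sum the percentage of subjects whose usage falls in the given bands."""
    row = ALL_TASKS[method]
    return round(sum(row[BANDS.index(b)] for b in bands), 1)


low_asr = share("ASR", ["Don't Use", "0-25%"])   # 25% of tasks or less
high_asr = share("ASR", ["51-75%", "76-100%"])   # more than half of tasks
print(low_asr, high_asr)  # 47.6 38.1
```

The same helper can be pointed at the keyboard or pointer rows to compare how often each method is used for more than half of computer time.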

Subjects were also asked about their reasons for choosing non-speech input methods, and their responses are shown in Table 2. They rated the importance of six reasons on a scale of 1 – 7, with 7 being most important. The top two reasons were that other input methods are judged to be easier and/or faster. An important reason that was not included explicitly in the survey but emerged from subject comments was technical incompatibility between ASR and some applications. Twelve subjects volunteered this reason. Subjects had particular problems getting ASR to work properly with their email programs.

Table 3. Percent of subjects who liked particular aspects of ASR and non-speech input methods.

              % Responding Yes
Likes         ASR      Non-speech
Effort        83.3%    31.8%
Speed         75.0%    68.2%
Ease          75.0%    72.7%
Fun           66.7%    50.0%
Accuracy      54.2%    72.7%
Cool          37.5%    23.8%

Regarding satisfaction, subjects were about equally satisfied with the training they received across all input methods. The “sufficiency” of training received an average rating of 5.5 on a scale of 1 to 7 for ASR, and average ratings of 5.1 and 5.2 for keyboard and pointing device training, respectively. Subjects' likes and dislikes during actual use, however, showed notable differences between ASR and other input methods, particularly with regard to fatigue and accuracy, as shown in Table 3. Subjects also rated their agreement with several statements regarding learnability, ease of use, reliability, speed, and fun. Speech recognition was rated significantly more difficult to learn than other input methods (p < 0.05), although neither was judged especially difficult to learn. ASR was also judged to have less consistent accuracy than other input methods (p < 0.05).
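One simple way to see where ASR and non-speech input differ most in Table 3 is to rank the aspects by the gap in percentage points. The Python sketch below does this with the Table 3 values; the names `LIKES` and `gaps` are illustrative, and the raw gaps are descriptive only, not a significance test.

```python
# Percent of subjects responding "yes", copied from Table 3:
# aspect -> (ASR, non-speech).
LIKES = {
    "Effort": (83.3, 31.8),
    "Speed": (75.0, 68.2),
    "Ease": (75.0, 72.7),
    "Fun": (66.7, 50.0),
    "Accuracy": (54.2, 72.7),
    "Cool": (37.5, 23.8),
}


def gaps(likes):
    """Return (aspect, ASR minus non-speech) pairs, largest ASR advantage first."""
    diffs = [(aspect, round(asr - other, 1)) for aspect, (asr, other) in likes.items()]
    return sorted(diffs, key=lambda pair: pair[1], reverse=True)


ranked = gaps(LIKES)
for aspect, gap in ranked:
    print(f"{aspect:<9}{gap:+.1f}")
```

Ranked this way, Effort shows the largest gap in favor of ASR (+51.5 points), and Accuracy is the only aspect where non-speech input is liked by more subjects (-18.5 points).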

Discussion

These results suggest that ASR is succeeding for the most part at reducing effort and providing acceptable speed. However, users who have a choice of input methods often choose not to use ASR. This, combined with the compatibility problems that exist between ASR and many applications, suggests that ASR users should also be provided with a non-speech method if at all possible. Unfortunately, we still do not fully understand how to optimally leverage multiple input methods. For example, most subjects believed that pointer use was quicker and easier than using spoken commands, and that manually typing is a better choice than ASR for “short” text (a sentence or two). Are these beliefs accurate? Further work is necessary to determine the circumstances under which ASR provides a performance advantage as compared to non-speech methods.

References

  1. Koester HH. (2003). Performance of experienced speech recognition users. In Proceedings of the RESNA 2003 Conference. Washington, DC: RESNA.
  2. Schwartz P, Johnson J. (1999). The effectiveness of speech recognition technology. In Proceedings of RESNA '99 Conference (pp. 77-79). Washington, DC: RESNA.
  3. DeRosier R. (2002). Speech Recognition Software as an Assistive Device: A Study of User Satisfaction and Psychosocial Impact, M.S. Thesis, Temple University.

Acknowledgments

This study was funded by U.S. Dept of Education Grant #H133E980007. Many thanks to the participants for their time and effort, and to Ruthvick Divecha for his assistance with data management and analysis.

Heidi Horstmann Koester, Ph.D.
2408 Antietam, Ann Arbor MI 48105
hhk@umich.edu
