RESNA 27th International Annual Conference
Statistical Identification of Factors that Influence Performance with Speech Recognition
The goal of this study was to identify factors that account for the variation in performance with automatic speech recognition (ASR) systems. Using data from experienced ASR users with physical disabilities, the effect of 20 independent variables on recognition accuracy and text entry rate with ASR was measured using bivariate and multivariate analyses. Use of appropriate correction strategies had the strongest influence on user performance. The amount of time the user spent on their computer, the user's manual typing speed, and the speed with which the ASR system recognized speech were all positively associated with better performance. The amount or perceived adequacy of ASR training did not have a significant impact on performance for this user group.
speech recognition, computer access, user performance, outcomes, multiple regression
User performance with automatic speech recognition varies widely, for both new and experienced users. Data from 8 new ASR users show that after 4 to 6 weeks of use, recognition accuracy ranged from 60% to 99%, and text entry rate ranged from 1.5 words per minute (wpm) to 72.6 wpm [1]. For 23 experienced ASR users, the range for recognition accuracy was 72% to 94%, and 3 to 32 wpm for text entry rate [2]. There are many possible reasons for this diverse performance, including factors related to the hardware and software in the system, the user's training and experience, specific ASR usage techniques, and user characteristics [3].
Such wide variation means there is no simple answer to the question of what performance users can expect from ASR systems. This study was conducted to provide some insight into why some ASR users perform relatively well, and others relatively poorly.
Data from 23 experienced ASR users with physical disabilities were analyzed to determine the factors that influenced user performance with ASR. Measurements of recognition accuracy and text entry rate with ASR were the dependent variables [2]. Indicators for 20 potential factors were formed from responses to survey questions and other measures with this same group of users. The relationship between these 20 independent variables (representing the possible factors) and the 2 dependent variables (representing actual user performance) was assessed graphically and statistically using scatter plots, bivariate analyses, and multivariate regression modeling.
The bivariate relationship between each independent variable and each dependent variable was graphed for visual inspection, and determined statistically by calculating the Pearson correlation. Statistical significance for the correlations was set at the 0.05 level.
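The bivariate test described above can be sketched in a few lines of Python. This is an illustrative sketch, not the analysis software used in the study: the function names are hypothetical, the critical t value of 2.080 is the standard two-tailed value for 21 degrees of freedom at the 0.05 level, and the example assumes that all 23 subjects contributed data to each correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r != 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))

# Two-tailed critical t for alpha = 0.05 and df = 21 (n = 23), from a t table.
T_CRIT_DF21 = 2.080

def is_significant(r, n=23, t_crit=T_CRIT_DF21):
    """True if the correlation differs from zero at the 0.05 level (assumed n)."""
    return abs(t_statistic(r, n)) > t_crit

# Two correlations reported in Table 1 for recognition accuracy:
scratch_that_significant = is_significant(-0.681)  # "Scratch That" Usage
ram_significant = is_significant(0.024)            # RAM
```

Under these assumptions, the "Scratch That" correlation is flagged as significant and the RAM correlation is not, in line with the asterisks in Table 1.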
Multiple regression models were developed for both recognition accuracy and text entry rate. Multivariate influence was examined for any independent variable that had (1) a visible bivariate relationship on the scatter plot AND (2) either a statistically significant bivariate correlation OR a bivariate correlation greater than 0.2 in absolute value.
The first step was to find the “best” one-factor model from the pool of candidate factors, then determine if any of the remaining factors improved the model enough to warrant a two-factor model. If a two-factor model was found, the remaining factors were again searched for a possible three-factor model. A model was judged to be “better” than another if it had a higher adjusted R² value, greater statistical significance for each independent variable's model coefficient, stronger partial relationships based on graphic analysis, and more robust satisfaction of regression assumptions.
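The screening rule and the adjusted R² criterion can be written out as a short Python sketch. The function names and the boolean `visible_on_plot` flag are hypothetical; in the study the visual judgment was made by inspecting scatter plots, which code cannot reproduce. The adjusted R² formula is the standard one for a model with k predictors and n observations.

```python
def is_candidate(visible_on_plot, r, significant, r_threshold=0.2):
    """Screening rule: visible relationship AND (significant OR |r| > 0.2)."""
    return visible_on_plot and (significant or abs(r) > r_threshold)

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With n = 23 subjects, adding a predictor only helps if R^2 rises enough
# to offset the lost degree of freedom (R^2 values here are made up).
one_factor = adjusted_r2(0.50, n=23, k=1)
two_factor = adjusted_r2(0.51, n=23, k=2)
```

In this made-up comparison the two-factor model has a slightly higher raw R² but a lower adjusted R², so it would not be preferred on that criterion alone.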
The purpose of the multivariate modeling was to identify influential factors and their relative influence on ASR performance. An independent variable was considered to be an “influential factor” if its standardized Beta coefficient in a multivariate model was significant at the p < 0.05 level. The relative strength of two or more influential factors in a single model was determined by comparing their standardized Beta coefficients.
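As a rough illustration of the comparison described above, a standardized Beta can be obtained from an unstandardized slope by rescaling with the sample standard deviations, and factors can then be ranked by the absolute size of their Betas. The helper names below are hypothetical. The example values are those listed under "Partial b" in Table 3, on the assumption that those columns hold the standardized coefficients.

```python
import statistics

def standardized_beta(b, xs, ys):
    """Convert an unstandardized slope b into a standardized Beta."""
    return b * statistics.stdev(xs) / statistics.stdev(ys)

def rank_by_influence(betas):
    """Order factor names from strongest to weakest by |Beta|."""
    return sorted(betas, key=lambda name: abs(betas[name]), reverse=True)

# Values from the text entry rate model (Table 3), assumed to be standardized.
table3_betas = {"Scratch That": -0.523, "Typing Speed": 0.377, "ASR Delay": -0.384}
ranking = rank_by_influence(table3_betas)
```

The sign of a Beta gives the direction of the relationship, so only the magnitude is used for ranking.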
Table 1 shows the Pearson correlations between all candidate factors and recognition accuracy and text entry rate. For recognition accuracy, 10 candidate factors were retained for multivariate analysis. Only weak bivariate relationships were found between recognition accuracy and factors related to hardware and software, ASR training, or the amount of experience subjects had using ASR. For text entry rate, 9 factors were retained for multivariate analysis. ASR training factors showed relatively little relationship to text entry rate, as did the amount of ASR experience subjects had.
Independent Variable | Rec Acc | TER
---|---|---
Hardware/Software | |
RAM | 0.024 | 0.158
ASR Delay | 0.085 | -0.356
Microphone | -0.152 | -0.105
Text Application | -0.081 | 0.010
ASR Training/Usage | |
Training Hours | 0.001 | -0.114
Training Adequacy | 0.190 | -0.127
ASR Usage | 0.090 | -0.010
ASR Text Usage | 0.419* | 0.227
ASR Experience | -0.078 | -0.004
ASR Techniques | |
“Scratch That” Usage | -0.681** | -0.598**
Proofread Style | 0.266 | -0.003
Words per Utterance | 0.315 | 0.559**
Dictation Speed | 0.132 | 0.426
Computer Experience and Usage | |
Computer Usage | 0.251 | -0.053
Word Proc Time | 0.413* | 0.355
Pre-ASR Experience | -0.311 | -0.198
User Characteristics | |
Gender | -0.078 | -0.147
Education | 0.338 | 0.371
Need Computer for Job/school | 0.397 | 0.478*
Typing Speed | 0.189 | 0.610**
Other ASR Factors | |
Recognition Accuracy | 1.0 | 0.687**
Table 2 shows the best multi-factor model found for recognition accuracy. Recognition accuracy was influenced most strongly by the frequency with which users employed the “Scratch That” method of correcting recognition errors. A secondary influence on recognition accuracy was the amount of time users spent on their computer each week.
Model Equation | Scratch That (ST): Partial b | Scratch That (ST): Sig. of b | Computer Usage (CU): Partial b | Computer Usage (CU): Sig. of b | Adj. R²
---|---|---|---|---|---
RA = 85.5 - 0.25(ST) + 0.48(CU) | -0.748 | <0.001** | 0.347 | 0.034* | 0.535
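To make the recognition accuracy model concrete, the equation from Table 2 can be evaluated directly. This is only a sketch: the function name is hypothetical, and the measurement units for "Scratch That" usage and computer usage are not given in this section, so the inputs below are placeholders rather than realistic subject data.

```python
def predicted_recognition_accuracy(scratch_that, computer_usage):
    """Evaluate RA = 85.5 - 0.25(ST) + 0.48(CU) from Table 2."""
    return 85.5 - 0.25 * scratch_that + 0.48 * computer_usage

# Placeholder inputs; the units of ST and CU are an open assumption here.
baseline = predicted_recognition_accuracy(0, 0)
more_scratch_that = predicted_recognition_accuracy(10, 0)
```

Each additional unit of "Scratch That" usage lowers the predicted accuracy by 0.25 percentage points, and each additional unit of computer usage raises it by 0.48. The model accounts for only about half of the variance (adjusted R² of 0.535), so individual predictions should be read loosely.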
Table 3 shows the best multi-factor model found for text entry rate. Use of “Scratch That” was again the most influential factor, primarily because of its influence on recognition accuracy. Of secondary importance was the ASR Delay, or how long it takes for the ASR system to display a recognition at the completion of an utterance. Finally, typing speed without ASR also emerged as an influential factor in text entry rate.
Model Equation | Scratch That (ST): Partial b | Scratch That (ST): Sig. of b | Typing Speed (TS): Partial b | Typing Speed (TS): Sig. of b | ASR Delay (AD): Partial b | ASR Delay (AD): Sig. of b | Adj. R²
---|---|---|---|---|---|---|---
TER = 22.4 - 0.22(ST) + 0.28(TS) - 11.9(AD) | -0.523 | 0.002** | 0.377 | 0.019* | -0.384 | 0.012* | 0.625
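The text entry rate model in Table 3 can be explored the same way, for example to see how the predicted rate shifts when only the ASR delay changes. The function name is hypothetical and the input values are made up for illustration. The units of each factor are not stated in this section, so treating typing speed as words per minute and delay as seconds is an assumption.

```python
def predicted_text_entry_rate(scratch_that, typing_speed, asr_delay):
    """Evaluate TER = 22.4 - 0.22(ST) + 0.28(TS) - 11.9(AD) from Table 3."""
    return 22.4 - 0.22 * scratch_that + 0.28 * typing_speed - 11.9 * asr_delay

# Same hypothetical user, two hypothetical delay values (units assumed).
fast_system = predicted_text_entry_rate(0, 30, 0.5)
slow_system = predicted_text_entry_rate(0, 30, 1.0)
delay_cost = fast_system - slow_system
```

Because the model is linear, the difference between the two predictions depends only on the change in delay and its coefficient, not on the other inputs.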
While this study only begins to answer the question of which factors have the most influence on ASR performance, the results support the following clinical implications:
This study was funded by U.S. Dept of Education Grant #H133E980007.
Heidi Horstmann Koester, Ph.D.
2408 Antietam
Ann Arbor MI 48105
hhk@umich.edu