RESNA 28th Annual Conference - Atlanta, Georgia
Kim Adams1,2, Kimberly Portis3, Ann Bisantz3, Michael Buckley4, Jeff Higginbotham5, Kris Schindler4, Matt Sweeney4
1Rehabilitation Medicine, University of Alberta, Edmonton, AB
2Glenrose Rehabilitation Hospital, Edmonton, AB
3Human Factors, Industrial Engineering, University at Buffalo, Buffalo, NY
4Computer Science and Engineering, University at Buffalo, Buffalo, NY
5Communicative Disorders and Sciences, University at Buffalo, Buffalo, NY
The "Heuristic Evaluation - A System Checklist" from Xerox Corporation was used to evaluate two Augmentative and Alternative Communication (AAC) software interfaces [1]. The first evaluation was performed on the Tablet Portable Impact device, made by Enkidu Research, Inc., which has been on the market for several years. The second evaluation was on the UB Talker, a prototype device which is in development by a team of local computer scientists. The feasibility and value of using this tool on the AAC interfaces was investigated, as well as the differences in using it with a well established device versus a prototype device.
KEYWORDS: user interface, usability, heuristic evaluation, AAC
The larger project aims to find an approach for predicting outcomes of AAC device use. The proposed process is to blend the fields of AAC and Human Factors (HF) Engineering and use principles from both fields to identify and measure the factors influencing outcomes. As a first research design project, the hypothesis that “context-sensitive vocabulary will result in improved task performance” was tested. Prototype AAC software, being developed by a team of local computer science professors and students, was used to provide the two experimental conditions, context-sensitive vocabulary “on” and “off”. Since the software is a prototype, it was important to make the user interface as well designed as possible before using it for the research project, in order to reduce user errors caused by the interface.
Heuristic evaluation is a systematic inspection of a user interface for adherence to recognized usability principles [2, 3]. Evaluators attempt to find the usability problems in a design so that they can be changed during an iterative design process, or as part of a redesign. The evaluation tool chosen for the AAC interfaces was developed by Usability Analysis & Design, Xerox Corporation [1]. It is a very detailed systematic checklist which incorporates the general principles of several usability experts. It is important to note that this tool examines only the design of the interface and provides no information on user performance.
The Xerox Heuristic Evaluation – A System Checklist was downloaded from the internet [1]. It consists of 13 sections each with 3 to 50 questions, with 296 questions in total. Sections include the following categories: Visibility of System Status; Match Between System and the Real World; User Control and Freedom; Consistency and Standards; Help Users Recognize, Diagnose, and Recover from Errors; Error Prevention; Recognition Rather Than Recall; Flexibility and Minimalist Design; Aesthetic and Minimalist Design; Help and Documentation; Skills; Pleasurable and Respectful Interaction with the User; and Privacy. Each question can be answered by “Yes”, “No”, or “Not Applicable” and space is provided for comments. Questions were written in a format where the desired response was “Yes”. One sample question is, “Is there visual feedback when an object is selected or moved?”
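To make the tabulation concrete, the following minimal sketch (in Python, not part of the original study) shows one hypothetical way the checklist answers could be recorded and tallied per section; the ChecklistItem structure, the tally_by_section function, and the section assignment of the example questions are illustrative assumptions, not part of the Xerox tool.

from collections import Counter
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    # One checklist question, the section it belongs to, the evaluator's
    # answer ("Yes", "No", or "Not Applicable"), and an optional comment.
    section: str
    question: str
    answer: str
    comment: str = ""

def tally_by_section(items):
    # Count Yes / No / Not Applicable answers within each checklist section.
    tallies = {}
    for item in items:
        tallies.setdefault(item.section, Counter())[item.answer] += 1
    return tallies

# Illustrative records only; the section placement of the second question
# is assumed for the example.
items = [
    ChecklistItem("Visibility of System Status",
                  "Is there visual feedback when an object is selected or moved?",
                  "Yes"),
    ChecklistItem("Visibility of System Status",
                  "Is there some sort of feedback after every operator action?",
                  "No", "No cursor indicator in the message window."),
]
for section, counts in tally_by_section(items).items():
    print(section, dict(counts))

In practice the researchers recorded answers in spreadsheets; a structure like the one above simply makes it easy to count "No" answers per section when comparing evaluators or devices.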
The Xerox tool was used to evaluate the AAC software interface on two devices: the Tablet Portable Impact from Enkidu Research, Inc.1, a well-established device, and the UB Talker, a prototype device under development in the Department of Computer Science at the University at Buffalo. Both devices are based on a tablet PC and provide text-to-speech synthesized output.
Before using the tool, it was necessary to define terminology and to establish start criteria for the AAC software. Terminology used in the evaluation had to be defined to make the questions more applicable to the AAC software. For instance, “task” was defined to mean producing an utterance, and “function keys” were defined as the shift and caps lock keys. Since the same question can be answered differently depending on which mode the software is in, the start criteria for both AAC programs were as follows: keyboard page only (no access to the parameter changing pages), QWERTY keyboard layout, full screen mode, no other programs running, touch screen input mode, audio feedback by word, and no abbreviation expansion. Note that the evaluation could be performed successively for multiple start criteria or system modes, in order to evaluate alternative operating modes.
The Portable Impact interface was evaluated first. Two researchers, one with an AAC background and the other with an HF background, completed the evaluation independently, tabulating the results in spreadsheets. The researchers then came to a consensus on each answer through discussion.
The UB Talker interface was evaluated second. Again, the researchers did the evaluation separately and then came to a consensus on the answers. Afterwards, the results were presented to the full team, which included the other AAC and HF researchers and the device programmers [2]. Each “No” answer was investigated to determine whether the issue could remain as it was and, if not, what fix was required. Next, the group came to consensus on a priority order for the fixes. Participation of the programmers at this meeting was very important, since they could estimate how much time each fix would require, and this information sometimes had ramifications for the priority order.
Since the heuristic evaluation was performed on the Portable Impact interface first, that is when the methodology was worked out. Initial agreement between the two researchers on evaluation items was low, requiring discussion to reach consensus on up to 80% of the answers. It became obvious that start criteria and terminology had to be defined in order to make the evaluation functional for our purpose. Start criteria such as limiting the evaluation to the keyboard page made many questions “Not Applicable”, such as any question regarding menu items. In all, 168 of the 296 questions were answered “Not Applicable”. Defining terminology was necessary, not only to make the tool more applicable to the AAC area, but also to give the two researchers common terminology. Even with these efforts to define a common starting point and device terms, there were still many differences in the answers the researchers arrived at independently. In one section, 43% of the UB Talker answers required discussion to reach consensus.
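As a minimal illustration of how such agreement figures can be computed (the function and data below are hypothetical, not taken from the study), simple percent agreement compares the two evaluators' independent answers question by question:

def percent_agreement(answers_a, answers_b):
    # Simple percent agreement between two evaluators who answered the same
    # ordered list of questions with "Yes", "No", or "Not Applicable".
    if len(answers_a) != len(answers_b):
        raise ValueError("Both evaluators must answer the same questions")
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return 100.0 * matches / len(answers_a)

# Hypothetical section of 21 questions in which 12 answers matched:
# roughly 43% of the answers would then require discussion to reach consensus.
evaluator_1 = ["Yes"] * 12 + ["No"] * 9
evaluator_2 = ["Yes"] * 12 + ["Yes"] * 9
agreement = percent_agreement(evaluator_1, evaluator_2)
print(f"{100 - agreement:.0f}% of answers required discussion")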
Using the evaluation on the well-established AAC interface did not identify any key interface problems, perhaps because the device had already been designed and evaluated. It did uncover some cases where the interface did not meet criteria for usability, although these were not crucial to understanding device functionality. One example is, “Is there some sort of feedback after every operator action?” There was no indication of a problem when the user pressed the area outside the keys, or pressed backspace when already at the beginning of the line.
Performing the heuristic evaluation on the prototype device uncovered many more significant interface issues. For example, for the same question as above, “Is there some sort of feedback after every operator action?”, a crucial fix was recommended. The device did not have a cursor indicator in the message window to show where the next letter would be inserted. Hence, the user could not tell whether pressing the space key had added a space at the end of the text string. The team determined that this was crucial to understanding the user interface, and putting a cursor in the message window became a required fix.
The heuristic evaluation was very time consuming at first, but it became much faster with experience. For the very first evaluation, before the start criteria were established, completing one section of 25 questions independently took 45 minutes, and discussing the answers to reach consensus took another 45 minutes. The time required decreased as non-applicable questions were eliminated and familiarity with the process grew.
So, is it feasible to use the Xerox evaluation on AAC software interfaces? The initial impression was that many of the questions were not applicable, which indicated that the tool would not provide useful information. However, once the operating criteria and terminology were appropriately set, the evaluation tool did provide a meaningful way to assess the user interface.
What value can be gained from using this tool? Essentially, the evaluation tool allowed the researchers to systematically “proofread” the user interface to identify common user interface problems related to such things as the visibility of system functions, the feedback provided to users, and functionality that is both internally consistent and consistent with user expectations. A very valuable aspect was being able to flag issues of concern in the prototype interface and have justification for necessary design changes.
Are there differences in using this tool with a well-established device versus a prototype? As expected, evaluating the prototype uncovered many more issues than evaluating the well-established device. If crucial changes are required, it is at least possible to make them when working with a prototype device. However, if there are usability issues with a device purchased from a third party, the only control available is to adjust built-in parameters. If the issue cannot be eliminated or reduced in that way, training may help users avoid making errors because of it.
Kim Adams
kim.adams@mindspring.com
Rehabilitation Medicine
3-48 Corbett Hall,
University of Alberta,
Edmonton, AB T6G 2G4
1 The Impact communication device is currently manufactured by DynaVox Technologies.