I. Introduction
Personal health records (PHRs) offer many benefits. They help people manage their health records by enabling access to health information, and they facilitate communication by building bridges between patients and healthcare providers [1, 2]. PHRs were initially web-based applications, but the use of mobile PHR (mPHR) applications has increased in recent years. In 2021, there were 5.29 billion mobile phone users [3], and more than 350,000 mHealth applications were available in app stores [4]. Despite the numerous potential advantages of PHRs, poor usability remains a significant barrier to their acceptance and adoption by patients [5, 6].
Usability is defined by the International Organization for Standardization as “the effectiveness, efficiency, and satisfaction with which specific users achieve specified goals in particular environments” [7]. Efficiency relates to the amount of resources required to complete a task. In general, efficiency has both an explicit, physical component, related to the speed with which a task is finished, and a mental or cognitive component, related to the mental resources the task demands. Poor usability caused by inefficient, ineffective, and complex designs increases mental effort because of the human cognitive architecture, which combines a limited working memory [8] with a theoretically unlimited long-term memory that shapes cognitive schemas. Usability improves when users’ working memory is free to concentrate on the details of the information they need [9]. Designs that force users to hold too much information in working memory at once are likely to fail; a well-designed system should not exceed the capacity of the user’s short-term memory [10, 11]. A poor design can quickly deplete an individual’s very limited pool of cognitive resources. In such cases, as cognitive load increases, user performance decreases and the likelihood of making mistakes rises rapidly [12].
Cognitive task analysis (CTA) is a method for revealing cognitive overload; it examines how people’s mental processes work while they competently perform a task [13]. CTA is considered one of the best methods for analyzing the physical and mental procedures involved in a task [14]. It focuses on understanding and quantifying how users solve problems by assigning them specific tasks that involve reasoning and then observing how they make decisions [15]. Human factors experts employ CTA to improve the design of decision support systems, human-computer interfaces, and training programs [16].
The goals, operators, methods, and selection rules (GOMS) model [17] is a CTA method that characterizes human-computer interaction in terms of user goals, the actions taken to reach them (operators), and the methods employed to achieve them. By identifying the steps necessary to reach a desired objective with an interface, the GOMS model helps assess the physical and mental effort required to attain a goal. Research [18, 19] has shown that scrutinizing the fundamental task steps can help eliminate unnecessary processes and streamline task execution. The model enables designers to quantitatively estimate the time a specific task requires with a given user interface, and hence the interface’s efficiency, by breaking down the user’s steps and summing the execution time of each step, all without conducting detailed user tests [20]. Goals, which represent users’ intentions in using the system, frequently comprise sub-goals. Operators are the basic actions needed to interact with a system (e.g., pressing a button or swiping the screen), while methods are the sequences of operators users employ to achieve each goal. When multiple methods exist for achieving a goal, context-based selection rules specify which method to use.
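To make the four GOMS components concrete, the following minimal Python sketch shows one way a goal, its alternative methods, and a selection rule might be represented. The class names, the operator codes, and the two example methods are hypothetical illustrations, not part of the GOMS literature or of e-Nabız, and sub-goals are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Method:
    """An ordered sequence of operators that accomplishes a goal."""
    name: str
    operators: list[str]


@dataclass
class Goal:
    """A goal, its alternative methods, and a selection rule."""
    name: str
    methods: list[Method]
    select: Callable[[dict], Method]  # selection rule: context -> chosen method


def run(goal: Goal, context: dict) -> list[str]:
    """Return the operator sequence of the method chosen for this context."""
    method = goal.select(context)
    if method not in goal.methods:
        raise ValueError("selection rule must pick one of the goal's methods")
    return method.operators


# Hypothetical goal with two methods (P = prepare finger/point, T = tap).
via_menu = Method("via side menu", ["P", "T", "P", "T"])
via_shortcut = Method("via home-screen shortcut", ["P", "T"])
open_appointments = Goal(
    name="open the appointments screen",
    methods=[via_menu, via_shortcut],
    # Selection rule: use the shortcut when it is visible, otherwise the menu.
    select=lambda ctx: via_shortcut if ctx.get("shortcut_visible") else via_menu,
)

print(run(open_appointments, {"shortcut_visible": True}))  # ['P', 'T']
print(run(open_appointments, {}))                          # ['P', 'T', 'P', 'T']
```

Timing the operator sequence returned by the selection rule is the step that keystroke-level models formalize.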
Card et al. [21] developed an estimation tool based on the GOMS model, known as the keystroke-level model (KLM), to estimate the duration of physical and mental activities on desktop computers operated with a mouse and keyboard. The fundamental concept behind KLM is to estimate the execution time of a task by listing the sequence of operators a user employs to complete it and summing the predetermined durations of those operators. However, because the KLM was developed for desktop computer interactions, it cannot be applied directly to evaluate the use of smart devices, which have become increasingly prevalent in everyday life. Recognizing this limitation, researchers have proposed new models based on the KLM method that are specifically tailored to mobile devices and aimed at assessing interactions with these emerging technologies [19, 21–25].
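The KLM calculation amounts to a lookup and a sum. The sketch below uses operator durations commonly cited from Card et al. for desktop use (K, a keystroke or mouse-button press by an average non-secretary typist; P, pointing with the mouse; H, homing the hand between devices; M, mental preparation). The example task and the placement of its operators are hypothetical.

```python
# Commonly cited desktop KLM operator durations in seconds (Card et al.).
KLM_TIMES = {
    "K": 0.28,  # keystroke or button press (average non-secretary typist)
    "P": 1.10,  # point with the mouse to a target on the screen
    "H": 0.40,  # move the hand between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_time(sequence: str) -> float:
    """Sum the predetermined durations of a space-separated operator sequence."""
    return sum(KLM_TIMES[op] for op in sequence.split())


# Hypothetical task: click a text field, then type a four-character value.
task = "H M P K H M K K K K"
print(f"{klm_time(task):.2f} s")  # prints "6.00 s"
```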
In this study, we used CTA to assess the cognitive complexity and execution time of the Republic of Türkiye’s national mPHR application e-Nabız.
II. Methods
1. CTA
This study used the GOMS model to analyze the cognitive complexity of the e-Nabız mPHR application. Two GOMS-based analysis techniques, the updated GOMS model [20] and the gesture-level model [22, 23], were applied to estimate the execution times of the selected prototypical tasks on a Samsung Note10+ (Android OS; screen size, 6.8 inches/17.27 cm). As suggested in these models, one mental operator and three physical operators were used (Table 1), as these are sufficient to evaluate the e-Nabız application.
Physical operators (prepare finger/pointing [P], tap [T], and drag [D]) are the hand movements needed to accomplish tasks. The models may use different names for the same basic operator; for example, the movement called “prepare finger” in the updated GOMS model is called “pointing” in the gesture-level model.
Mental operators include “mentally initiating a task (MI),” “mentally deciding, or choosing (MD),” “mentally retrieving (MR),” “mentally finding (MF),” and “mentally verifying (MV),” as proposed by Li et al. [24]; these operators give detailed information about the user’s mental state. MI refers to being mentally prepared for a task and occurs when a task is initiated. MD occurs when the user must choose between two or more options. MR occurs when the user needs to recall information. MF occurs when the user must search the screen for a piece of information or an object, and MV is used when the user confirms that the targeted page or result has been reached on the screen. Unlike physical operators, mental operators are not observable user behavior; their placement is guided by heuristic rules based on psychological assumptions about users, as proposed by Card et al. [21].
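As an illustration of heuristic placement, the sketch below applies a simplified version of two of Card et al.’s rules to a sequence of the mobile physical operators used here: Rule 0 inserts a candidate M before every physical operator, and Rule 1 deletes an M when the following operator is fully anticipated by the preceding one. Treating a tap or drag that immediately follows a pointing movement as anticipated is an assumption made for this example. Assigning each remaining M to MI, MD, MR, MF, or MV still requires the analyst’s judgment and is not automated here.

```python
PHYSICAL = {"P", "T", "D"}  # prepare finger/point, tap, drag


def place_mental_operators(physical_ops: list[str]) -> list[str]:
    """Insert candidate M operators, then prune anticipated ones (simplified)."""
    if any(op not in PHYSICAL for op in physical_ops):
        raise ValueError("expected only physical operators P, T, D")
    # Rule 0 (simplified): put a candidate M in front of every physical operator.
    candidates: list[str] = []
    for op in physical_ops:
        candidates.extend(["M", op])
    # Rule 1 (simplified, assumption): a tap or drag directly after a pointing
    # movement is fully anticipated, so the M between them is deleted.
    result: list[str] = []
    for i, op in enumerate(candidates):
        anticipated = (
            op == "M"
            and 0 < i < len(candidates) - 1
            and candidates[i - 1] == "P"
            and candidates[i + 1] in {"T", "D"}
        )
        if not anticipated:
            result.append(op)
    return result


print(place_mental_operators(["P", "T", "P", "D"]))  # ['M', 'P', 'T', 'M', 'P', 'D']
```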
2. E-Nabız
The national PHR application of the Republic of Türkiye, which served 10 million users in 2019, saw its user base surge to 25 million users during the coronavirus disease 2019 pandemic [25]. The system integrates real-time data from all healthcare institutions (public, private, and university hospitals), providing a platform where citizens can access their PHRs. Users can access a wealth of information, including laboratory results, radiology images, and prescription and medication details. With patient consent, healthcare providers or family members can also access this information. The menu structure diagram of the mPHR application is shown in Figure 1.
3. Procedures
First, the 10 most common tasks a user can carry out with e-Nabız were identified: T1 “making an appointment,” T2 “canceling an appointment,” T3 “finding a previously taken radiology image/report,” T4 “reviewing physician visit information on a specific date,” T5 “adding reminder information to the prescribed medication from the last doctor visit,” T6 “entering allergy information,” T7 “reading the package insert of a previously prescribed drug,” T8 “recording medication side effects,” T9 “reviewing information about a diagnosed condition,” and T10 “viewing details from previous reports.” Where necessary, each task was divided into subtasks.
Second, each task was broken down into its elementary steps according to the GOMS method, and each step was described as a physical or mental operator (Table 1). The locations of mental operators were determined as proposed by Card et al. [21], and the mental operators were then further specified by type following the usage suggested by Li et al. [24].
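A decomposed task can be stored as a list of (step, operator) pairs, from which the share of mental steps follows directly. The steps below are hypothetical, loosely modeled on recording a medication side effect; they do not reproduce the study’s Table 4.

```python
MENTAL = {"MI", "MD", "MR", "MF", "MV"}

# Hypothetical decomposition: (step description, operator code).
steps = [
    ("decide to record a side effect", "MI"),
    ("find the medications section on screen", "MF"),
    ("prepare finger over the section", "P"),
    ("tap the section", "T"),
    ("recall which drug caused the side effect", "MR"),
    ("find that drug in the list", "MF"),
    ("prepare finger over the drug", "P"),
    ("tap the drug", "T"),
    ("verify that the side-effect form is open", "MV"),
]


def mental_step_share(steps: list[tuple[str, str]]) -> float:
    """Fraction of steps whose operator is a mental operator."""
    if not steps:
        raise ValueError("task has no steps")
    mental = sum(1 for _, op in steps if op in MENTAL)
    return mental / len(steps)


print(f"{len(steps)} steps, {mental_step_share(steps):.1%} mental")  # 9 steps, 55.6% mental
```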
Lastly, the two GOMS-based techniques, the updated GOMS model [20] and the gesture-level model [22], were applied to estimate the execution times of the 10 prototypical tasks. As shown in Table 2, operator times were recorded separately for each model and summed. The execution time of mental operators was fixed, as proposed by Card et al. [21].
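The per-model summation can be sketched as follows, with the mental operator time fixed at 1.35 seconds in both models (the value reported in the Discussion). The physical-operator times are placeholders chosen only for illustration, not the values in Table 2, and would need to be replaced with each model’s published times.

```python
MENTAL = {"MI", "MD", "MR", "MF", "MV"}
MENTAL_TIME = 1.35  # seconds; fixed for every mental operator in both models

# PLACEHOLDER physical-operator times (seconds), NOT the study's Table 2 values.
PHYSICAL_TIMES = {
    "updated GOMS": {"P": 0.50, "T": 0.20, "D": 0.70},
    "gesture-level": {"P": 0.40, "T": 0.20, "D": 0.60},
}


def execution_time(ops: list[str], physical_times: dict[str, float]) -> tuple[float, float]:
    """Return (total, mental) execution time in seconds for an operator sequence."""
    mental = MENTAL_TIME * sum(1 for op in ops if op in MENTAL)
    physical = sum(physical_times[op] for op in ops if op not in MENTAL)
    return mental + physical, mental


# Hypothetical operator sequence for one task.
ops = ["MI", "MF", "P", "T", "MR", "MF", "P", "D", "MV"]
for model, times in PHYSICAL_TIMES.items():
    total, mental = execution_time(ops, times)
    print(f"{model}: {total:.2f} s, {mental / total:.1%} mental")
```

Because only the physical-operator table changes between models, the gap between the two estimates comes entirely from the physical steps.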
This process was conducted independently by two evaluators (HY and NZ) who are experienced in usability evaluation. Cohen’s kappa (SPSS version 23; IBM Corp., Armonk, NY, USA) was used to calculate interrater agreement.
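For readers reproducing the agreement analysis outside SPSS, Cohen’s kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater’s label frequencies: kappa = (p_o - p_e) / (1 - p_e). The sketch assumes that the rated items are the operator codes assigned to each step; the text does not state the exact unit of rating, and the two label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters who labeled the same items."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical operator codes assigned to the same 10 steps by two evaluators.
evaluator_1 = ["MF", "P", "T", "MV", "MF", "P", "T", "MR", "P", "T"]
evaluator_2 = ["MF", "P", "T", "MF", "MF", "P", "T", "MR", "P", "T"]
print(f"kappa = {cohens_kappa(evaluator_1, evaluator_2):.2f}")  # kappa = 0.87
```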
Ethical approval was granted by the Akdeniz University Faculty of Medicine Clinical Research Ethics Committee (02.10.2019/899).
III. Results
Table 3 presents the estimated completion time of each task in seconds under both the updated GOMS model and the gesture-level model. These times assume error-free task completion by expert users, and processes unrelated to task fulfillment were disregarded. The 10 tasks varied in step count from 17 to 121, with an average of 39.8 steps. Each step was categorized as either mental or physical; on average, 47.71% of steps were mental operators.
Task completion times were computed with both models. In the updated GOMS model, they ranged from 16.40 seconds for task 9 (the shortest) to 101.85 seconds for task 1 (the longest). In the gesture-level model, completion times ranged from 15.14 to 97.06 seconds. Mental steps accounted for 72.61% and 75.91% of the total completion time in the updated GOMS model and the gesture-level model, respectively.
The interrater reliability values ranged from 0.68 (task 7) to 0.88 (task 1), with an average of 0.80 across the 10 tasks, indicating good agreement between the evaluators.
Table 4 uses task 8 as an example to show how each of the 10 tasks was deconstructed into individual steps categorized as physical or mental operators. The mental operators were then further specified by the mental action they required. The MF, MV, and MR operators accounted for 52.72%, 21.2%, and 16.3% of all mental operators, respectively.
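Percentages such as those reported for MF, MV, and MR can be obtained by counting mental operator codes pooled across the decomposed tasks. The list in this sketch is hypothetical and does not reproduce Table 5.

```python
from collections import Counter


def subtype_shares(mental_ops: list[str]) -> dict[str, float]:
    """Percentage of each mental operator type, most frequent first."""
    if not mental_ops:
        raise ValueError("no mental operators")
    counts = Counter(mental_ops)
    total = sum(counts.values())
    return {op: 100 * n / total for op, n in counts.most_common()}


# Hypothetical mental operators pooled from several decomposed tasks.
mental_ops = ["MI", "MF", "MF", "MV", "MR", "MF", "MD", "MF", "MV", "MF"]
for op, pct in subtype_shares(mental_ops).items():
    print(f"{op}: {pct:.1f}%")
```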
Table 5 displays the number and distribution of mental operators across tasks. The average execution times calculated with both models are presented in Figure 2.
IV. Discussion
This study aimed to estimate the efficiency of e-Nabız, the Republic of Türkiye’s national mPHR application, in terms of cognitive load and task completion time. According to the updated GOMS model, completing all 10 tasks would take approximately 5.70 minutes (342.10 seconds), whereas the gesture-level model estimated 5.45 minutes (327.25 seconds). Mental operators accounted for around 73% of the total time in the updated GOMS model and approximately 76% in the gesture-level model. This notably high proportion of mental operators may lead to cognitive overload and, consequently, an increased likelihood of user errors.
The average time for mental operators was 1.35 seconds in the updated GOMS model, and the average time for physical operators was 0.44 seconds; in the gesture-level model, physical operators took an average of 0.37 seconds. These times may be longer for novice users, because the models assume expert users and do not account for errors made by less experienced users.
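As a rough consistency check, the mental share of execution time can be approximated from the average proportion of mental steps (47.71%) and the average operator times given above, assuming a mental operator time of 1.35 seconds in both models. Because this uses averages rather than per-task counts, it is only an approximation, but it lands within about one percentage point of the reported 72.61% and 75.91%.

```python
def mental_time_share(mental_step_fraction: float, t_mental: float, t_physical: float) -> float:
    """Approximate share of execution time spent in mental operators."""
    mental = mental_step_fraction * t_mental
    physical = (1 - mental_step_fraction) * t_physical
    return mental / (mental + physical)


# Average figures reported in this article; 1.35 s is the fixed mental time.
for model, t_physical in [("updated GOMS", 0.44), ("gesture-level", 0.37)]:
    print(f"{model}: {mental_time_share(0.4771, 1.35, t_physical):.1%}")
# prints "updated GOMS: 73.7%" and "gesture-level: 76.9%"
```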
Among the 10 prototypical tasks, “making an appointment” and “adding reminder information to the prescribed medication from the last doctor visit” required the largest numbers of total steps and mental steps. The 121 steps needed to make an appointment (56.2% of them mental) can lead to mental load and fatigue. Similarly, adding a medication reminder involved a large total step count (80) and a substantial share of mental operators (55%), increasing the likelihood of user errors that could adversely affect the user’s health.
Saitwal et al. assessed the performance of an electronic health record (EHR) system with CTA across 14 tasks and found that 37% of the time required for task completion was attributable to mental operators [13]. In another study evaluating the usability of a dental EHR system, ten users were asked to complete four typical tasks in 30 cases; mental operators accounted for 35.30% of all operators and 56.89% of the total time [28]. In our study, mental operators made up 47.71% of steps on average and a large portion (73%) of the total execution time. Although these prior studies focused on EHR systems, their results align with our PHR evaluation in showing a substantial contribution of mental operators.
GOMS analysis, used alone or in combination with other usability tests, provides valuable insights for system design and evaluation. It predicts the sequence of physical and mental operators a user will employ to complete a task, informing decisions about whether to add new sequences or remove inefficient ones [26]. Rasmussen and Kushniruk [18] identified inefficient sequences of user interactions through video analysis and observation. Using GOMS-KLM analysis, they demonstrated a redesigned system that eliminated these inefficient sequences, reducing theoretical task completion time by 44.6%. A more recent usability study assessed the physical and cognitive efficiency of original and redesigned electronic medical record (EMR) interfaces using metrics including the NASA Task Load Index, KLM-GOMS, and eye-tracking analysis. Although response time decreased significantly for only one task, mental workload was substantially reduced across multiple tasks in the redesigned EMR [27].
We have not identified any similar studies that evaluate mobile health applications on smartphones using the GOMS model. Existing assessments in the literature primarily focus on applications used on desktop computers and employ the KLM model. Consequently, it is not possible to directly compare our findings with prior data.
In this study, we calculated task execution times with two distinct models developed specifically for mobile devices, and the results of the two models closely agreed. The minor differences are likely attributable to the different device dimensions assumed by each model. Either model can therefore be used for mHealth evaluations, although the calculated times will vary with the dimensions of the smart device used.
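The closeness of the two models can be quantified from the combined totals for the 10 tasks (342.10 seconds with the updated GOMS model and 327.25 seconds with the gesture-level model):

```latex
\frac{342.10 - 327.25}{342.10} = \frac{14.85}{342.10} \approx 0.043
```

That is, the gesture-level estimate is about 4.3% lower than the updated GOMS estimate.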
In this research, we evaluated the 10 most frequently performed tasks to assess the mPHR; for greater reliability, however, all tasks would ideally be evaluated. Given the wide range of smartphone models on the market, task completion times are likely to differ with smartphone size. Furthermore, our evaluation was conducted only on the Android operating system, although similar calculations can be applied to other mobile operating systems.
The need to validate the GOMS model for mobile devices is apparent, but the constantly evolving range of phone models poses a challenge: the model must be adapted to changing devices, and the rapid turnover of models on the market limits the feasibility of validation. In addition, task time calculations are based on expert users, and times may differ for novice users. Potential errors, interruptions, and system response times during use of the application are also not factored into these calculations.
A web version of the PHR application also exists, giving users an alternative means of accessing and managing health information. Because the web version may differ from the mobile application, it could be assessed separately with KLM models designed for desktop computers.
The findings from this research underscore a significant usability challenge related to cognitive overload in a mobile app with millions of users. These results can inform the development of more user-friendly mobile health applications, thereby enhancing their usability. Reducing the cognitive load during the use of mobile health applications can contribute to the increased adoption and effective utilization of these apps, ultimately benefiting public health.