andrewlb notes

Benchmarking the User Experience

Published:

Benchmarking the User Experience

Metadata

  • Author: Jeff Sauro
  • Full Title: Benchmarking the User Experience
  • Category: #books

Highlights

  • The user experience is the combination of all the behaviors and attitudes people have while interacting with an interface. (Location 234)
  • Task completion, task time, clicks, ability to find products or information, attitudes toward visual appearance, attitudes toward trust and credibility, and perceptions of ease, usefulness, and satisfaction. (Location 237)
  • A benchmark is a standard or point of reference against which metrics may be compared or assessed. (Location 245)
  • One of the hallmarks of measuring the user experience is seeing whether design efforts actually make a quantifiable difference over time. (Location 256)
  • Conducting a benchmark study involves a lot of effort and coordination. To make a line, you need at least two points. The same can be said for benchmark studies. While many benchmark efforts start as stand-alone studies to get an idea of how good or bad an experience is, benchmark results are often more meaningful when compared against a competitor, an earlier version, or an industry standard (e.g., at least a 90% completion rate; a small completion-rate comparison sketch appears after these highlights). (Location 267)
  • There are essentially two types of benchmark studies: retrospective and task-based. Retrospective: Participants are asked to recall their most recent experience with an interface and answer questions. At my company, MeasuringU, we use this approach for the Consumer Software and Business Software Benchmark reports we produce—see, for example, “Net Promoter & UX Benchmark Report for Consumer Software (2017),” available online at measuringu.com/product/consumer-software2017/. Task-based: Participants are asked to attempt prescribed tasks using the interface that is being evaluated, which simulates actual usage in a controlled setting. This is the common usability test setup and is what we used when we created a benchmarking report for Enterprise’s company website, enterprise.com (this company provides rental cars; more information on this project later). (Location 282)
  • Memory is fallible and decays. The more time that passes from when users actually worked with the system, the less salient the experience is, and consequently the metrics collected may be less accurate. Some preliminary data we’ve collected suggests users tend to provide higher metrics in retrospective studies than those collected in a usability test. (Location 302)
  • Benchmark studies are often called summative evaluations where the emphasis is less on finding problems but more on quantitatively assessing the current experience. That experience is quantified using both broader study-level metrics and granular task-level metrics (if there are tasks). (Location 387)
  • Standardized questionnaires for benchmarking (Location 392):
      ◦ The Standardized User Experience Percentile Rank Questionnaire (SUPR-Q) provides a measure of the overall quality of the website user experience plus measures of usability, appearance, trust, and loyalty.
      ◦ The SUPR-Qm is the counterpart questionnaire for the mobile app user experience.
      ◦ The System Usability Scale (SUS) is a measure of perceived usability; good for software. (A SUS scoring sketch appears after these highlights.)
      ◦ The Net Promoter Score (NPS) is a measure of customer loyalty for all interfaces; best for consumer-facing ones.
      ◦ The Usability Metric for User Experience (UMUX-LITE) is a compact measure of perceived usefulness and perceived ease.
      ◦ Brand attitude (brand lift) has a significant effect on UX metrics. Measuring a user’s attitude toward a brand before and after a study helps identify how much of an effect the experience has, positive or negative, on brand attitudes.
  • A project booking form is basically a simplified study plan; it’s what we use during the initial call with a client to help focus the conversation. It’s a guide to outline the questions you need to answer before even thinking about testing a user. (Location 423)
  • within- versus between-subjects design. While the between-subjects approach is the more familiar one to researchers, you’ll see that the within-subjects approach has some important advantages. (Location 487)
  • People learn and get better with practice. However, you usually don’t want participants applying these learnings (also called sequence effects) from one design to the next. (Location 504)
  • If you go with a within-subjects approach, you can get a between-subjects comparison from a within-subjects study by restricting the analysis to the participants’ first experiences. In other words, every within-subjects design contains a between-subjects design when properly counterbalanced and analyzed (a data-filtering sketch appears after these highlights). (Location 522)
  • If you’re looking to identify a winner between alternative designs (even bad designs), a within-subjects approach is usually the way to go. (Location 527)
  • For enterprise software or even mobile apps, users tend to customize their experiences, which means you have to figure out how participants will access the interface. This is common for enterprise systems like financials, HR, payroll, or sales-automation software, which are extensively customized by an organization. Determining which interface to use in a benchmark study can be a challenge. The following considerations apply to the four types of interfaces used in benchmarking studies. (Location 610)
  • The research on testing with prototypes shows they are a reasonable substitute for ease of use perception metrics like the SUS and SEQ but less so for task time. (Location 642)
  • Comparing the four interface types for benchmarking (Location 647):

    Factor to consider        Publicly accessible   Participant’s own account   Demo system   Prototype
    Realistic experience      ++                    ++                          +             -
    Data privacy              ++                    --                          +             +
    Affecting customer data   ++                    --                          ++            ++
    Functionality             ++                    +                           -             --
  • If this is a task-based benchmark, you should identify the task-topics (from a high level at this stage— the actual scenario with task-success criteria is presented in a subsequent chapter). The tasks you select should also address the study goals. For example, if you are focusing on the new user experience, tasks should be things new users to the product or website would do, such as registering, configuring, or setting up. If one of the goals of the benchmark is to determine how well participants use a product filter, tasks should expose participants to the filter. (Location 689)
  • Now think of the most important features that you need when you create or edit a document—the features you couldn’t live without. These features likely support your top tasks. (Location 717)
  • You’ll also see the long tail of the tasks that are less important. Of course, you can’t just stop supporting your less important tasks, but you should be sure that customers can complete the top tasks effectively and efficiently. These top tasks should become the core tasks you conduct benchmark tests around and the basis for design efforts. (Location 757)
  • Attributes: The classic demographics of age, gender, income, education level, and geography are all attributes of user populations. While you’ll likely need to collect many of these variables, they are usually incidental aspects of your customers. Behaviors: What people have done or knowledge they have is usually a much more differentiating factor. This includes prior experience with a product (such as accounting software or a mobile app) and domain knowledge (such as a financial advisor or IT decision maker). (Location 768)
  • It’s rare that users of a product or website are homogenous. At the very least you need to consider prior experience with the website, domain, or product. If you have different subgroups, this may affect the questions you ask, the tasks you present, and the overall sample size. (Location 784)
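
The highlight about comparing against an industry standard (e.g., a 90% completion rate) can be made concrete with a small sketch. This is not from the book; it is one common way to run the comparison, using an adjusted-Wald (Agresti-Coull) confidence interval around an observed completion rate. The 18-of-20 result and the 90% target are invented for illustration.

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """95% adjusted-Wald (Agresti-Coull) interval for a completion rate."""
    # Add z^2/2 successes and z^2 trials before computing the interval;
    # this behaves much better than the plain Wald interval at small n.
    p_adj = (successes + z ** 2 / 2) / (n + z ** 2)
    margin = z * math.sqrt(p_adj * (1 - p_adj) / (n + z ** 2))
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Hypothetical task: 18 of 20 participants completed it.
low, high = adjusted_wald_ci(successes=18, n=20)
benchmark = 0.90  # industry-standard target mentioned in the highlight

print(f"Observed rate {18 / 20:.0%}, 95% CI {low:.0%} to {high:.0%}")
if low >= benchmark:
    print("Credibly above the 90% benchmark.")
elif high < benchmark:
    print("Credibly below the 90% benchmark.")
else:
    print("Inconclusive relative to the 90% benchmark.")
```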
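The SUS from the questionnaire highlight is scored with a fixed rule: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the 0-40 sum is multiplied by 2.5 to give a 0-100 score. A minimal scoring sketch; the sample responses are invented.

```python
def sus_score(responses):
    """Score one participant's System Usability Scale (SUS) answers.

    `responses` holds the ten answers, in questionnaire order,
    each on a 1-5 agreement scale.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    total = 0
    for item, answer in enumerate(responses, start=1):
        # Odd (positively worded) items contribute answer - 1;
        # even (negatively worded) items contribute 5 - answer.
        total += (answer - 1) if item % 2 else (5 - answer)
    return total * 2.5  # rescale the 0-40 raw sum to 0-100

# Hypothetical participant with fairly positive ratings.
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 2]))  # 82.5
```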
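The within-subjects highlight notes that a properly counterbalanced within-subjects study contains a between-subjects comparison: restrict the analysis to each participant's first exposure. A small pandas sketch of that filtering step; the column names and task times are assumptions for illustration, not from the book.

```python
import pandas as pd

# Hypothetical counterbalanced study: every participant tried both designs,
# half in A-then-B order and half in B-then-A order.
trials = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3, 4, 4],
    "order":       [1, 2, 1, 2, 1, 2, 1, 2],   # position within the session
    "design":      ["A", "B", "B", "A", "A", "B", "B", "A"],
    "task_time":   [42.0, 35.0, 55.0, 40.0, 38.0, 30.0, 60.0, 44.0],
})

# Within-subjects comparison: every trial counts.
within = trials.groupby("design")["task_time"].mean()

# Between-subjects comparison: keep only each participant's first exposure,
# before any learning could carry over from the other design.
between = trials[trials["order"] == 1].groupby("design")["task_time"].mean()

print(within, between, sep="\n\n")
```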

public: true

title: Benchmarking the User Experience
longtitle: Benchmarking the User Experience
author: Jeff Sauro
url:
source: kindle
last_highlight: 2022-04-22
type: books
tags:
