Early in the Alpha we needed to establish a baseline for GIAS (Get Information about Schools): a way of measuring usability and user perception at the start of the project that we could return to and compare against at the end. Without a baseline, any claim that the redesigned service is "better" is essentially impressionistic.
The obvious starting point was the System Usability Scale (SUS). It's well known, widely used in government digital services, and produces a single comparable score. But the more I looked at what GIAS actually needed to measure, the less confident I was that SUS alone would give us useful data.
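For reference, that single score is mechanical to compute: SUS takes ten ratings on a 1–5 scale, with odd-numbered items worded positively and even-numbered items negatively, and combines them into a 0–100 figure. A minimal sketch of the standard scoring:

```python
def sus_score(responses: list[int]) -> float:
    """Standard SUS scoring (Brooke, 1996): ten ratings on a 1-5 scale.
    Odd items are positively worded, even items negatively worded."""
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    odd = sum(responses[i] - 1 for i in range(0, 10, 2))   # items 1, 3, 5, 7, 9
    even = sum(5 - responses[i] for i in range(1, 10, 2))  # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5                              # scales 0-40 up to 0-100
```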
The problem with SUS for GIAS
SUS measures usability. Specifically, whether people can use a system and whether it feels efficient and learnable. That's valuable, but for GIAS it misses something significant: trust.
GIAS is an authoritative data source. People rely on it to make decisions about funding, compliance, policy, and more. The question of whether users trust the information they're looking at is at least as important as whether they can find it. SUS has no mechanism for measuring that at all.
There's also the nature of the service to consider. SUS was designed with consumer-facing software in mind. Items like "I think that I would like to use this system frequently" work well for general applications but sit awkwardly with admin and data systems, the kind of tool where people complete specific, defined tasks rather than browsing freely. A score of "okay" on SUS doesn't tell you much about whether an analyst could find the UKPRN they needed, or whether a researcher trusted what they downloaded.
Exploring alternatives
I worked through several established frameworks before settling on a recommendation.
UMUX and UMUX-Lite are shorter alternatives to SUS: UMUX has four items and UMUX-Lite just two, against SUS's ten. Both are validated to correlate strongly with SUS scores. UMUX-Lite in particular covers "the service's features meet my needs" and "this service is easy to use", which map cleanly to what we care about without the overhead of the full SUS questionnaire.
SUPR-Q (Standardised User Experience Percentile Rank Questionnaire) is more holistic — it covers usability, trust, loyalty, and appearance. More useful in some ways, but heavier, and includes items that aren't relevant to a data service such as GIAS (loyalty, for instance).
User Experience Questionnaire (UEQ) is comprehensive but long. Too long for what we need, and some dimensions (novelty, stimulation) have limited relevance here.
NASA Task Load Index (TLX) measures perceived workload, across dimensions such as mental demand, effort, and frustration. Interesting for certain task types, particularly complex or multi-step tasks, but not a general service baseline.
HEART framework (Google) is designed around product metrics: Happiness, Engagement, Adoption, Retention, Task success. Useful for ongoing product measurement, but it's behavioural and analytics-based rather than questionnaire-based. Good for later-stage measurement, less suited to an Alpha baseline.
Single Ease Question (SEQ) is a single-item rating per task: "How difficult or easy was it to perform this task?" on a 7-point scale. Minimal overhead, well validated, and gives a task-level signal without adding significant burden to a research session.
What we actually need to measure
Working through the alternatives clarified the real question: what does "success" look like for GIAS, and therefore what do we need to track?
Four things stood out:
- Clarity of service. Do users understand what GIAS is and what it's for?
- Search success. Can users find what they're looking for?
- Confidence and trust in the data. Do users believe what they see is accurate, current, and authoritative?
- Perception of the service overall. Does it feel like a credible, professional service?
No single framework covers all of this. The answer is a lightweight combination.
The framework we landed on
Task performance + Single Ease Question for each defined task. Four tasks were identified:
- Identify a specific establishment's UKPRN and when their data was last changed or confirmed
- Find all secondary schools in a specified city
- For a specified school, find the name of the governor whose term is ending soonest
- Download the links for all open children's centres
After each task, participants rate difficulty on the SEQ 7-point scale (very difficult to very easy). This gives us completion rate, time on task, error rate, and perceived difficulty, all at task level.
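A sketch of how that task layer might be captured and summarised; the field names and schema are illustrative, not a committed data model:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class TaskResult:
    task: int         # 1-4, matching the tasks listed above
    completed: bool
    seconds: float    # time on task
    errors: int       # wrong turns, dead ends, re-entries
    seq: int          # 1 (very difficult) to 7 (very easy)

def task_summary(results: list[TaskResult], task: int) -> dict:
    """Baseline figures for one task across all participants."""
    rows = [r for r in results if r.task == task]
    return {
        "completion_rate": sum(r.completed for r in rows) / len(rows),
        "mean_time_s": round(mean(r.seconds for r in rows), 1),
        "mean_errors": round(mean(r.errors for r in rows), 2),
        "mean_seq": round(mean(r.seq for r in rows), 2),
    }
```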
UMUX-Lite for overall usability:
- "The service's features meet my needs" (1-7 scale)
- "This service is easy to use" (1–7 scale)
A custom trust baseline, drawing on the Trust in Technology model:
- I believe the information on GIAS is accurate
- I believe the information on GIAS is up-to-date
- GIAS is the authoritative source for establishment information
- I would rely on information from GIAS to make decisions
- It is clear where the information on GIAS comes from and who is responsible for it
- If I found incorrect information, I would know how to report it
All trust items use the same 1–7 scale (strongly disagree to strongly agree).
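Scoring the trust baseline is then a matter of averaging agreement per item, keeping the items separate so Alpha-to-Beta movement is visible statement by statement. A rough sketch (the item keys are my shorthand for the statements above):

```python
from statistics import mean

TRUST_ITEMS = ["accurate", "up_to_date", "authoritative",
               "would_rely_on", "provenance_clear", "can_report_errors"]

def trust_baseline(responses: list[dict]) -> dict:
    """Mean 1-7 agreement per trust item across participants,
    plus an overall mean of the item means."""
    per_item = {item: round(mean(r[item] for r in responses), 2)
                for item in TRUST_ITEMS}
    per_item["overall"] = round(mean(per_item[i] for i in TRUST_ITEMS), 2)
    return per_item
```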
Why this combination works
The task performance + SEQ layer gives diagnostic signal: if task 3 scores consistently low, we know governance information specifically is a problem. UMUX-Lite gives a comparable usability score with minimal questionnaire burden, which matters on top of everything else we already ask of participants in a session. The trust baseline captures the dimension SUS misses entirely and is specific to what matters for an authoritative data service.
Crucially, the combination is repeatable. Even if the service changes significantly between Alpha and Beta, these measures can be re-run against the same tasks and compared directly.
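Comparing the two rounds can then be mechanical rather than impressionistic; assuming the summary dictionaries sketched above, something like:

```python
def deltas(alpha: dict, beta: dict) -> dict:
    """Per-measure change between the Alpha baseline and a later
    re-run; the sign convention is simply beta minus alpha."""
    return {k: round(beta[k] - alpha[k], 2) for k in alpha}

# e.g. deltas(task_summary(alpha_results, 3), task_summary(beta_results, 3))
```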
What it doesn't cover
This framework doesn't measure emotional experience, satisfaction beyond task completion, or longitudinal behaviour (return visits, exports over time). For those, the HEART framework would be more appropriate once the service is live and generating behavioural data. That's a Beta or live measurement problem rather than an Alpha one.
Written by Steve O'Connor, Lead Interaction Designer