What we were trying to solve

Users of Get Information About Schools (GIAS) often need structured school and group data for research, analysis, or operational systems. The existing GIAS interface is designed for people, not machines, so organisations frequently scrape pages or handle large CSV downloads to get the data they need.

We needed a cleaner, more predictable way for systems to access GIAS data directly. This led us to design a read-only API that exposes Establishments (schools) and Establishment Groups in a consistent way.

What we built

We designed a prototype API that:

  • lets clients request school or group data through simple HTTP endpoints
  • returns only the fields the client actually needs
  • streams large datasets efficiently
  • has predictable behaviour and clear error responses
  • is easy to extend as new domains (like workforce or governance) come on stream

Although the API is technical behind the scenes, the main idea is simple: let systems fetch GIAS data without scraping or downloading everything.

How the API works in plain language

1. A client sends a request

Examples:

  • “Give me the establishment with URN 123456”
  • “Give me all academies, but only return name, postcode, and status”
  • “Give me details for this trust”.

The API checks that required parameters are present. For example, calling a “list” endpoint without choosing fields returns a helpful error message.
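As a rough illustration, the parameter check might look something like the sketch below. The `fields` parameter name, the set of allowed fields and the 400 status code are assumptions made for this example; the post does not say how the prototype names or reports these.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical set of selectable fields; the real field names are not given in this post.
ALLOWED_FIELDS = {"urn", "name", "postcode", "status"}


@dataclass
class ValidationError:
    status: int  # assumed to be 400 for a bad request
    message: str


def parse_fields(query: Dict[str, str]) -> Union[List[str], ValidationError]:
    """Return the requested fields, or an error explaining what is wrong."""
    raw = query.get("fields", "")
    fields = [part.strip() for part in raw.split(",") if part.strip()]
    if not fields:
        return ValidationError(
            400, "The 'fields' parameter is required, for example fields=name,postcode"
        )
    unknown = sorted(set(fields) - ALLOWED_FIELDS)
    if unknown:
        return ValidationError(400, "Unknown field(s): " + ", ".join(unknown))
    return fields
```

A request with `fields=name,postcode` would pass, while a request with no `fields` value would get back a message telling the client what to add.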

2. The system gathers the data

The work is handled in layers:

  • Use cases decide what should happen when a request is made
  • Repositories decide how to get the data (currently SQL, but this could change)
  • Mappers turn raw database rows into properly structured domain objects
  • Domain models enforce rules (e.g. “a school must have a valid name”).

Each layer has one job, which makes the system easier to maintain.
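The four layers can be sketched in a few lines of Python. This is not the prototype's code: the class names, the `URN` and `EstablishmentName` column names, and the in-memory repository are all assumptions chosen to show how each layer keeps to one job.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Establishment:
    """Domain model: enforces its own rules when it is created."""
    urn: int
    name: str

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("a school must have a valid name")


def map_row(row: dict) -> Establishment:
    """Mapper: turns a raw database row into a domain object."""
    # Column names are hypothetical.
    return Establishment(urn=int(row["URN"]), name=row["EstablishmentName"])


class EstablishmentRepository(Protocol):
    """Repository: decides how the data is fetched."""
    def find_row_by_urn(self, urn: int) -> Optional[dict]: ...


class InMemoryRepository:
    """Stand-in for a SQL-backed repository, used here so the sketch runs."""
    def __init__(self, rows: list) -> None:
        self._rows = {int(row["URN"]): row for row in rows}

    def find_row_by_urn(self, urn: int) -> Optional[dict]:
        return self._rows.get(urn)


class GetEstablishmentByUrn:
    """Use case: decides what happens when a request is made."""
    def __init__(self, repository: EstablishmentRepository) -> None:
        self._repository = repository

    def execute(self, urn: int) -> Optional[Establishment]:
        row = self._repository.find_row_by_urn(urn)
        return map_row(row) if row is not None else None
```

Because the use case only depends on the repository's shape, swapping SQL for another data source would not require changes to the use case or the domain model.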

3. The domain checks the data is valid

Before anything gets returned, the core domain model verifies the data:

  • Are the fields complete?
  • Do they follow business rules?
  • Are values like postcodes or contact details valid?

If anything breaks a rule, the request fails safely.
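One way to picture this step is a domain object that refuses to exist in an invalid state. The sketch below is an assumption about how such checks could be written, not a description of the real rules: the `SchoolRecord` and `DomainValidationError` names are hypothetical, and the postcode pattern is deliberately simplified.

```python
import re
from dataclasses import dataclass

# Deliberately simplified UK postcode shape, for illustration only.
POSTCODE_PATTERN = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")


class DomainValidationError(Exception):
    """Raised when a record breaks one or more domain rules."""
    def __init__(self, violations):
        super().__init__("; ".join(violations))
        self.violations = violations


@dataclass(frozen=True)
class SchoolRecord:
    urn: int
    name: str
    postcode: str

    def __post_init__(self):
        # Collect every broken rule so the failure explains itself fully.
        violations = []
        if self.urn <= 0:
            violations.append("URN must be a positive number")
        if not self.name.strip():
            violations.append("name must not be blank")
        if not POSTCODE_PATTERN.match(self.postcode.strip().upper()):
            violations.append("postcode is not in a recognised format")
        if violations:
            raise DomainValidationError(violations)
```

Raising a single, specific exception type gives the outer layers one place to catch rule failures and turn them into a safe response.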

4. The API returns the result

Clients get data in one of two formats:

  • JSON, streamed for efficiency
  • CSV, if requested
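To show what streaming means in practice, here is a minimal sketch using Python generators. It assumes records arrive as dictionaries from some iterable source; the function names are hypothetical and the prototype may stream in a different way.

```python
import csv
import io
import json
from typing import Iterable, Iterator, List


def stream_json(records: Iterable[dict]) -> Iterator[str]:
    """Yield a JSON array piece by piece, never holding the whole list."""
    yield "["
    for index, record in enumerate(records):
        prefix = "," if index else ""
        yield prefix + json.dumps(record)
    yield "]"


def _take(buffer: io.StringIO) -> str:
    """Return what has been written so far and empty the buffer."""
    text = buffer.getvalue()
    buffer.seek(0)
    buffer.truncate(0)
    return text


def stream_csv(records: Iterable[dict], fields: List[str]) -> Iterator[str]:
    """Yield a CSV header line, then one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    yield _take(buffer)
    for record in records:
        writer.writerow(record)
        yield _take(buffer)
```

Each generator produces output as records arrive, so memory use stays roughly constant regardless of how many rows the query returns.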

Error handling is consistent:

  • missing items → 404
  • unexpected issues → 500
  • everything else → 200 OK
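The mapping above could be kept in one place, along the lines of this sketch. The `NotFoundError` exception and the `handle` function are hypothetical names used for illustration.

```python
class NotFoundError(Exception):
    """Raised by a use case when the requested item does not exist."""


def handle(action):
    """Run a use case and translate the outcome into (status, body)."""
    try:
        return 200, action()
    except NotFoundError as err:
        return 404, {"error": str(err)}
    except Exception:
        # Keep internal details out of the response body.
        return 500, {"error": "An unexpected error occurred"}
```

Putting the translation in a single function means every endpoint reports the same kind of failure in the same way.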

The API only returns what was requested, so if a client only wants the “name” and “postcode”, it won’t receive extra data.
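In its simplest form, returning only the requested fields is a projection over each record. The sketch below assumes records are dictionaries and uses a hypothetical `project` function; a real implementation might instead select only the matching database columns.

```python
def project(record: dict, fields: list) -> dict:
    """Keep only the requested fields, in the order they were requested."""
    return {field: record[field] for field in fields if field in record}


school = {
    "urn": 123456,
    "name": "Example Primary School",
    "postcode": "AB1 2CD",
    "status": "Open",
}
selected = project(school, ["name", "postcode"])
```

Here `selected` contains just the name and postcode, and the URN and status are left out of the response.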

Why we took a clean architecture approach

We chose a layered “clean architecture” style so the core business rules stay stable even if:

  • the database changes
  • the API format changes
  • new domains are added
  • we change how we shape or map data.

This helps ensure:

  • predictable behaviour
  • easy testing (business logic can run without a database)
  • easier long-term maintenance
  • safer changes as the system grows.
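The testing benefit is easiest to see with a small example. In this sketch a fake repository stands in for the database, so the business logic can be checked with nothing but the standard library. The `ListOpenSchools` use case and its rule are invented for illustration and are not taken from the prototype.

```python
import unittest


class ListOpenSchools:
    """Hypothetical use case: return the names of open schools, sorted."""
    def __init__(self, repository):
        self._repository = repository

    def execute(self):
        rows = self._repository.all_schools()
        return sorted(row["name"] for row in rows if row["status"] == "Open")


class FakeRepository:
    """Stands in for a SQL-backed repository; no database is needed."""
    def __init__(self, rows):
        self._rows = rows

    def all_schools(self):
        return list(self._rows)


class ListOpenSchoolsTest(unittest.TestCase):
    def test_closed_schools_are_excluded(self):
        repository = FakeRepository([
            {"name": "Beta Academy", "status": "Open"},
            {"name": "Alpha School", "status": "Closed"},
            {"name": "Acorn Primary", "status": "Open"},
        ])
        result = ListOpenSchools(repository).execute()
        self.assertEqual(result, ["Acorn Primary", "Beta Academy"])
```

Tests like this run in milliseconds and do not depend on the state of any shared database.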

Future extensions (like governance, workforce or financial data) can follow the same pattern.

What we learned

  • Streaming is essential. Some datasets are too large to load into memory
  • Allowing clients to choose fields prevents over-fetching and keeps responses small
  • Separating domain rules from infrastructure helps reduce regressions
  • Consistent logging across use cases greatly improves traceability when debugging.
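On the logging point, one possible way to keep log lines consistent is to wrap every use case in the same decorator. This is a sketch under assumptions: the logger name, the message format and the `logged` decorator are hypothetical, and the post does not describe how the prototype does it.

```python
import functools
import logging
import time

logger = logging.getLogger("gias_api.use_cases")  # hypothetical logger name


def logged(use_case_name):
    """Log the start, finish and failure of a use case in one format."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            logger.info("%s started", use_case_name)
            try:
                result = func(*args, **kwargs)
            except Exception:
                logger.exception("%s failed", use_case_name)
                raise
            elapsed_ms = (time.perf_counter() - started) * 1000
            logger.info("%s finished in %.1f ms", use_case_name, elapsed_ms)
            return result
        return wrapper
    return decorator


@logged("GetEstablishmentByUrn")
def get_establishment(urn):
    # Placeholder body; a real use case would call a repository.
    return {"urn": urn}
```

With every use case logging the same events in the same shape, a failing request can be followed through the logs without guessing at each component's conventions.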

What’s next

The prototype lays the groundwork for:

  • adding more domains of GIAS data
  • validating with internal and external data consumers
  • planning how the API could replace current CSV downloads or scraping
  • exploring authentication, rate limiting, and versioning approaches.

Written by Spencer O’Hegarty, Senior Software Engineer
