
Module 3  ·  Track 1: Technical Foundation

How Digital Sustainability Is Measured

Scopes, carbon intensity, unit metrics, the measured–modelled–proxy data quality hierarchy, and how to build a baseline that actually holds up.

Duration: 28–34 minutes
Track: Technical Foundation

What you will take from this module

Where programmes build credibility, or lose it.

Measurement is the dividing line between programmes that change behaviour and programmes that produce decks. This module gives you the minimum fluency needed, not to become a carbon accountant, but to understand what the numbers mean, where they come from, and how to spot the ones that mislead.

Before any measurement choice makes sense, decide what the measurement is for. The same organisation typically has three live uses of carbon data, and they are not interchangeable.

Use 01 · Disclosure

External reporting and regulatory submission

Annual report, CSRD, SBTi inventory. Tolerates spend-based and modelled data where activity data does not exist, provided confidence is documented. The bar is auditability and consistency, not operational granularity.

Use 02 · Baseline

Setting a starting point for improvement

One-off, directional, and explicit about what is in scope. Mixed-method is normal. The bar is honesty about confidence by category, not a single composite number that hides where the data is thin.

Use 03 · Operational decision

Steering live workloads, procurement, and design

Needs activity data: measured energy, real utilisation, region and time of day. Spend-based numbers cannot tell an architect which region, an engineer which workload to move, or a procurement lead which supplier is measurably better.

The measurement credibility test

Can you explain what each number in your sustainability report is measuring, how it was calculated, what the confidence level is, and which of the three uses above it is fit for? If the answer to any of those is "I'm not sure," you have a measurement credibility gap, and probably a number being used for a decision it cannot support.

Scopes 1, 2, and 3 in an IT context.

Scopes are not the measurement method. They are the boundary framework that tells you what is in and out before you decide how to measure each box. The GHG Protocol Corporate Standard sets the framework (Scopes 1, 2, and 3), with Scope 4 (avoided emissions) sitting outside the formal inventory but worth knowing. Understanding where IT activity sits across these boundaries is the precondition for choosing the right method per category.


Scope 1 · Direct emissions
Scope 2 · Purchased energy
Scope 3 · Value chain (typically 70–80% of the IT footprint)
Scope 4 · Avoided emissions

Scope 1: Direct emissions

Scope 1 covers emissions from sources your organisation owns or controls directly. In IT terms, this is relatively narrow: fuel used in backup generators, on-site combustion, and refrigerants in cooling systems. For most organisations operating in modern facilities, Scope 1 is not where the bulk of the IT footprint sits.

Typical sources:
  • Diesel generators
  • On-site combustion
  • Refrigerants (cooling)
  • Emergency backup systems

Why this matters for measurement

Most IT sustainability programmes focus on what they can measure easily, namely Scope 2 electricity. But for organisations that have migrated to cloud and SaaS, Scope 3 is where the bulk of the footprint sits. Any measurement programme that ignores Scope 3 is measuring the tail, not the body.

Scopes set the boundary; data quality decides what the number means

The next section introduces the measured–modelled–proxy hierarchy. In a real-world estate, Scopes 1 and 2 are usually measured (meters, billing data), while large parts of Scope 3 are modelled or proxy. A credible report names the boundary and the data quality on each line, not just the boundary.

Measured, modelled, proxy.

Every number in a carbon report sits in one of three categories. Most reports do not say which. That single omission is where most credibility loss starts.

Tier 01 · Measured

Direct telemetry from the system itself

Energy meters on PDUs, server-level power draw, billing data, water meters, asset-level lifecycle records. Real numbers from the real estate. Highest confidence, and the only data that supports per-site or per-workload decisions.

Acceptable for: disclosure, baseline, operational decision.

Tier 02 · Modelled

Calculated from activity data + emission factors

VM-hours × regional grid intensity. Devices × manufacturer lifecycle factors. PUE-adjusted IT load × electricity factor. Reasonable confidence when activity data is genuine and factors are current. Loses fidelity when factors are averaged or out of date.

Acceptable for: disclosure, baseline. Marginal for operational decisions unless inputs are live.
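
As a sketch of the Tier 2 arithmetic. Every factor below is an illustrative assumption, not a published value; the function is hypothetical, not part of any standard tooling:

```python
# Tier 2 sketch: modelled emissions from activity data + emission factors.
# All numbers here are illustrative placeholders, not published factors.

def modelled_cloud_kgco2(vm_hours: float,
                         avg_power_kw: float,
                         pue: float,
                         grid_gco2_per_kwh: float) -> float:
    """VM-hours x average power draw, uplifted by PUE, x regional grid intensity."""
    it_energy_kwh = vm_hours * avg_power_kw                 # energy drawn by the VMs
    facility_energy_kwh = it_energy_kwh * pue               # PUE-adjusted facility energy
    return facility_energy_kwh * grid_gco2_per_kwh / 1000   # g -> kg CO2e

# Example: 10,000 VM-hours at 0.05 kW per VM, PUE 1.2, in a 250 gCO2/kWh region.
print(modelled_cloud_kgco2(10_000, 0.05, 1.2, 250))  # -> 150.0 kg CO2e
```

The fidelity of the result depends entirely on the inputs: stale averaged factors or guessed power draw quietly degrade a Tier 2 number towards Tier 3.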

Tier 03 · Proxy

Spend or headcount × industry-average factor

£ on cloud × emission factor per £. Headcount × per-employee carbon. Acceptable as a starting point where nothing else exists, and unavoidable for large parts of Scope 3 today. Cannot tell you anything about utilisation, region, time, or whether the activity was necessary.

Acceptable for: first-pass disclosure only. Misleading if used to steer decisions.
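
The Tier 3 arithmetic is even simpler, which is exactly the problem. A minimal sketch, assuming a hypothetical factor of 0.25 kgCO₂e per pound of cloud spend (illustrative, not a published value):

```python
# Tier 3 sketch: spend-based proxy. The factor is an assumption for
# illustration; real factors come from published EEIO-style datasets.

def proxy_cloud_kgco2(spend_gbp: float, kgco2_per_gbp: float = 0.25) -> float:
    """Spend x industry-average factor. The result carries no information
    about utilisation, region, time of day, or workload necessity."""
    return spend_gbp * kgco2_per_gbp

print(proxy_cloud_kgco2(1_000_000))  # -> 250,000 kg CO2e, however the money was spent
```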

The discipline most reports skip

Mixed estates use all three tiers. That is normal and unavoidable. The discipline is knowing which tier sits behind every line of every report, marking it explicitly, and never using a Tier 3 number to defend a decision that needs Tier 1. Most credibility loss in IT sustainability reporting comes from collapsing the three tiers into a single composite number with no quality flag attached.

A common failure mode

A board paper presents Scope 3 cloud emissions to one decimal place. The underlying number is Tier 3 (spend × an industry-average factor) with a confidence range of roughly ±50%. Reporting it as "1.2%" implies precision the data does not support. A confidence band, not a single decimal, is the honest representation. Precision is not credibility.
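
As a quick check of the band arithmetic, taking the ±50% range stated above:

```python
# A Tier 3 point estimate of 1.2% with a roughly ±50% confidence range.
point = 1.2
low, high = point * 0.5, point * 1.5
print(f"{low:.1f}% to {high:.1f}%")  # -> 0.6% to 1.8%: report "roughly 1-2%", never "1.2%"
```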

Activity-based vs spend-based measurement.

These are not equally good methods with different trade-offs. Spend-based is acceptable as a starting point. It is weak as an operational tool.

Activity-Based Measurement
Uses real physical drivers: energy in kWh, number of devices, server hours, VM-hours, storage capacity, data transferred. Linked to emission factors by location and time. Closer to operational truth.
Strengths
  • Reflects actual consumption
  • Supports operational optimisation
  • Can distinguish utilisation and location differences
  • Enables meaningful trend analysis
Requirements
  • Requires actual data feeds from systems
  • Consistent definitions across the estate
  • Higher data collection burden initially
Spend-Based Measurement
Financial expenditure multiplied by average emission factors. Scales well because procurement systems have spend data even where operational systems don't. Acceptable for Scope 3 baseline reporting, weak as an optimisation tool.
Where it works
  • Scope 3 first-pass baseline (no better data available)
  • Supplier categories without activity data
  • Initial portfolio-level sizing
What it hides
  • Utilisation differences between teams
  • Geography and carbon intensity differences
  • Time-of-day and grid condition variations
  • Whether the workload was necessary at all

Why it matters in practice

If two teams both spend £1m on cloud, spend-based methods report similar emissions. In reality, one could be running efficient workloads at low carbon intensity in Scotland; the other wasting resources on idle infrastructure in a coal-heavy region. The same spend. Dramatically different actual footprint. Only activity-based measurement reveals this, and therefore only activity-based measurement can drive the right operational decision.

Activity-based vs spend-based: what each sees.

Illustrative scenario: two teams, identical cloud spend of £1m/year. Values are indicative to show the principle. Activity-based measurement reveals what spend-based hides.
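
A toy comparison of the two lenses. The energy figures, grid intensities, and spend factor are all assumed purely to illustrate the divergence:

```python
# Illustrative only: two teams, identical £1m cloud spend, through each lens.

SPEND_FACTOR = 0.25   # hypothetical kgCO2e per £ (Tier 3 proxy)

def spend_based_kgco2(spend_gbp: float) -> float:
    return spend_gbp * SPEND_FACTOR

def activity_based_kgco2(energy_kwh: float, grid_gco2_per_kwh: float) -> float:
    """Real energy x real grid intensity for where and when it ran."""
    return energy_kwh * grid_gco2_per_kwh / 1000   # g -> kg CO2e

# Team A: efficient workloads on a low-carbon grid (~30 gCO2/kWh assumed).
# Team B: idle-heavy estate on a coal-heavy grid (~700 gCO2/kWh assumed);
# idle infrastructure still burns its full energy figure.
team_a = activity_based_kgco2(400_000, 30)
team_b = activity_based_kgco2(900_000, 700)

print(spend_based_kgco2(1_000_000))   # 250,000 kg CO2e, for BOTH teams
print(round(team_a), round(team_b))   # ~12,000 vs ~630,000 kg CO2e
```

The spend-based lens reports the two teams as identical; the activity-based lens shows a gap of well over an order of magnitude.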

Carbon intensity is not constant.

The same workload in two different places, or at two different times of day, can have a dramatically different physical footprint. Region selection is one of the most underestimated levers in enterprise GreenOps.

Live GB Grid Carbon Intensity

[Interactive panel: shows the current GB average in gCO₂/kWh, the cleanest and dirtiest regional intensities, the spread between them, and bands from low carbon (<50 g) through medium (50–200 g) and high (200–500 g) to very high (>500 g).]

Source: National Grid ESO Carbon Intensity API (live) and historical grid average data. Generation mixes evolve year on year.
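
The panel above is driven by the public Carbon Intensity API, which requires no key. A minimal sketch of the national-intensity call, with field names as per the API's published schema; the fallback to the forecast covers half-hours where the actual value has not yet settled:

```python
# Fetch the current GB grid carbon intensity from the National Grid ESO
# Carbon Intensity API. No authentication required.
import requests

resp = requests.get("https://api.carbonintensity.org.uk/intensity",
                    headers={"Accept": "application/json"}, timeout=10)
resp.raise_for_status()
period = resp.json()["data"][0]

intensity = period["intensity"]
value = intensity["actual"] or intensity["forecast"]  # actual can be null early
print(f"GB grid: {value} gCO2/kWh ({intensity['index']}), "
      f"{period['from']} to {period['to']}")
```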

Building a baseline that holds up.

A baseline is not a one-shot accounting exercise. It is the reference point every later improvement claim is measured against. Build it directionally, document its weaknesses, and refresh it on a stated cadence.

Step 1 · Define the boundary

Which scopes, which sites, which entities, which categories of Scope 3. State what is excluded and why. A baseline that quietly drops cloud or Scope 3 hardware from scope is not a baseline. It is a marketing document.

Step 2 · Choose a method per category, not one method for everything

Activity-based where you have telemetry. Modelled where you have activity data and credible factors. Spend-based as a stated proxy where nothing better exists. Mixed methods are normal and honest. A single composite figure across all scopes hides where the data is thin.

3
Attach a confidence level by category

High / medium / low against each line, with a short note on why. This is the single discipline that separates a credible baseline from one that quietly inflates precision. Auditors, regulators, and serious internal stakeholders will all ask. Better to answer it on the front foot.

Step 4 · Set a refresh cadence and an upgrade path

Annual minimum for disclosure. Quarterly or live for operational use. State which Tier 3 lines are on a roadmap to Tier 2 or Tier 1 and by when. A baseline without an improvement plan is a snapshot, not a measurement programme.

Directional and honest beats precise and shaky

A baseline that says "Scope 1 and 2 measured to high confidence; Scope 3 cloud modelled to medium; Scope 3 hardware proxy to low; refresh annually with cloud moving to activity-based by Q4" is more credible than a single all-in number quoted to one decimal place. The first invites engagement. The second invites challenge, and rightly so.
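
One way to make the four steps concrete is a baseline register in which every line carries its boundary, method tier, confidence, and refresh cadence. A minimal sketch; the categories, notes, and cadences are illustrative, not prescriptive:

```python
# Baseline register sketch: boundary, method per category, confidence per
# line, stated refresh cadence and upgrade path. All values illustrative.
from dataclasses import dataclass

@dataclass
class BaselineLine:
    category: str     # what this line covers (the boundary)
    scope: int        # GHG Protocol scope: 1, 2, or 3
    tier: str         # "measured" | "modelled" | "proxy"
    confidence: str   # high / medium / low, with a short reason
    refresh: str      # cadence, plus the upgrade path if any

baseline = [
    BaselineLine("On-site generators", 1, "measured", "high: metered fuel", "annual"),
    BaselineLine("Data centre electricity", 2, "measured", "high: billing data", "quarterly"),
    BaselineLine("Cloud", 3, "modelled", "medium: VM-hours x regional factors",
                 "quarterly; activity-based by Q4"),
    BaselineLine("Hardware lifecycle", 3, "proxy", "low: spend-based",
                 "annual; moving to asset-level records"),
]

for line in baseline:
    print(f"Scope {line.scope} · {line.category}: "
          f"{line.tier}, {line.confidence} ({line.refresh})")
```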

Knowledge Check · Module 3 · Q1

Two engineering teams each spend £1 million per year on cloud infrastructure. A spend-based emissions model reports similar footprints for both. Why might this be seriously misleading?


✓ Correct: Option B

Spend-based measurement treats equal expenditure as equal footprint. But £1m spent efficiently in a low-carbon region with high utilisation is not the same as £1m spent on idle infrastructure in a high-carbon region. Spend doesn't tell you utilisation. It doesn't tell you location. It doesn't tell you time-of-day. And it doesn't tell you whether the workload was necessary at all.

This is why spend-based measurement is acceptable as a baseline starting point for Scope 3 reporting, where you have no other data, but is weak as an operational tool. GreenOps requires the physical reality, because that's the only thing you can actually improve.

Knowledge Check · Module 3 · Q2

A cloud architect is choosing between two equivalent regions for a new workload. Region A has a carbon intensity of 20g CO₂/kWh. Region B has 580g CO₂/kWh. Everything else being equal, what is the correct GreenOps framing of this decision?


✓ Correct: Option B

A 29× difference in carbon intensity is not an incremental improvement. It is a transformation. Running an equivalent workload in a low-carbon region versus a high-carbon region can reduce the operational carbon footprint by an order of magnitude before you've changed a single line of code. This is real, measurable, and sits within engineering's remit.

Option D tests a common misconception: certificates are a market-based instrument and don't change the physical grid impact. GreenOps requires physical reality as its reference point.

Knowledge Check · Module 3 · Q3

A CIO presents Scope 3 cloud emissions to the board as "1.2% of total enterprise emissions." The underlying number is spend-based (annual cloud spend multiplied by an industry-average emission factor), with no supplier-specific data. Which critique is correct?


✓ Correct: Option B

Spend-based numbers carry confidence ranges of roughly ±50%, and sometimes wider, because they assume an industry-average factor that bears little relation to a specific provider's energy mix, region, or efficiency. Reporting a Tier 3 proxy figure as "1.2%" tells the board the number is precise to within a tenth of a per cent. It is not. The honest representation is a band ("roughly 1–2%, low confidence, spend-based proxy"), with a stated path to better data.

Option C tempts because it sounds even-handed, but it collapses the difference between Tier 1 measured data (high confidence) and Tier 3 proxy data (low confidence). That difference is the entire point of the data quality hierarchy. Option D over-corrects: spend-based figures are acceptable as first-pass disclosure provided the data quality is named explicitly. The failure here is precision worship, not the use of a proxy.

⏸ Pause & Reflect

Take 5–10 minutes. Write answers down. Specificity matters more than completeness.

1. Does your organisation currently have a baseline? If yes, is it activity-based or spend-based? Do you know the confidence level for each category? If no, what is the biggest practical barrier to starting one?
2. Is carbon intensity a factor in any of your current cloud architecture or procurement decisions? If not, where would the resistance come from if you tried to make it one?
3. Which unit metric is most relevant to your role right now? Does it currently exist in your organisation? If not, what would be needed to produce it?

Module 3: Key Takeaways

Decide what measurement is for before choosing the method.

Disclosure, baseline, and operational decision are three different uses. Each tolerates different data quality. Using a disclosure-grade number to steer architecture decisions is the most common credibility failure.

Scopes set the boundary; data quality decides what the number means.

Scope 3 typically accounts for 70–80% of IT footprint and is the least measured. Boundary alone is not enough. Every line needs a tier label too.

Measured, modelled, proxy.

Three tiers, three confidence levels, three legitimate uses. Mixed estates use all three. The discipline is naming which tier sits behind every line and never collapsing them into a composite figure with no quality flag.

Spend-based is a baseline starting point, not an operational tool.

It hides utilisation, location, time-of-day, and whether workloads were necessary. Two teams with identical spend can differ by an order of magnitude in physical footprint.

Where and when you run work has a dramatic impact.

Norway to Poland can be up to a 40× difference in carbon intensity. Scotland to London is roughly 6× on the same grid. Region and timing are real engineering levers.

A directional, honest baseline beats a precise, shaky one.

Define boundary, choose method per category, attach confidence levels, set a refresh cadence. Precision is not credibility. Documented honesty is.

You now have the measurement vocabulary: scopes as the boundary, the measured–modelled–proxy hierarchy for data quality, the activity-vs-spend distinction for method choice, carbon intensity as a real lever, and a four-step discipline for building a baseline that holds up.

In Module 4 we move from how the estate is measured to where the impact actually lives, across the seven domains of an enterprise IT estate, from end-user and data centres through cloud, software, hardware, AI, and networks. From M5 onwards, the course turns to operations: rebound effects, AI as a special case, the green software discipline, and the architectural decisions that prevent waste before it materialises.
