PET SUV Harmonization and EARL Accreditation

PET SUV harmonization is the process of constraining PET/CT scanner performance so that standardized uptake values (SUVs) measured for the same lesion are comparable across different scanners, sites, and time points. It is achieved by reconstructing images to meet phantom-based specifications for calibration accuracy and activity recovery — such as those of the EANM Research Ltd (EARL) program — rather than to whatever settings produce the sharpest-looking image.¹²

The problem harmonization solves is deceptively simple. A standardized uptake value is supposed to be a portable, semi-quantitative measure of tracer uptake. But the same patient, imaged on two different scanners on the same day, can produce SUVs that differ by tens of percent — sometimes by more than 100% for small lesions — purely because of differences in reconstruction and calibration.²⁷ If the number is not portable, every clinical decision built on it is on shaky ground.

Introduction

The standardized uptake value is the most widely used quantitative metric in clinical PET, and it is also one of the most fragile. It underpins lesion characterization, response assessment with PERCIST and EORTC criteria, prognostic stratification, and increasingly the patient-selection thresholds used in theranostics. Yet the SUV is exquisitely sensitive to the technical chain that produces it: the dose calibrator, the scanner cross-calibration, the uptake time, the patient's blood glucose, and — critically — the reconstruction algorithm.¹³

Over the past 15 years, reconstruction has changed faster than almost anything else in PET. Point-spread-function (PSF) modeling and time-of-flight (TOF) reconstruction, smaller voxels, and resolution-recovery methods all increase the recovered activity in small structures and push SUVs upward. The result is that a site upgrading its scanner, or a patient referred between sites, can see SUV shifts large enough to flip a treatment-response call — with no change in the underlying disease.²⁵⁶

SUV harmonization programs exist to put a floor and a ceiling on this variability. This guide explains why SUVs drift between systems, how the EARL accreditation program constrains scanner performance with phantom-based recovery-coefficient and calibration specifications, the difference between the EARL1 and EARL2 standards, and the options available to US facilities. DRPS supports quantitative PET programs through its PET/CT and nuclear medicine physics and accreditation support services across Florida, Maryland, Virginia, Washington DC, California, Nevada, Pennsylvania, New York, New Jersey, and Delaware.

Topic Explanation

What is a standardized uptake value?

The SUV normalizes the activity concentration measured in a region of interest to the injected activity and the patient's body size, so that uptake can be compared without reference to the absolute activity administered. The body-weight SUV is defined as:

$$SUV_{bw} = \frac{C_{img}(t)}{A_{inj} / m}$$

where $C_{img}(t)$ is the decay-corrected activity concentration in the image (for example in Bq/mL), $A_{inj}$ is the net injected activity decay-corrected to the same reference time (here the scan time), and $m$ is the patient's body mass. With $m$ expressed in grams and tissue density taken as about 1 g/mL, the SUV is dimensionless. Variants normalize to lean body mass (SUL) or body surface area instead of body weight.¹³

Every term in that equation is a potential source of error. $A_{inj}$ depends on the dose calibrator and on accurate residual-activity measurement; $C_{img}$ depends on the scanner calibration and the reconstruction; the decay correction depends on synchronized clocks; and the comparison only holds if the uptake time $t$ is consistent. For more on why uptake timing matters, see our article on PET uptake time, and for the broader metric, PET SUV quantification.
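As a minimal sketch of how those terms combine, the Python below computes a body-weight SUV from an assayed syringe activity, a residual, and the elapsed time to the scan. The function names, the argument layout, and the simplification that the assay and the residual are referenced to the same clock time are assumptions made for illustration. The F-18 half-life of 109.77 min is the only physical constant used.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def decay_correct(activity_bq, elapsed_min, half_life_min=F18_HALF_LIFE_MIN):
    """Return the activity remaining after elapsed_min minutes of physical decay."""
    return activity_bq * 0.5 ** (elapsed_min / half_life_min)


def suv_bw(c_img_bq_per_ml, assayed_bq, residual_bq, minutes_assay_to_scan, mass_kg):
    """Body-weight SUV, treating 1 mL of tissue as 1 g.

    Assumption for this sketch: assayed_bq and residual_bq are both
    referenced to the same assay time.
    """
    net_injected_bq = assayed_bq - residual_bq
    a_inj_at_scan = decay_correct(net_injected_bq, minutes_assay_to_scan)
    mass_g = mass_kg * 1000.0
    return c_img_bq_per_ml / (a_inj_at_scan / mass_g)


# Hypothetical inputs: 370 MBq assayed, 5 MBq residual, 60 min to scan, 70 kg patient.
example_suv = suv_bw(20_000.0, 370e6, 5e6, 60.0, 70.0)
print(f"SUVbw = {example_suv:.2f}")
```

Keeping the decay correction in its own function makes the dependence on the clock explicit: any error in `minutes_assay_to_scan` propagates directly into the result.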

Why SUVs are not automatically comparable

The dominant modern source of inter-scanner SUV variability is reconstruction. PSF and TOF reconstruction recover more of the true activity in small objects, raising SUVmax and SUVpeak; a smoother reconstruction with heavier filtering recovers less. One head-to-head comparison found SUVmax discrepancies between two PET/CT systems as large as 149%, which fell below 10% once a harmonizing filter was applied.⁷ This is not a malfunction — both images are "correct" for their reconstruction settings — but it makes the raw SUVs incomparable.

Harmonization resolves this by requiring that, in addition to the sharp clinical reconstruction, sites generate a second reconstruction (or apply a post-filter) that meets a common recovery specification. The harmonized series is what is used for quantitative comparison across sites and over time.²⁴⁶

Key Technical Principles

Recovery coefficients and the partial-volume effect

The central physical quantity in harmonization is the recovery coefficient (RC) — the ratio of the measured activity concentration to the true (prepared) activity concentration in an object of known size:

$$RC = \frac{C_{measured}}{C_{true}}$$

For large, uniform regions, RC approaches 1.0. For small structures comparable to the scanner's spatial resolution, the partial-volume effect spreads the signal and RC falls well below 1.0 — unless resolution-recovery reconstruction pushes it back up (and sometimes above 1.0, an overshoot). Because lesions are small, the RC-versus-size curve is exactly where scanners disagree.²⁸

EARL accreditation pins down this curve. Sites image a NEMA NU2 body phantom whose fillable spheres span several diameters at a known sphere-to-background activity ratio, then compute $RC_{mean}$ and $RC_{max}$ for each sphere. To pass, every sphere's recovery must fall inside the published EARL band. The EARL accreditation summary from the first ~200 systems found that, before corrective action, roughly 30% of mean-recovery and 23% of max-recovery submissions failed the specification, but after feedback essentially all systems could be brought into compliance, regardless of manufacturer.²
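The per-sphere check described above can be sketched in a few lines of Python. The band limits and sphere readings below are placeholders chosen only to make the example run; they are not the published EARL limits, which should be taken from the current EARL specification. The function names are likewise hypothetical.

```python
def recovery_coefficient(measured_conc, true_conc):
    """RC = measured / true activity concentration (same units for both)."""
    if true_conc <= 0:
        raise ValueError("true concentration must be positive")
    return measured_conc / true_conc


def check_recovery(measured_by_diameter_mm, true_conc, bands_by_diameter_mm):
    """Return {diameter_mm: (rc, passed)} for every sphere that was measured."""
    results = {}
    for diameter, measured in measured_by_diameter_mm.items():
        low, high = bands_by_diameter_mm[diameter]
        rc = recovery_coefficient(measured, true_conc)
        results[diameter] = (rc, low <= rc <= high)
    return results


# PLACEHOLDER bands for illustration only; NOT the published EARL limits.
example_bands = {10: (0.30, 0.60), 37: (0.90, 1.10)}
# Hypothetical sphere readings in kBq/mL, for a prepared sphere concentration of 20 kBq/mL.
example_measured = {10: 9.0, 37: 20.4}

report = check_recovery(example_measured, true_conc=20.0,
                        bands_by_diameter_mm=example_bands)
all_pass = all(passed for _, passed in report.values())
print(report, all_pass)
```

The same routine would be run twice per submission, once with mean-based and once with max-based sphere readings, each against its own band table.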

Calibration accuracy and cross-calibration

Harmonized recovery is meaningless if the absolute calibration is wrong. The EARL calibration quality-control scan uses a uniform cylinder filled to a known activity concentration; the concentration measured in the image must agree with the prepared concentration to within ±10%. This checks the entire quantitative chain: the dose calibrator, the residual-activity correction, the clocks used for decay correction, and the scanner's own calibration factor. In the EARL dataset, about 5% of calibration submissions exceeded the ±10% bias limit before corrective action, after which sites reached full compliance.²
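The calibration check reduces to a signed percent bias compared against the ±10% limit. A minimal sketch, with hypothetical function names and example concentrations:

```python
def calibration_bias_percent(measured_conc, prepared_conc):
    """Signed bias of the image-derived concentration relative to the prepared one."""
    if prepared_conc <= 0:
        raise ValueError("prepared concentration must be positive")
    return 100.0 * (measured_conc - prepared_conc) / prepared_conc


def passes_calibration(measured_conc, prepared_conc, limit_percent=10.0):
    """True when the absolute bias is within the stated limit (±10% by default)."""
    return abs(calibration_bias_percent(measured_conc, prepared_conc)) <= limit_percent


# Hypothetical cylinder: prepared at 5.0 kBq/mL, measured at 5.3 kBq/mL in the image.
bias = calibration_bias_percent(5.3, 5.0)
print(f"bias = {bias:+.1f}%, pass = {passes_calibration(5.3, 5.0)}")
```

Both concentrations must be decay-corrected to the same reference time before the comparison, otherwise a clock error shows up as an apparent calibration bias.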

The dose calibrator and the PET scanner must be cross-calibrated against a common, traceable activity standard. In the US, that traceability runs to NIST-maintained standards, and the cross-calibration is a recurring quality-control task, not a one-time setup.⁹

Worked SUV example

Suppose a lesion shows a decay-corrected activity concentration of $C_{img} = 20$ kBq/mL, the patient received a net injected activity of $A_{inj} = 350$ MBq (decay-corrected to scan time), and the patient mass is $m = 70$ kg. Treating 1 mL of tissue as ~1 g:

$$\frac{A_{inj}}{m} = \frac{350 \times 10^{6}\ \text{Bq}}{70{,}000\ \text{g}} = 5000\ \text{Bq/g}$$

$$SUV_{bw} = \frac{20{,}000\ \text{Bq/g}}{5000\ \text{Bq/g}} = 4.0$$

Now suppose the lesion is small enough that one scanner's PSF/TOF reconstruction recovers it with $RC_{max} = 1.15$ while the harmonized reconstruction recovers it with $RC_{max} = 0.85$. Taking 4.0 as the value at full recovery, the same lesion would report $SUV_{max} \approx 4.6$ on the sharp reconstruction and $SUV_{max} \approx 3.4$ on the harmonized one, a difference of about 35% relative to the harmonized value that is driven entirely by reconstruction. Harmonization ensures that the value used for cross-site comparison is the controlled one.²⁷
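The recovery arithmetic in this example can be reproduced with a short Python sketch. It assumes, as the example does, that the reported value is simply the fully recovered SUV scaled by the recovery coefficient; the function names are illustrative.

```python
def apparent_suv(fully_recovered_suv, rc):
    """SUV reported for a lesion whose activity is recovered with coefficient rc."""
    return fully_recovered_suv * rc


def percent_difference(value, reference):
    """Difference of value from reference, as a percentage of reference."""
    return 100.0 * (value - reference) / reference


sharp_suv = apparent_suv(4.0, 1.15)        # PSF/TOF reconstruction
harmonized_suv = apparent_suv(4.0, 0.85)   # harmonized reconstruction
difference_pct = percent_difference(sharp_suv, harmonized_suv)

print(f"sharp = {sharp_suv:.1f}, harmonized = {harmonized_suv:.1f}, "
      f"difference = {difference_pct:.0f}%")
```

Note that the percentage depends on which value is used as the reference: relative to the sharp reconstruction, the same gap is about 26%.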

EARL1 versus EARL2 standards

Feature	EARL1 (2010)	EARL2 (announced 2019)
Reconstruction character	Smoother; larger Gaussian post-filter	Sharper; smaller voxels, less filtering
Recovery-coefficient bands	Lower RC bands	Higher RC bands
Typical SUV effect	Reference baseline	SUVs ~20–30% higher
Metabolic active tumor volume	Reference baseline	Smaller MATV (~20% lower)
Total lesion glycolysis	Reference baseline	Approximately unchanged
Interpretation thresholds	Original PERCIST/cutoffs	Must be re-derived for EARL2

The EARL2 specification was developed because newer digital, silicon-photomultiplier scanners can comfortably exceed the old EARL1 recovery bands, and forcing them into EARL1 discards image quality the hardware can deliver. The trade-off is that EARL2 changes the numbers: published work shows SUVs rise by roughly 23–30% and MATV falls by about 22% under EARL2 relative to EARL1, while TLG stays similar.³ A practical bridging method applies an additional Gaussian filter to EARL2 data to reproduce EARL1-compliant values, allowing a site to report both during a transition.³ The key governance lesson is that changing the performance standard requires updating interpretation criteria — otherwise harmonization trades one source of error for another.¹¹
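One way to reason about such a bridging filter, assuming the effective image resolution before and after filtering can be approximated as Gaussian, is that the FWHM values of successive Gaussian blurs add in quadrature. The sketch below uses that relation to estimate the extra filter width needed to move from a sharper to a smoother effective resolution. The function names and the example widths are hypothetical, and in practice the filter is tuned against the phantom recovery bands rather than derived from this formula alone.

```python
import math

# Conversion between FWHM and standard deviation of a Gaussian (about 2.355).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_to_sigma(fwhm_mm):
    """Standard deviation (mm) of a Gaussian with the given FWHM (mm)."""
    return fwhm_mm / FWHM_PER_SIGMA


def additional_filter_fwhm(current_fwhm_mm, target_fwhm_mm):
    """FWHM of the extra Gaussian filter that takes current to target resolution.

    Uses FWHM_target^2 = FWHM_current^2 + FWHM_filter^2, which holds only
    under the Gaussian approximation assumed in this sketch.
    """
    if target_fwhm_mm < current_fwhm_mm:
        raise ValueError("a Gaussian filter can only smooth, not sharpen")
    return math.sqrt(target_fwhm_mm ** 2 - current_fwhm_mm ** 2)


# Hypothetical widths: 3 mm effective resolution smoothed to 5 mm.
extra_fwhm = additional_filter_fwhm(3.0, 5.0)
extra_sigma = fwhm_to_sigma(extra_fwhm)
print(f"extra filter: FWHM = {extra_fwhm:.1f} mm, sigma = {extra_sigma:.2f} mm")
```

Whatever width this estimate suggests, the filtered series still has to be verified against the target recovery bands with a phantom scan before it is used for reporting.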

Clinical Impact

Response assessment and prognosis

Quantitative response criteria assume SUV reproducibility. When EARL-compliant SUVs were used to define a prognostic threshold in locally advanced non–small-cell lung cancer, the EARL-harmonized cutoff was an independent predictor of mortality — but applying a non-harmonized threshold to harmonized data (or vice versa), or pooling non-harmonized multicenter data, destroyed the prognostic association.⁵ In other words, the harmonization is not academic bookkeeping; it is what makes the quantitative biomarker work at all.

Reconstruction choice also interacts directly with response criteria. A comparison of a vendor's sharpest clinical reconstruction against the EARL harmonization reconstruction found clinically relevant SUL differences for lesions meeting PERCIST inclusion criteria, with the sharp reconstruction systematically overestimating true values — so the harmonized reconstruction is the appropriate one for PERCIST-based response assessment.⁶

Theranostics and multicenter trials

As theranostic programs expand, PET quantification increasingly drives patient selection and dosimetry decisions, and those programs frequently span multiple sites and scanners. Harmonization is what allows a baseline scan at one center to be compared with a follow-up at another, and what allows pooled trial data to be analyzed coherently.⁴ For the dosimetry side of theranostics, see our article on Lu-177 theranostics dosimetry, and for the performance testing that underlies any quantitative claim, PET/CT NEMA NU-2 performance testing.

Practical Optimization Tips

1. Standardize patient preparation and uptake time

Harmonized hardware cannot rescue inconsistent biology. Standardize fasting, blood glucose limits, injected activity, and especially the uptake time, because FDG uptake continues to rise in tumors well after injection. Consistency here is a prerequisite for any quantitative comparison.¹

2. Maintain a dedicated harmonized reconstruction

Keep the sharp clinical reconstruction for visual reading and maintain a separate, validated harmonized reconstruction (or post-filter) for quantitation. Do not retrofit harmonization onto a single reconstruction tuned for detection.²⁶

3. Cross-calibrate on a schedule

Cross-calibrate the dose calibrator and scanner against a traceable standard on the program's required cadence, and treat any calibration drift beyond the specification as an actionable finding, not a rounding error.²⁹

4. Re-derive thresholds when the standard changes

If the program migrates from EARL1 to EARL2, or changes scanners, re-derive or re-validate interpretation thresholds and document the transition so longitudinal comparisons remain valid.³¹¹

5. Verify recovery with the correct phantom fill

Recovery coefficients are only meaningful if the phantom is filled to the specified sphere-to-background ratio and imaged with the accreditation protocol. Sloppy phantom preparation produces RC values that pass or fail for the wrong reasons.²⁸
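A quick arithmetic check of the fill before scanning can catch preparation errors early. The sketch below decay-corrects the sphere stock and the background to a common time and compares their ratio with a target. The target ratio, the tolerance, the activities, and the volumes are all placeholders; the required values should come from the accreditation protocol being followed.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def concentration_at(activity_bq, volume_ml, minutes_since_assay,
                     half_life_min=F18_HALF_LIFE_MIN):
    """Activity concentration (Bq/mL) after decay from the assay time."""
    decayed = activity_bq * 0.5 ** (minutes_since_assay / half_life_min)
    return decayed / volume_ml


def sphere_to_background_ratio(sphere_conc, background_conc):
    """Ratio of sphere to background activity concentration."""
    return sphere_conc / background_conc


def ratio_within_tolerance(ratio, target_ratio, tolerance_fraction):
    """True when ratio is within tolerance_fraction of target_ratio."""
    return abs(ratio - target_ratio) <= tolerance_fraction * target_ratio


# Hypothetical fill: both solutions assayed 30 minutes before the scan.
sphere_conc = concentration_at(20e6, 1000.0, 30.0)        # sphere stock solution
background_conc = concentration_at(19.4e6, 9700.0, 30.0)  # background compartment
ratio = sphere_to_background_ratio(sphere_conc, background_conc)

# PLACEHOLDER target and tolerance, for illustration only.
print(f"ratio = {ratio:.2f}, ok = {ratio_within_tolerance(ratio, 10.0, 0.05)}")
```

If the two solutions are assayed at different times, pass each its own `minutes_since_assay` so that both concentrations refer to the same moment.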

Common pitfalls to avoid

Treating SUV as inherently portable. It is portable only when calibration and recovery are controlled.
Reading and quantifying from the same sharp reconstruction. Use a harmonized series for numbers.
Forgetting clocks and decay correction. Unsynchronized clocks bias every SUV.
Migrating standards without updating thresholds. EARL2 numbers are not EARL1 numbers.
One-time calibration. Cross-calibration drifts and must be rechecked.
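To put a number on the clock pitfall, the sketch below estimates the SUV bias caused by a timing offset between the clock used for the dose assay and the clock used by the scanner. It assumes F-18 (half-life 109.77 min) and that the offset affects only the decay correction of the injected activity; the function name is illustrative.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def suv_bias_percent_from_clock_offset(offset_min, half_life_min=F18_HALF_LIFE_MIN):
    """Percent SUV error when the elapsed time used for decay correction
    is too long by offset_min minutes.

    A positive offset decays the injected activity too far, so the
    denominator of the SUV is too small and the SUV reads high.
    """
    return 100.0 * (2.0 ** (offset_min / half_life_min) - 1.0)


for offset in (1.0, 5.0, 10.0):
    bias = suv_bias_percent_from_clock_offset(offset)
    print(f"{offset:4.0f} min offset -> {bias:+.1f}% SUV bias")
```

For F-18, a five-minute discrepancy already shifts every SUV by roughly 3%, which consumes a meaningful share of a ±10% calibration tolerance.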

Regulatory Considerations

SUV harmonization is governed by accreditation specifications and professional guidelines rather than by radiation-protection regulation, but it is no less binding when a facility participates in a trial, network, or referral program that requires it. The relevant frameworks include:

EANM/EARL FDG-PET/CT accreditation specifications (EARL1 and EARL2) — the phantom-based calibration and recovery-coefficient requirements that define harmonized performance.¹²³
EANM procedure guidelines for tumour imaging, version 2.0 — the standardized acquisition, patient-preparation, and quantification methodology that underlies harmonized FDG PET.¹
NEMA NU 2, Performance Measurements of Positron Emission Tomographs — the standard defining the body phantom and measurement methods on which recovery-coefficient testing is built.¹⁰
RSNA QIBA FDG-PET/CT Profile — the US-oriented quantitative imaging biomarker profile specifying precision and conformance requirements for FDG-PET as a biomarker.¹²
ACR PET/CT Accreditation Program — the domestic accreditation pathway covering scanner performance, phantom imaging, and quality control for US facilities.¹³

In the US, the radioactive-material side of a PET program (the cyclotron product, the dose calibrator, and the radiopharmaceutical handling) is regulated under NRC 10 CFR Part 20 and Part 35 or the equivalent Agreement State program. Florida, Maryland, Virginia, California, Nevada, Pennsylvania, New York, and New Jersey administer their own programs; Washington DC and Delaware are regulated directly by the NRC. Harmonization sits alongside that radiation-safety framework, and facilities typically pursue it with help from accreditation support and medical physicist consulting services.

Frequently Asked Questions (FAQs)

What is PET SUV harmonization?

PET SUV harmonization is the process of constraining PET/CT scanner performance — calibration accuracy and the recovery of activity in objects of different sizes — so that standardized uptake values (SUVs) measured for the same lesion are comparable across different scanners, sites, and time points. It is achieved by reconstructing images to meet defined phantom-based specifications, such as those of the EANM Research Ltd (EARL) program, rather than to whatever settings produce the sharpest-looking image.

What is the EARL accreditation program?

EARL (EANM Research Ltd) operates the EANM FDG-PET/CT accreditation program, established in 2010, that harmonizes quantitative PET performance across sites. Participating sites submit phantom scans — a uniform cylinder for calibration quality control and a NEMA NU2 body phantom for image-quality and recovery-coefficient assessment — and must demonstrate an SUV bias within ±10% and sphere recovery coefficients within a defined bandwidth to obtain and keep accreditation.

Why do SUVs differ between PET/CT scanners?

Modern reconstruction features — point-spread-function (PSF) modeling, time-of-flight (TOF), small voxels, and limited or no post-filtering — increase the recovered activity concentration in small lesions and therefore raise the measured SUV. Older systems and smoother reconstructions recover less. Differences in scanner cross-calibration, the dose calibrator, the clock used for decay correction, uptake time, and patient preparation add further variability. Reported SUVmax differences between systems can exceed 100% before harmonization.

What phantoms does EARL accreditation use?

EARL accreditation uses two phantoms. A uniform cylindrical phantom is used for the calibration quality-control (CalQC) scan, which checks that the measured activity concentration matches the prepared concentration within ±10%. A NEMA NU2 body phantom with fillable spheres of several diameters is used for the image-quality (IQQC) scan, from which mean and maximum recovery coefficients are calculated for each sphere and compared against the EARL specification bands.

What is the difference between EARL1 and EARL2 standards?

EARL1 is the original specification set (from 2010) using a smoother reconstruction with larger Gaussian filtering. EARL2 (announced in 2019) uses sharper reconstructions with smaller voxels and less filtering, raising the recovery-coefficient bands. Studies show EARL2 yields higher SUVs (commonly 20–30% higher), smaller metabolic active tumor volumes, and similar total-lesion-glycolysis values compared with EARL1, which is why interpretation thresholds must be updated when a site moves between standards.

Is EARL accreditation available to US imaging centers?

Yes. EARL is a European program but accredits sites worldwide, and US centers participating in international or industry-sponsored trials commonly pursue EARL compliance. US facilities also have domestic options: the ACR PET/CT accreditation program, the SNMMI Clinical Trials Network phantom and scanner-qualification process, and the RSNA QIBA FDG-PET/CT Profile all support quantitative consistency. The right choice depends on the clinical and research requirements of the program.

How does SUV harmonization affect clinical trials and treatment response?

Quantitative response criteria such as PERCIST and EORTC, and emerging theranostic patient-selection thresholds, assume SUVs are reproducible. Without harmonization, a change in SUV between baseline and follow-up could reflect a scanner or reconstruction difference rather than a real biological change, producing false response or progression calls. Harmonization makes longitudinal and multicenter SUV comparisons defensible, which is essential for trials and for pooled or referred imaging.

Key Takeaways

An SUV is only portable when it is harmonized. Calibration and recovery must be controlled for the number to mean the same thing everywhere.
Reconstruction is the dominant modern source of SUV drift. PSF/TOF and small voxels raise SUVs; smoother reconstructions lower them.
Recovery coefficients pin down the disagreement. EARL constrains $RC_{mean}$ and $RC_{max}$ across sphere sizes with a NEMA NU2 phantom, plus ±10% calibration accuracy.
EARL2 is not EARL1. Migrating standards raises SUVs ~20–30% and shrinks MATV ~22%; thresholds must be re-derived.
Harmonized and sharp reconstructions serve different purposes. Read on the sharp series; quantify on the harmonized series.
US facilities have options. EARL, ACR PET/CT accreditation, SNMMI CTN, and QIBA all support quantitative consistency.

Conclusion

The standardized uptake value carries far more clinical weight than its simple definition suggests, and that weight is only justified when the number is reproducible. Harmonization programs such as EARL make reproducibility achievable by constraining the two things that matter most — absolute calibration and size-dependent recovery — with phantom-based specifications that any modern scanner can meet after appropriate setup.

For a quantitative PET program, the practical commitments are modest but non-negotiable: standardized patient preparation and uptake time, a dedicated harmonized reconstruction maintained alongside the sharp clinical one, scheduled cross-calibration to a traceable standard, and disciplined threshold management whenever the standard or scanner changes. Facilities that make those commitments can trust their SUVs across scanners, across sites, and across time — which is precisely what response assessment, prognosis, and theranostic selection require.

How DRPS Can Help

Diagnostic Radiation Physics Services supports quantitative PET programs through PET/CT and nuclear medicine physics testing, NEMA NU-2 performance evaluation, phantom-based recovery-coefficient and calibration assessment, dose-calibrator cross-calibration review, and accreditation support for ACR PET/CT and harmonization programs. Our board-certified medical physicists help facilities establish and maintain SUV harmonization aligned with EARL, QIBA, and accreditation requirements.

DRPS supports facilities across our service locations, including Florida, Maryland, Virginia, Washington DC, California, Nevada, New York, Pennsylvania, New Jersey, and Delaware. To evaluate or harmonize your PET quantification program, contact our team.

References

1. Boellaard R, Delgado-Bolton R, Oyen WJG, et al. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging. 2015;42(2):328-354. doi:10.1007/s00259-014-2961-x.
2. Kaalep A, Sera T, Oyen W, et al. EANM/EARL FDG-PET/CT accreditation - summary results from the first 200 accredited imaging systems. Eur J Nucl Med Mol Imaging. 2018;45(3):412-422. doi:10.1007/s00259-017-3853-7.
3. Kaalep A, Burggraaff CN, Pieplenbosch S, et al. Quantitative implications of the updated EARL 2019 PET-CT performance standards. EJNMMI Phys. 2019;6(1):28. doi:10.1186/s40658-019-0257-8.
4. Aide N, Lasnon C, Veit-Haibach P, et al. EANM/EARL harmonization strategies in PET quantification: from daily practice to multicentre oncological studies. Eur J Nucl Med Mol Imaging. 2017;44(Suppl 1):17-31. doi:10.1007/s00259-017-3740-2.
5. Houdu B, Lasnon C, Licaj I, et al. Why harmonization is needed when using FDG PET/CT as a prognosticator: demonstration with EARL-compliant SUV as an independent prognostic factor in lung cancer. Eur J Nucl Med Mol Imaging. 2019;46(2):421-428. doi:10.1007/s00259-018-4151-8.
6. Devriese J, Beels L, Maes A, Van de Wiele C, Pottel H. Impact of PET reconstruction protocols on quantification of lesions that fulfil the PERCIST lesion inclusion criteria. EJNMMI Phys. 2018;5(1):35. doi:10.1186/s40658-018-0235-6.
7. Rubello D, Colletti PM. SUV harmonization between different hybrid PET/CT systems. Clin Nucl Med. 2018;43(11):811-814. doi:10.1097/RLU.0000000000002284.
8. Kaalep A, Huisman M, Sera T, Vugts D, Boellaard R. Feasibility of PET/CT system performance harmonisation for quantitative multicentre 89Zr studies. EJNMMI Phys. 2018;5(1):26. doi:10.1186/s40658-018-0226-7.
9. National Institute of Standards and Technology. Radioactivity measurements and traceability for nuclear medicine. nist.gov
10. National Electrical Manufacturers Association. NEMA NU 2: Performance Measurements of Positron Emission Tomographs (PET). nema.org
11. Boellaard R, Sera T, Kaalep A, et al. Updating PET/CT performance standards and PET/CT interpretation criteria should go hand in hand. EJNMMI Res. 2019;9(1):95. doi:10.1186/s13550-019-0565-y.
12. Radiological Society of North America, Quantitative Imaging Biomarkers Alliance. QIBA Profile: FDG-PET/CT as an Imaging Biomarker Measuring Response to Therapy. rsna.org
13. American College of Radiology. PET/CT Accreditation Program Requirements. acr.org