TL;DR: We’re excited to introduce Clarity’s Global Calibration model for nitrogen dioxide (NO₂)! This model improves the accuracy of raw sensor measurements without requiring a local collocation study or access to regulatory monitors. By applying a machine-learning-based correction to all Clarity Nodes, the global NO₂ model offers a practical and scalable solution for indicative monitoring, especially in regions where traditional reference equipment may not be available. In this post, we’ll walk you through how we built the model, how it performs, and how it can support your air quality initiatives.

Why We Built a Global NO₂ Model

Electrochemical nitrogen dioxide (NO₂) sensors are compact, cost-effective tools for ambient air pollution measurement. However, their accuracy can vary with environmental conditions like temperature and humidity, and minor manufacturing differences can cause unit-to-unit variability.

Traditionally, Clarity’s calibration process has involved collocating each Node-S air quality sensor with a regulatory-grade reference monitor for at least one month. While this produces highly accurate, device-specific calibrations, it requires access to regulatory monitors and significant logistical effort, which is an obstacle in many parts of the world.

To overcome these barriers, we developed the Global NO₂ Calibration Model. It delivers a significant accuracy improvement over uncalibrated sensor readings without requiring a local collocation, enabling quicker and broader deployment of indicative air quality monitoring networks.

Calibration Development

A Diverse and Comprehensive Dataset

Developing a global model requires a large and diverse dataset of measurements. Our partners have collocated hundreds of Clarity Node-S sensors with reference monitors worldwide, resulting in a dataset of over six million hourly measurements. With locations ranging from the US to Singapore to Australia, this dataset covers a wide range of the seasons and environmental conditions that the Node-S is exposed to.

Here are a few summary statistics for the dataset:

  • Number of hourly measurements: 7,000,000+
  • Number of Node-S: 260
  • Number of cities: 37
  • Number of countries: 37
Descriptive statistics of the global dataset of Clarity air quality measurements used to develop our Global NO₂ Calibration Model.

Modeling Approach

To improve the accuracy of raw NO₂ sensor readings, we trained a machine learning model using a method called Light Gradient Boosting Machine (LightGBM). LightGBM is a powerful algorithm that builds an ensemble of simple decision trees to make predictions. Each new tree in the sequence is trained to correct the errors left by the previous trees, which allows the ensemble to capture complex, nonlinear relationships in the data.

Figure 1. The LightGBM model used in Clarity’s global NO₂ calibration stacks multiple decision trees. Each tree corrects errors made by the previous ones, progressively reducing prediction error and improving NO₂ measurement accuracy.
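To make the "each tree corrects the previous trees" idea concrete, here is a minimal pure-Python sketch of gradient boosting with one-split decision trees (stumps) and squared-error loss. It is not LightGBM, which adds histogram-based split finding, leaf-wise tree growth, and many other optimizations, and it is not Clarity's model. The single input feature, the toy data, and the `n_trees` and `learning_rate` values are illustrative assumptions.

```python
# Illustrative gradient boosting with decision stumps (squared-error loss).
# NOT LightGBM and NOT Clarity's production model: it only shows how each new
# tree is fit to the residual errors left by the trees before it.
from statistics import mean


def fit_stump(xs, residuals):
    """Find the single threshold split on one feature that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_val, right_val = mean(left), mean(right)
        sse = sum((r - left_val) ** 2 for r in left) + sum((r - right_val) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_val, right_val)
    if best is None:
        # All inputs identical: no split possible, predict the mean residual.
        const = mean(residuals)
        return lambda x: const
    _, threshold, left_val, right_val = best
    return lambda x: left_val if x <= threshold else right_val


def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Stack stumps; each one is trained on what the ensemble still gets wrong."""
    base = mean(ys)
    trees = []
    preds = [base] * len(ys)
    for _ in range(n_trees):
        # Residuals = errors of the current ensemble.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Add a damped correction from the new tree.
        preds = [p + learning_rate * tree(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


# Toy usage: a step-shaped (nonlinear) relationship that a single line fits poorly.
xs = list(range(20))
ys = [0.0 if x < 10 else 10.0 for x in xs]
model = boost(xs, ys)
```

On this toy data the remaining error shrinks by a factor of (1 − `learning_rate`) with every tree, so after 50 trees `model(3)` is close to 0 and `model(15)` is close to 10. A real calibration model would split on several inputs at once rather than one.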

The model corrects the raw NO₂ measurements using several inputs, including:

  • Raw NO₂ signal
  • Temperature
  • Relative humidity (including changes over time) 
  • Auxiliary sensor signals

This structure allows the model to capture how temperature, humidity, and other conditions influence the sensor's response, improving the agreement between sensor readings and true ambient NO₂ concentrations.
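As a rough sketch of how the inputs listed above could be assembled into model rows, the snippet below builds one feature vector per hour, representing "changes over time" in relative humidity as the difference from the previous hour. The field names, the single auxiliary signal, the one-hour difference, and setting the first hour's change to zero are all hypothetical choices; the post does not describe Clarity's actual feature engineering.

```python
from dataclasses import dataclass


@dataclass
class HourlyReading:
    # Hypothetical field names; the real Node-S data schema is not described here.
    raw_no2: float  # raw electrochemical NO2 signal
    temp_c: float   # temperature, degrees C
    rh_pct: float   # relative humidity, %
    aux: float      # a single auxiliary sensor signal (placeholder)


def build_features(readings):
    """Turn a time-ordered hourly series into rows of model inputs.

    Each row is [raw_no2, temp_c, rh_pct, delta_rh, aux], where delta_rh is the
    change in relative humidity since the previous hour (an assumed way to encode
    "changes over time"). The first hour has no predecessor, so its delta is 0.0.
    """
    rows = []
    prev_rh = None
    for r in readings:
        delta_rh = 0.0 if prev_rh is None else r.rh_pct - prev_rh
        rows.append([r.raw_no2, r.temp_c, r.rh_pct, delta_rh, r.aux])
        prev_rh = r.rh_pct
    return rows


rows = build_features([
    HourlyReading(raw_no2=20.0, temp_c=25.0, rh_pct=40.0, aux=1.0),
    HourlyReading(raw_no2=22.0, temp_c=26.0, rh_pct=55.0, aux=1.1),
])
```

Here the second row's humidity change is 15.0 (55 − 40), which lets a tree-based model react differently to a rapid humidity swing than to steady humidity.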

How the Global NO₂ Model Performs

We evaluated the global model’s performance against reference monitors across multiple locations around the world. The model significantly improves agreement with reference measurements compared to uncalibrated sensor data. 

Performance varies by region and environmental conditions, but the model generally reduces bias and improves correlation. Compared with uncalibrated NO₂ sensor readings, the Global Calibration Model increases the median R² from 0.33 to 0.56 and reduces the median root mean squared error (RMSE) from 11 ppb to 7 ppb.
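For readers who want to compute comparable numbers on their own collocation data, here is a minimal sketch of the two metrics and of taking the median across collocation sites. It assumes R² is the squared Pearson correlation between sensor and reference readings; the post does not say which R² definition was used, and the coefficient of determination computed from residuals can give different values. The function names and the `(sensor_values, reference_values)` input format are illustrative.

```python
import math
from statistics import median


def r_squared(sensor, reference):
    """Squared Pearson correlation (assumed R² definition).

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(sensor)
    mx, my = sum(sensor) / n, sum(reference) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(sensor, reference))
    sxx = sum((x - mx) ** 2 for x in sensor)
    syy = sum((y - my) ** 2 for y in reference)
    return sxy * sxy / (sxx * syy)


def rmse(sensor, reference):
    """Root mean squared error, in the same units as the inputs (e.g. ppb)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sensor, reference)) / len(sensor))


def median_metrics(sites):
    """sites: list of (sensor_values, reference_values) pairs, one per collocation.

    Returns (median R², median RMSE) across sites.
    """
    return (
        median(r_squared(s, r) for s, r in sites),
        median(rmse(s, r) for s, r in sites),
    )


sites = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # perfect agreement
    ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),  # perfectly correlated, constant 3 ppb offset
    ([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]),  # weaker correlation
]
med_r2, med_rmse = median_metrics(sites)
```

The second toy site shows why both metrics are reported: a constant offset leaves the correlation perfect (R² = 1) while RMSE still registers a 3 ppb error.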

Figure 2. Distribution of R² and RMSE for uncalibrated NO₂ readings, readings calibrated with the Global Model, and readings calibrated with a local collocation.

In many cases, a local collocation and custom calibration can improve accuracy even further, but the global model offers a strong starting point, especially where collocation isn't feasible.

You can explore individual examples in our Collocation Results Library.

What Are the Limitations?

Because the global model is not customized to each individual device, it cannot correct for unit-to-unit variability in sensor response due to manufacturing differences. Additionally, while the model performs well in many scenarios, it may be less accurate in extreme conditions or locations with pollution sources not well represented in the training data. For projects requiring higher accuracy, we continue to recommend a local collocation and the development of a custom calibration model whenever feasible.

Conclusion

Clarity’s global NO₂ calibration model makes it faster and easier to deploy sensor networks in new regions, offering more accurate indicative air quality data with minimal setup. This approach is especially useful for cities, researchers, and organizations operating in areas without access to regulatory monitors. While it doesn’t replace the precision of a site-specific calibration, the global model provides a valuable tool for expanding access to high-quality NO₂ data.

If you’re interested in deploying NO₂ sensors using the global model, or would like support in building a custom calibration, reach out to our team!