Air Quality Science

How Physics-Based AI Outperforms Pure Data Models

A deep dive into why constraining machine learning with atmospheric physics produces dramatically better wildfire smoke forecasts.

April 10, 2026 | 8 min read | Dr. Sarah Chen, Chief Scientist

When wildfire smoke threatens a city, every hour of advance warning matters. Hospitals need time to prepare for increased respiratory admissions. Schools need time to cancel outdoor activities. Emergency managers need time to issue public advisories. That window of advance warning is precisely what air quality forecasting systems must deliver — and it turns out that not all forecasting approaches are created equal.

At Trace AQ, we've spent the past three years developing and validating a physics-constrained AI forecasting architecture. Our findings, which we're sharing in this article, demonstrate that models anchored in atmospheric physics consistently outperform pure machine learning approaches when predicting PM2.5 concentrations during wildfire smoke events.

The Problem with Pure Data Models

Modern machine learning models trained on historical air quality data can identify complex patterns that statistical models miss. They learn relationships between wind speed, humidity, temperature inversions, and pollutant concentrations without being explicitly programmed with physical equations. This flexibility is a genuine strength — until conditions fall outside the training distribution.

Wildfire smoke events are, by their nature, relatively rare and highly variable. A fire burning in a specific fuel type at a specific elevation under specific wind conditions produces a unique atmospheric fingerprint. No matter how large your training dataset, the exact combination of factors during any given event is likely underrepresented in historical data. Pure data-driven models struggle when conditions diverge from what they've seen before.

We tested this directly. We trained a state-of-the-art gradient boosting model on five years of PM2.5 measurements, meteorological data, and fire radiative power from satellite imagery across the western United States. On routine days, the model performed admirably, with a mean absolute error (MAE) of 4.2 µg/m³ at 24-hour lead times. But during active wildfire smoke intrusion events, the error jumped to 23.7 µg/m³. For context, the "unhealthy for sensitive groups" band that separates "moderate" from "unhealthy" on the AQI scale is roughly 20 µg/m³ wide in PM2.5 terms, so an error of this size can place a forecast in the wrong health category altogether. That is not acceptable for public health applications.
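A minimal sketch of how such a split evaluation can be computed, assuming each forecast sample carries a flag marking whether it fell inside a smoke event. The function name and the toy values are illustrative only and are not Trace AQ's evaluation code or data.

```python
import numpy as np

def stratified_mae(observed, predicted, is_smoke_event):
    """Return mean absolute error separately for routine and smoke-event samples."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    is_smoke_event = np.asarray(is_smoke_event, dtype=bool)
    abs_err = np.abs(observed - predicted)
    return {
        "routine": float(abs_err[~is_smoke_event].mean()),
        "smoke": float(abs_err[is_smoke_event].mean()),
    }

# Toy PM2.5 values in µg/m³, for illustration only.
obs = [10.0, 12.0, 80.0, 150.0]
pred = [12.0, 9.0, 60.0, 120.0]
smoke = [False, False, True, True]

scores = stratified_mae(obs, pred, smoke)
print(scores)
```

Reporting the two groups separately matters because an aggregate MAE dominated by routine days can hide large errors on the rare smoke days.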

What Physics Constraints Add

The Trace AQ architecture layers machine learning on top of a physics core. We use the Community Multiscale Air Quality (CMAQ) model — the same atmospheric chemistry and transport model used by the EPA — to generate a baseline forecast. Our AI system then applies learned corrections to that physics-based forecast rather than predicting concentrations from scratch.

This approach, sometimes called "hybrid" or "physics-informed" machine learning, has several important advantages. First, the physics model encodes conservation laws and chemical reaction pathways that data alone cannot reliably learn. Smoke particles don't teleport — they follow atmospheric transport equations. Second, the physics model provides a reasonable baseline even in completely novel conditions. The AI's job is to correct known biases and sharpen spatial resolution, not to bear the full forecasting burden.
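The correction idea can be sketched in a few lines. This is an illustration under stated assumptions, not the Trace AQ implementation: a ridge-regularized linear fit stands in for the learned component (the article does not specify the model), and the single "wind" feature, the function names, and the synthetic data are all hypothetical. The key point is that the model is trained on the physics baseline's error, and the final forecast is the baseline plus that learned correction.

```python
import numpy as np

def fit_residual_correction(features, physics_baseline, observed, ridge=1.0):
    """Fit linear weights that predict the physics model's error (observed - baseline)."""
    X = np.column_stack([np.ones(len(features)), features])  # prepend intercept column
    residual = np.asarray(observed, dtype=float) - np.asarray(physics_baseline, dtype=float)
    # Ridge-regularized normal equations: (X^T X + ridge * I) w = X^T r
    penalty = ridge * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(X.T @ X + penalty, X.T @ residual)

def hybrid_forecast(features, physics_baseline, weights):
    """Physics baseline plus the learned correction."""
    X = np.column_stack([np.ones(len(features)), features])
    return np.asarray(physics_baseline, dtype=float) + X @ weights

# Synthetic example: the baseline is biased by an offset and a wind-dependent term.
rng = np.random.default_rng(0)
n = 200
wind = rng.normal(size=(n, 1))
baseline = rng.uniform(20.0, 120.0, size=n)
truth = baseline + 5.0 + 2.0 * wind[:, 0] + rng.normal(scale=1.0, size=n)

weights = fit_residual_correction(wind, baseline, truth)
corrected = hybrid_forecast(wind, baseline, weights)

baseline_mae = float(np.mean(np.abs(truth - baseline)))
hybrid_mae = float(np.mean(np.abs(truth - corrected)))
print(f"baseline MAE: {baseline_mae:.2f}, hybrid MAE: {hybrid_mae:.2f}")
```

Because the learned term only adjusts the baseline, a zero correction in unfamiliar conditions leaves the physics forecast intact rather than producing an unconstrained prediction.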

The result: our physics-constrained model achieved a mean absolute error of 6.1 µg/m³ during the same wildfire events where the pure data model reached 23.7 µg/m³. That's a 74% error reduction on the events that matter most.
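The 74% figure follows directly from the two MAE values quoted above, as this short check shows:

```python
pure_mae = 23.7    # µg/m³, pure data model during wildfire events
hybrid_mae = 6.1   # µg/m³, physics-constrained model on the same events

# Relative error reduction: (old - new) / old
reduction = (pure_mae - hybrid_mae) / pure_mae
print(f"{reduction:.1%}")  # prints 74.3%
```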

Satellite Data as a Real-Time Constraint

A critical component of our architecture is near-real-time satellite data assimilation. The GOES-R ABI (Advanced Baseline Imager) produces imagery every 5–10 minutes over the continental United States. We ingest smoke optical depth retrievals from this imagery and use them to continuously correct the position, density, and vertical extent of modeled smoke plumes.

This correction loop — physics model forecast, satellite observation, data assimilation update — means our model is not flying blind during fast-moving smoke events. When a fire quadruples in size in a matter of hours, our forecast adjusts within minutes of the satellite detecting the increased fire radiative power.
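One simple way to picture the update step is a per-grid-cell blend of forecast and observation, weighted by their error variances. The article does not say which assimilation scheme is used, so the sketch below is only an assumption-laden illustration: it uses a scalar Kalman-style (optimal interpolation) update, treats grid cells independently, and the function name, variances, and optical depth values are hypothetical. Cells with no retrieval (for example, under cloud) are marked NaN and keep the forecast value.

```python
import numpy as np

def assimilate(forecast, forecast_var, observation, obs_var):
    """Blend forecast and observation per cell; NaN observations leave the forecast unchanged."""
    forecast = np.asarray(forecast, dtype=float)
    observation = np.asarray(observation, dtype=float)
    missing = np.isnan(observation)

    # Gain near 1 trusts the observation; gain near 0 trusts the forecast.
    gain = forecast_var / (forecast_var + obs_var)
    updated = forecast + gain * (observation - forecast)

    analysis = np.where(missing, forecast, updated)
    analysis_var = np.where(missing, forecast_var, (1.0 - gain) * forecast_var)
    return analysis, analysis_var

# Toy optical depth values for three grid cells; the middle cell has no retrieval.
forecast_od = [0.2, 0.5, 1.0]
observed_od = [0.6, np.nan, 1.0]

analysis, analysis_var = assimilate(forecast_od, 0.04, observed_od, 0.04)
print(analysis)      # [0.4 0.5 1. ]
print(analysis_var)  # [0.02 0.04 0.02]
```

With equal forecast and observation variances the gain is 0.5, so the first cell moves halfway toward the observation, and the uncertainty shrinks only where an observation was available.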

Verification Against Independent Observations

We validated our approach against PurpleAir sensor data — a network of over 30,000 low-cost air quality sensors maintained by the public across North America. Importantly, these sensors were not used in model training or calibration, making them a genuinely independent verification source.

Across 847 wildfire smoke events between 2022 and 2025, our physics-constrained model outperformed the pure data model in 91% of cases at 24-hour lead times and 87% of cases at 48-hour lead times. The advantage narrowed at 72-hour lead times, which is expected — at longer ranges, uncertainty in the fire behavior forecast itself becomes the dominant error source, not the atmospheric transport model.
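A per-event comparison like this reduces to counting how often one model's error is lower than the other's. The sketch below assumes one error value per event per model at a given lead time; the function name and the toy numbers are illustrative and are not the study's data.

```python
import numpy as np

def win_rate(hybrid_errors, pure_errors):
    """Fraction of events where the hybrid model's error is strictly lower."""
    hybrid_errors = np.asarray(hybrid_errors, dtype=float)
    pure_errors = np.asarray(pure_errors, dtype=float)
    return float(np.mean(hybrid_errors < pure_errors))

# Toy per-event MAE values in µg/m³ for four events.
hybrid = [5.0, 7.0, 12.0, 6.0]
pure = [20.0, 6.0, 25.0, 18.0]

rate = win_rate(hybrid, pure)
print(f"{rate:.0%}")  # prints 75%
```

Running the same calculation separately for each lead time gives the kind of 24-, 48-, and 72-hour breakdown described above.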

Implications for Environmental AI

The lessons from air quality forecasting extend to other environmental prediction challenges. Climate downscaling, flood inundation modeling, and agricultural yield forecasting all involve physical systems with well-characterized governing equations. In each domain, the question is not "should we use AI or physics?" but "how do we best combine them?"

Our experience suggests several principles. First, physical constraints are most valuable during out-of-distribution events, which is precisely when accurate forecasts are most critical. Second, the architecture should allow the AI component to learn corrections to the physics model, not override it. Third, when events are evolving rapidly, real-time observational data assimilation can improve forecasts more than additional model complexity does.

As climate change continues to intensify wildfire seasons and expand their geographic range, the demand for accurate smoke forecasting will only grow. We're committed to making this technology accessible to healthcare providers, researchers, and city governments who need it most — not just the well-resourced federal agencies that have historically had exclusive access to these tools.

If you'd like to explore the Trace AQ API for your organization's air quality monitoring needs, we'd welcome a conversation. Reach out through our contact page or request a demo to see the platform in action.

Tags: Wildfire Smoke, Machine Learning, PM2.5