
The rise of generative AI has created a new kind of infrastructure challenge. Behind every intelligent assistant, summarization tool, or creative application lies a model that demands significant energy to train and even more to run at scale. As usage expands from early adopters to enterprise-wide deployments, the environmental cost of these models becomes harder to ignore.
Reducing the carbon footprint of AI now extends beyond optimizing training cycles; it also means minimizing the energy AI systems use during everyday operation. The majority of emissions now come from the constant, distributed process of running models in real time across millions of devices and servers. However, with smarter infrastructure choices, organizations can lower that impact without compromising performance.
The Emissions Story Behind Generative Models
It’s easy to assume the bulk of AI’s energy use happens during model training. When OpenAI trained GPT-3 in 2020, it consumed an estimated 1,287 megawatt-hours of electricity and produced over 500 metric tons of CO2. But once models are deployed, inference quickly becomes the dominant source of emissions, particularly for models integrated into customer-facing products.
In a 2022 study, researchers from Hugging Face and the University of Cambridge found that inference can represent 80 to 90 percent of a model’s total energy usage over its lifecycle. The more popular and responsive a model is, the higher its ongoing environmental cost.
This reality has prompted leading tech companies to rethink how and where these models run. Rather than focus only on algorithmic efficiency or pruning parameters, many are now exploring how to embed carbon awareness directly into the infrastructure stack.
Optimizing Location and Cooling
The carbon intensity of electricity varies dramatically across regions. In countries like Norway and Sweden, electricity is largely drawn from hydropower and wind, resulting in relatively clean grids. By contrast, data centers in parts of the U.S. Midwest or Poland may rely heavily on coal or gas, making them far more carbon-intensive.
Workload placement has become a powerful lever for lowering emissions. By routing compute to cleaner regions, companies can immediately reduce the carbon impact of their AI operations. Google's internal research, for example, reports a roughly 3x improvement in the compute carbon intensity (CCI) of its TPU chips over four years, and moving workloads between regions can compound those hardware gains.
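To make this concrete, here is a minimal sketch of carbon-aware placement: it routes a deferrable job to the cleanest region that still meets a latency budget. The region names, intensity figures, and latency numbers are illustrative assumptions, not real data; in practice they would come from a grid data provider and your own latency measurements.

```python
# Minimal sketch of carbon-aware region selection.
# Carbon-intensity numbers (gCO2/kWh) and latency figures are illustrative
# placeholders; a real system would pull them from a grid API and monitoring.

from dataclasses import dataclass

@dataclass
class Region:
    name: str
    carbon_intensity: float  # grams CO2 per kWh (assumed snapshot values)
    latency_ms: float        # estimated round-trip latency to end users

REGIONS = [
    Region("europe-north", carbon_intensity=40.0, latency_ms=95.0),
    Region("us-central", carbon_intensity=450.0, latency_ms=60.0),
    Region("us-west", carbon_intensity=220.0, latency_ms=70.0),
]

def pick_region(regions: list[Region], latency_budget_ms: float) -> Region:
    """Choose the cleanest region that still meets the latency budget."""
    eligible = [r for r in regions if r.latency_ms <= latency_budget_ms]
    if not eligible:
        # Fall back to the lowest-latency region if nothing fits the budget.
        return min(regions, key=lambda r: r.latency_ms)
    return min(eligible, key=lambda r: r.carbon_intensity)

if __name__ == "__main__":
    choice = pick_region(REGIONS, latency_budget_ms=80.0)
    print(f"Routing batch inference to {choice.name} "
          f"({choice.carbon_intensity:.0f} gCO2/kWh)")
```

The interesting design choice is treating latency as a hard constraint and carbon as the objective; latency-sensitive traffic keeps its guarantees while everything deferrable drifts toward cleaner grids.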
In tandem with location, cooling remains one of the largest non-compute energy expenses in data centers. Traditional air cooling methods are increasingly inefficient for high-density GPU clusters. Alternatives like liquid immersion cooling or using naturally cold air in northern regions are gaining traction. The Uptime Institute reports that advanced cooling systems can reduce energy overhead by 20 to 30 percent, especially in large-scale AI deployments.
These optimizations don’t require entirely new data centers. In many cases, shifting workloads to underutilized facilities on cleaner grids, or retrofitting existing ones with more efficient cooling, can deliver immediate gains.
Shifting When, Not Just Where
While infrastructure location is key, timing also plays a role. Some electricity grids fluctuate in carbon intensity throughout the day based on renewable availability. For example, California’s grid is cleanest during midday when solar output peaks, while in countries like Denmark, wind production spikes overnight.
By aligning non-urgent AI tasks, such as retraining models, batch inference, or synthetic data generation, with these clean energy windows, organizations can take advantage of lower-carbon power without relocating. This is referred to as temporal load shifting and is especially effective for workloads that aren’t latency-sensitive.
Cloud providers are beginning to offer APIs and dashboards that show real-time carbon intensity data by region and hour. With this information, engineering teams can build systems that dynamically schedule tasks for low-carbon windows. Although it’s not yet a common practice, early adopters are proving that it’s both feasible and cost-effective.
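As a rough sketch of temporal load shifting, assuming an hourly carbon-intensity forecast is already available (for example from a provider dashboard or a carbon API), the snippet below picks the cleanest hour before a job's deadline. The forecast values are placeholders chosen to mimic a midday solar dip.

```python
# Sketch of temporal load shifting: choose the lowest-carbon hour within a
# deferrable job's deadline. The hourly forecast values (gCO2/kWh) are
# placeholders; a real system would pull them from a carbon-intensity API.

from datetime import datetime, timedelta, timezone

def cleanest_start(forecast: dict[datetime, float], deadline: datetime) -> datetime:
    """Return the forecast hour with the lowest carbon intensity at or before the deadline."""
    candidates = {t: ci for t, ci in forecast.items() if t <= deadline}
    return min(candidates, key=candidates.get)

if __name__ == "__main__":
    now = datetime.now(timezone.utc).replace(minute=0, second=0, microsecond=0)
    # Hypothetical 6-hour forecast: midday solar pushes intensity down.
    forecast = {now + timedelta(hours=h): ci
                for h, ci in enumerate([320, 290, 210, 150, 180, 260])}
    start = cleanest_start(forecast, deadline=now + timedelta(hours=6))
    print(f"Deferring batch job to {start.isoformat()} "
          f"({forecast[start]:.0f} gCO2/kWh)")
```

In production the chosen start time would be handed to an existing scheduler or job queue rather than computed ad hoc, but the core decision is this simple: a lookup over a forecast, bounded by a deadline.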
For instance, to reduce environmental impact and improve performance, we’ve started scheduling certain customer-run asynchronous tasks in regions with a lower carbon footprint. At the same time, we’re encouraging customers to shift toward a job-based model for long-running workloads, which allows for more efficient placement across our infrastructure and helps reduce unnecessary client-side load.
Bringing Carbon Into the Conversation
Many infrastructure teams already track metrics like latency, cost per inference, and uptime. Adding carbon data into this equation doesn’t need to be disruptive. In fact, making it a visible, measurable metric often surfaces unexpected opportunities for optimization.
For example, platforms like WattTime and Electricity Maps provide APIs that allow developers to estimate the carbon impact of specific jobs or workloads. When paired with internal observability tools, teams can track energy usage by model, region, and time. This visibility enables more informed decisions such as selecting models with lower compute demands for certain use cases or fine-tuning deployment zones based on grid conditions.
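As a sketch of what that pairing might look like, the snippet below estimates a single job's emissions by multiplying its energy draw by the grid's current carbon intensity, fetched from the Electricity Maps API. The endpoint path, auth header, and response field are assumptions based on my reading of their public v3 documentation, so verify them against the current docs before relying on this.

```python
# Rough per-job carbon estimate: energy used (kWh) x grid carbon intensity
# (gCO2eq/kWh). The Electricity Maps endpoint, zone code, and response field
# names below are assumptions based on their public v3 docs; verify before use.

import os
import requests

def grid_intensity_gco2_per_kwh(zone: str) -> float:
    resp = requests.get(
        "https://api.electricitymap.org/v3/carbon-intensity/latest",
        params={"zone": zone},
        headers={"auth-token": os.environ["ELECTRICITY_MAPS_TOKEN"]},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["carbonIntensity"]

def job_emissions_grams(avg_power_watts: float, runtime_hours: float, zone: str) -> float:
    """Estimate emissions for one job from its average power draw and runtime."""
    energy_kwh = avg_power_watts * runtime_hours / 1000.0
    return energy_kwh * grid_intensity_gco2_per_kwh(zone)

# Example: a 300 W inference worker running for 2 hours in Germany ("DE").
# print(f"{job_emissions_grams(300, 2, 'DE'):.0f} gCO2eq")
```

Logging this figure next to latency and cost per inference is usually enough to start spotting outliers, such as a model or region whose carbon cost is out of proportion to the traffic it serves.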
Industry-wide efforts are beginning to formalize this thinking. The Green Software Foundation has proposed a set of carbon-aware design principles. Conferences such as NeurIPS and ICLR have begun requesting emissions disclosures for papers on model training. And cloud providers are gradually adding sustainability dashboards to their core platforms.
Still, there are missing pieces in today’s toolchain. One of the most significant is carbon traceability, especially for agentic workloads that fan out across thousands of services and interdependent model calls. Being able to attribute the carbon emitted by each part of an inference pipeline is crucial for knowing where to optimize.
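No standard exists here yet, but one can imagine carbon being attributed per span of a request trace, much as latency is today. The sketch below is purely illustrative: each model call records an energy estimate, and the trace rolls them up into a per-request carbon figure. The energy numbers and grid intensity are placeholders.

```python
# Illustrative sketch of per-request carbon attribution: each model call
# records an energy estimate, and the trace rolls them up into one carbon
# figure. All numbers here are placeholder values, not measurements.

from dataclasses import dataclass, field

GRID_INTENSITY_G_PER_KWH = 250.0  # placeholder; would come from a grid API

@dataclass
class Span:
    name: str
    energy_wh: float  # estimated energy for this call (e.g. from GPU telemetry)

@dataclass
class RequestTrace:
    request_id: str
    spans: list[Span] = field(default_factory=list)

    def record(self, name: str, energy_wh: float) -> None:
        self.spans.append(Span(name, energy_wh))

    def emissions_grams(self) -> float:
        total_kwh = sum(s.energy_wh for s in self.spans) / 1000.0
        return total_kwh * GRID_INTENSITY_G_PER_KWH

trace = RequestTrace("req-123")
trace.record("retriever", energy_wh=0.4)
trace.record("llm-draft", energy_wh=3.2)
trace.record("llm-rerank", energy_wh=1.1)
print(f"{trace.emissions_grams():.2f} gCO2eq for {trace.request_id}")
```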
The environmental cost of AI is not theoretical. As generative models become embedded in everyday tools and workflows, the scale of compute and, therefore, emissions continues to rise. But infrastructure is a lever we can actually control.
By placing workloads in cleaner regions, using more efficient cooling methods, and aligning non-critical compute with periods of renewable abundance, companies can meaningfully reduce emissions without sacrificing performance. The shift to carbon-aware AI is quickly becoming a mark of engineering maturity and long-term viability.