Managing Data Center Uncertainty Part III — The Utilization Paradox: Scarcity and Waste Inside AI Infrastructure

AI’s energy problem isn’t shortage—it’s misalignment. GPU clusters run at just 60–70% utilization due to data bottlenecks, creating hidden flexibility. With minimal peak curtailment, the grid could integrate ~100 GW of new load. Smarter governance—not more power—is the real solution.

This article is part of the AIxEnergy Series: Managing Data Center Energy Uncertainty, drawn from the author’s full research paper. The complete version is available directly from the author by request at michael.leifman1@gmail.com.

In Part I of this article series, we discussed how AI is driving unprecedented uncertainty in electricity demand, with projections of U.S. data-center electricity use for 2028 ranging from 325 to 580 TWh. The issue isn’t forecasting—it’s governance. Without transparency, flexibility, and better incentives, utilities overbuild, costs rise, and emissions lock in. In Part II of this article series, we reviewed how phantom data centers distort U.S. energy planning. Developers overfile interconnection requests, utilities profit from overbuilding, and regulators approve speculative capacity. Misaligned incentives create costly overbuild and fossil lock-in—requiring governance reform, transparency, and accountability.


This article explores the gap between apparent scarcity and actual inefficiency, arguing that the energy crisis of AI is not one of shortage but of synchronization. Drawing on research from Lawrence Berkeley National Laboratory, Duke University, and detailed technical studies from Google and Microsoft, this article reveals that GPU utilization across AI data centers averages just 60-70 percent. The culprit is a fundamental infrastructure mismatch: unoptimized training deployments can experience 30-50% GPU idle time waiting for data, with data preprocessing consuming up to 65% of epoch time in the worst cases.

The solution is not to build more power but to use power more intelligently. By distinguishing between training and inference workloads and recognizing which facilities possess genuine operational flexibility, the grid can unlock significant capacity. Duke University research demonstrates nearly 100 GW of potential load integration using existing infrastructure if data centers reduce consumption by just 0.5-1% during peak periods, while inference workloads require different mechanisms focused on electrical demand management via storage.


The Architecture of Inefficiency

💡
AI data centers consume enormous power yet routinely run at only 60–70 percent GPU utilization because storage and preprocessing cannot feed data fast enough, creating idle cycles that utilities cannot see and forcing the grid to plan for load that is far less efficient than it appears.

AI data centers are vast ecosystems of silicon and cooling fluid. Inside, compute clusters built to train large language models draw more power per rack than an entire suburban neighborhood. Yet these same clusters often sit partially idle. Unlike legacy industrial facilities, AI systems operate in pulses—bursts of intense computation followed by intervals where GPUs wait for data.

During active training, GPUs operate at high capacity. But these periods are punctuated by delays as storage systems struggle to deliver training data fast enough. Research shows that unoptimized AI training deployments can experience 30-50% GPU idle time waiting for data, with data preprocessing consuming up to 65% of epoch time in the worst cases. The result is clusters that achieve only 60-70% GPU utilization—not because computational demand is lacking, but because data delivery infrastructure cannot keep pace with GPU processing speed.

Major cloud providers acknowledge this reality. Aggregate GPU utilization across hyperscale facilities rarely exceeds this 60-70% range. The remaining capacity goes unused while GPUs wait for preprocessing, network transfers, or storage systems to deliver the next batch of training data. Utilities, however, cannot see this inefficiency. From their perspective, the data center appears to be a continuous load. The grid must plan for peak capacity as though it were constant.


The I/O Bottleneck: Where Billions in Compute Power Stalls

💡
Data-starvation research shows that AI training is often bottlenecked by slow storage and preprocessing—not computation—leaving costly GPUs idle for 30–50 percent of training time in unoptimized deployments and pushing real-world utilization far below theoretical capacity.

The technical reality beneath these utilization figures reveals why even expensive AI infrastructure operates far below its theoretical capacity. Studies from Google and Microsoft document that model training time is often dominated not by computation but by data delivery—a phenomenon researchers call “data starvation.”

A 2021 analysis of deep neural network training at scale found that workloads spend up to 65% of each training epoch simply on data preprocessing—loading training examples, applying transformations, and shuffling batches—before GPUs can perform any calculations. This preprocessing burden creates systematic bottlenecks that cascade through the entire training pipeline.

The causes are structural. Storage throughput limitations mean GPUs process data orders of magnitude faster than storage systems can deliver it. Large language model training requires sustained read performance exceeding 10 GB/s, while computer vision training demands even higher throughput. Traditional storage infrastructure often cannot meet these requirements, creating systematic delays where expensive GPUs—costing $30,000-40,000 per H100 unit—sit idle waiting for data to arrive. Geographic separation compounds the problem. Cloud training has become standard practice, but distributed architectures mean slow I/O when computation and storage span different regions. A Google study documented that training workloads spend on average 30% of total training time simply on input data pipeline operations—time where GPUs contribute nothing to model improvement but continue consuming power.
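To get a feel for what that 30% figure means in energy terms, the back-of-envelope sketch below applies it to a hypothetical cluster; the cluster size, per-GPU power draw, and run length are illustrative assumptions rather than figures from the studies cited above.

```python
# Back-of-envelope: GPU-hours and energy consumed while the input pipeline,
# not computation, is the constraint. All inputs are illustrative assumptions.

GPUS = 1_000                  # hypothetical cluster size
POWER_PER_GPU_KW = 0.7        # roughly 700 W per H100-class accelerator (assumed)
RUN_DAYS = 30                 # hypothetical training-run length
INPUT_PIPELINE_SHARE = 0.30   # share of training time on input data ops (figure cited above)

total_gpu_hours = GPUS * RUN_DAYS * 24
stalled_gpu_hours = total_gpu_hours * INPUT_PIPELINE_SHARE

# Simplification: assume GPUs draw close to full power even while stalled;
# real idle draw is lower, but far from zero.
stalled_energy_mwh = stalled_gpu_hours * POWER_PER_GPU_KW / 1_000

print(f"Stalled GPU-hours: {stalled_gpu_hours:,.0f}")
print(f"Energy drawn during stalls: ~{stalled_energy_mwh:,.0f} MWh over {RUN_DAYS} days")
```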

CPU preprocessing bottlenecks create additional delays. During training, CPUs handle data augmentation, normalization, and batch preparation. When CPUs reach maximum utilization preparing the next batch, GPUs sit partially idle waiting for work. The random access patterns required for training deep neural networks using Stochastic Gradient Descent create particularly punishing demands on storage infrastructure. Systems must read terabytes of data just to use gigabytes, while expensive GPUs idle.

The economic implications are substantial. Poorly optimized data pipelines can reduce GPU utilization to 40-60%, directly impacting both training speed and infrastructure return on investment. Organizations with optimized data loading achieve 90%+ GPU utilization during training—but this requires significant engineering investment in custom pipelines, distributed caching strategies, and infrastructure co-design that most organizations defer.
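The engineering remedy alluded to above is largely about overlap: preparing the next batch while the GPU computes on the current one. The sketch below is a deliberately simplified model of that effect (fixed per-step load and compute times, perfect pipelining), not a description of any particular vendor's pipeline.

```python
# Simplified model of GPU utilization with and without input prefetching.
# The per-step load and compute times are illustrative assumptions.

LOAD_S = 0.4      # seconds to read and preprocess one batch (storage/CPU bound)
COMPUTE_S = 0.6   # seconds of GPU compute per batch
STEPS = 10_000

# Sequential pipeline: the GPU waits for every batch before computing.
sequential_total = STEPS * (LOAD_S + COMPUTE_S)
sequential_util = STEPS * COMPUTE_S / sequential_total

# Prefetched pipeline: loading the next batch overlaps with current compute,
# so each warm step costs max(load, compute).
prefetch_total = LOAD_S + STEPS * max(LOAD_S, COMPUTE_S)
prefetch_util = STEPS * COMPUTE_S / prefetch_total

print(f"No prefetch:   GPU utilization ~{sequential_util:.0%}")  # ~60%
print(f"With prefetch: GPU utilization ~{prefetch_util:.0%}")    # ~100% when compute >= load
```

This overlap is what multi-worker loaders and prefetching in mainstream frameworks (for example, PyTorch's DataLoader or tf.data's prefetch) aim to provide; once per-batch load time exceeds compute time, however, no amount of prefetching helps, which is why the heavier remedies above (faster storage, distributed caching, infrastructure co-design) become necessary.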


The I/O Bottleneck as Flexibility Indicator

💡
The I/O bottleneck exposes which AI workloads are truly flexible, showing that most training jobs can shift in time while real-time inference cannot, and without transparency about this distinction utilities misclassify all data-center load as firm—driving costly overbuild that smarter, workload-aware planning could avoid.

This technical reality has profound implications for grid planning. The I/O bottleneck reveals which data center workloads possess inherent temporal flexibility. A training job that requires 48 hours due to data delivery constraints is fundamentally different from latency-sensitive inference serving real-time user requests.

The distinction is not merely technical but economic: if a training workload were genuinely time-critical, the organization would invest the significant engineering resources required to optimize data pipelines and achieve 90%+ GPU utilization. The fact that many organizations accept 60-70% utilization indicates these workloads can tolerate substantial delays. Deferring a training job’s start by a few hours to avoid grid peaks adds modest calendar time to multi-day runs already experiencing delays from I/O constraints. Organizations that accept multi-day training times are demonstrating revealed preference: these workloads are not latency-sensitive. The I/O bottleneck doesn’t prevent flexibility; it identifies which workloads can provide it most readily.
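A rough comparison, using only figures already cited in this section, shows why deferral is cheap relative to the slowdown these jobs already absorb; the 48-hour run length and 4-hour deferral below are assumptions for illustration.

```python
# Illustrative comparison: calendar-time cost of grid-aware deferral versus the
# slowdown already implied by I/O-constrained utilization. Inputs are assumed.

run_hours = 48        # multi-day training run (assumed)
deferral_hours = 4    # start delayed to ride through a grid peak (assumed)
utilization = 0.65    # midpoint of the 60-70% range cited above

deferral_penalty = deferral_hours / run_hours

# At 65% utilization, a fully fed cluster would finish the same work in ~65% of
# the time, so the run is already ~(1/0.65 - 1) = ~54% longer than its
# compute-bound minimum.
io_penalty = 1 / utilization - 1

print(f"Calendar-time cost of a {deferral_hours}-hour deferral: ~{deferral_penalty:.0%}")
print(f"Slowdown already accepted from I/O stalls: ~{io_penalty:.0%}")
```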

Duke University research demonstrates the scale of this opportunity. Their analysis found that nearly 100 GW of new large loads could be integrated across the 22 largest U.S. balancing authorities with minimal impact if data centers can reduce power usage by 0.5-1% during peak demand periods. At the 0.5% curtailment rate (events averaging about 2.1 hours and totaling roughly 177 hours per year), the existing grid could handle much of expected demand growth with no additional generation. Importantly, for 88% of curtailment time on average, half the new load would continue running, meaning partial rather than complete shutdowns.
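Annualized, and treating the quoted figures as approximate, the flexibility being asked for is small:

```python
# Rough annualization of the curtailment figures quoted from the Duke analysis.
# Arithmetic only; the quoted values are treated as exact for illustration.

HOURS_PER_YEAR = 8_760
curtailed_hours = 177      # total curtailment hours per year (quoted above)
avg_event_hours = 2.1      # average event duration in hours (quoted above)

share_of_year = curtailed_hours / HOURS_PER_YEAR
events_per_year = curtailed_hours / avg_event_hours

print(f"Curtailment hours as a share of the year: ~{share_of_year:.1%}")   # ~2.0%
print(f"Implied curtailment events per year:      ~{events_per_year:.0f}") # ~84
```

And because half of the new load keeps running during most curtailment hours, the energy actually forgone is smaller still.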

However, this flexibility opportunity applies primarily to training workloads. For inference operations serving real-time user requests, achieving grid flexibility requires electrical demand management via battery storage that maintains computational continuity while reducing peak electrical draw, rather than direct computational curtailment that would violate service requirements.
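As a rough sizing sketch for that storage-based approach: the facility size, reduction depth, and usable battery fraction below are assumptions, and the event length simply reuses the average cited above.

```python
# Rough battery sizing for electrical, rather than computational, flexibility
# at an inference facility. All inputs are illustrative assumptions.

facility_peak_mw = 100.0   # hypothetical inference facility draw
reduction_share = 0.25     # fraction of grid draw shaved during an event (assumed)
event_hours = 2.1          # reuses the average event length cited above
usable_fraction = 0.8      # usable share of nameplate battery capacity (assumed)

shaved_mw = facility_peak_mw * reduction_share
energy_per_event_mwh = shaved_mw * event_hours
nameplate_mwh = energy_per_event_mwh / usable_fraction

print(f"Grid draw shaved during an event: {shaved_mw:.0f} MW")
print(f"Usable energy needed per event:   {energy_per_event_mwh:.1f} MWh")
print(f"Nameplate storage required:       ~{nameplate_mwh:.0f} MWh")
```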

The problem is institutional. Workloads remain opaque to regulators, and utilities treat all megawatts as firm. Without reliable data on training versus inference workload composition, utilities cannot design targeted demand response programs, and regulators cannot assess whether proposed facilities can provide grid services.


The Cost of Misclassification

Utilities treat AI data centers as steady, predictable, and inflexible. But AI is dynamic, algorithmic, and constantly evolving. By classifying data centers as firm industrial load, regulators force utilities to build for a world that no longer exists. This misclassification has profound consequences. Utilities commit to constructing generation and transmission assets assuming 24/7 operation at design capacity, even though real-world utilization rarely justifies it. In Virginia, utility forecasts incorporated speculative AI facilities at near-continuous utilization. Half lacked financing or operational schedules. Yet those assumptions justified billions in new gas-fired generation. The same story repeats across Georgia, Texas, and Oregon.

A more rational classification would tie load designations to computational behavior—training versus inference, flexible versus firm. This would align planning with technical realities and create incentives for data centers to disclose operational flexibility. Transparency is the foundation of efficiency.


From Waste to Value

💡
In the AI era, grid reliability hinges on how efficiently power is used, not how much is built, demanding transparency on GPU utilization, performance-based incentives that reward reduced waste, and flexibility markets tailored to the distinct behaviors of training and inference workloads.

For a century, the energy sector equated reliability with capacity. In the AI age, reliability will depend on utilization—on how effectively each watt converts into useful computation. To make this transition real, three governance mechanisms stand out:

First, utilization disclosure. Require all AI facilities above 50 MW to report GPU utilization, workload composition (training versus inference percentages), and flexibility potential during interconnection approval. Regulators can then distinguish between continuous and shiftable load (an illustrative disclosure record is sketched after the third mechanism below).

Second, performance-based regulation. Extend PBR frameworks to reward utilities for verified reductions in system waste—deferring capacity expansions through coordinated management of data center loads where technically feasible. This converts efficiency into a profit center rather than a cost.

Third, differentiated flexibility markets. Create demand response mechanisms that reward operators for aligning load timing with grid conditions where technically feasible (primarily for training workloads), while recognizing that inference workloads require different approaches such as electrical storage.
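To make the first mechanism concrete, the sketch below shows one hypothetical shape a utilization disclosure record could take; the field names, example values, and 50 MW framing are illustrative assumptions, not an existing or proposed filing format.

```python
# Hypothetical disclosure record for an AI facility above 50 MW, covering the
# fields discussed above: utilization, workload mix, and flexibility potential.
# Purely illustrative; no standard regulatory schema is implied.

from dataclasses import dataclass

@dataclass
class FacilityDisclosure:
    facility_id: str
    interconnection_capacity_mw: float  # contracted peak draw
    avg_gpu_utilization: float          # e.g., 0.65 for 65%
    training_share: float               # fraction of compute hours spent on training
    inference_share: float              # fraction serving latency-sensitive inference
    curtailable_mw: float               # load that can be deferred or paused (training-linked)
    storage_backed_mw: float            # load shiftable via on-site batteries (inference-linked)

    def shiftable_mw(self) -> float:
        """Load a planner could treat as flexible rather than firm."""
        return self.curtailable_mw + self.storage_backed_mw


# Example: a 200 MW facility that is mostly training.
example = FacilityDisclosure(
    facility_id="example-001",
    interconnection_capacity_mw=200.0,
    avg_gpu_utilization=0.65,
    training_share=0.7,
    inference_share=0.3,
    curtailable_mw=90.0,
    storage_backed_mw=20.0,
)
print(f"Load a planner could treat as shiftable: {example.shiftable_mw():.0f} MW")
```

With records like this, the continuous-versus-shiftable distinction in the first mechanism becomes something a regulator can compute rather than infer.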


The Governance Imperative

💡
The grid’s real challenge is not excess demand but opaque, inefficient AI workloads, and true resilience will come from transparency and governance that reward efficiency, distinguish workload types, and unlock the vast flexibility already hidden inside today’s underused GPUs.

The utilization paradox exposes a simple truth: the grid’s greatest risk is not overuse, but misuse combined with opacity. Energy institutions struggle to govern a world lacking the data needed to distinguish between workload types. Regulators equate prudence with overbuild, and utilities profit from expansion. Yet the path to resilience lies in visibility, accuracy, and coordination.

To govern the AI age, policymakers must reward efficiency per watt rather than megawatts built. They must treat transparency as infrastructure and recognize that different workload types require different flexibility mechanisms.

Training workloads offer temporal flexibility potential; inference workloads require electrical flexibility via storage. Both contribute to grid resilience through fundamentally different mechanisms. The takeaway is clear: the cleanest power plant is the unused GPU. Unlocking that efficiency requires not more technology, but more transparency and governance that distinguish between workload types and create appropriate economic incentives for each.


What’s Next


Part III has revealed that what looks like scarcity is, in many cases, inefficiency masked by opacity—a failure of synchronization between computation and electricity, compounded by a lack of data on workload types. Part IV will detail the specific policy mechanisms that can transform this inefficiency into value. By showing how targeted interventions can unlock grid headroom while respecting different workload characteristics, it will argue that the next frontier of reliability is not generation—it is intelligence paired with transparency.

References

Lawrence Berkeley National Laboratory. 2024 United States Data Center Energy Usage Report. Berkeley, CA: LBNL, 2024. LBNL-2001637.

Norris, T. H., D. Patiño-Echeverri, and M. Dworkin. Rethinking Load Growth: Assessing the Potential for Integration of Large Flexible Loads in U.S. Power Systems. Durham, NC: Duke University Nicholas Institute, 2025.

Mohan, Jayashree, et al. “Analyzing and Mitigating Data Stalls in DNN Training.” Proceedings of the VLDB Endowment 14 (2021). https://www.cs.utexas.edu/~vijay/papers/vldb21-datastalls.pdf.

Spjut, Josef, and Ted Purcell. “The New Bottlenecks of ML Training: A Storage Perspective.” SIGARCH Blog, July 2021. https://www.sigarch.org/the-new-bottlenecks-of-ml-training-a-storage-perspective/.

Leifman, Michael. Managing Data Center Energy Uncertainty: A Framework to Prevent Overbuild, Control Costs, and Unlock Grid Flexibility. Washington, DC: AIxEnergy, 2025. Available from the author at michael.leifman1@gmail.com.