Christopher Hailstone brings decades of hands-on experience in energy management, renewables, and the operational realities of electricity delivery. He has sat on both sides of the table—advising utilities on grid reliability and helping developers prove out their designs—so he has a visceral feel for where assumptions break and where better data can unlock smarter, faster decisions. In this conversation with Emilia Herraira, he explains how AI data centers are changing not just the scale of electricity demand but its character, and why physics-based modeling, staged interconnection, and defensible operational profiles will define who connects quickly and who waits in line. Together, they explore the shared responsibility of developers, utilities, and policymakers to evolve planning from peak-centric thinking to a richer, time-aware understanding of variability, ramping, and operational behavior across a full year of real conditions.
AI data centers may reach 9–17% of U.S. electricity by 2030, up from 3–4% today. What planning assumptions should change first, and why? Which metrics beyond peak kW matter most, and can you share examples of models that improved decisions?
The first assumption to retire is that a single “peak kW” tells you enough to plan a substation, a feeder, or a resource stack. When demand could rise from roughly 3–4% of total electricity to 9–17% within a few planning cycles, you need a language that captures time, shape, and surprise. I push teams to elevate a core quartet: hourly variability, ramp rate, diversity across workloads, and the coupling between IT and cooling. In practice, we’ve replaced static profiles with full-year simulations that stitch together electrical, thermal, and airflow models; by doing that, one utility avoided oversizing a transformer bank after seeing that the annual profile’s true worst hour was driven by a cooling control transition, not the nameplate IT peak. Another project used the same modeling to quantify how a change in workload scheduling flattened a 40–50% swing during training runs into a gentler cadence utilities could absorb, which materially improved interconnection approval odds. The lesson is simple: once you show how the load breathes over 8,760 hours, smarter, cheaper decisions emerge.
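To make that quartet concrete, here is a minimal Python sketch, run on an invented 8,760-hour profile, of the shape metrics a single peak figure hides; the diurnal base load, burst pattern, and every number in it are assumptions for illustration, not data from any real site.

```python
import numpy as np

# Synthetic year of hourly facility load (MW): a diurnal base plus
# hypothetical training bursts. All values are invented for illustration.
rng = np.random.default_rng(42)
hours = np.arange(8760)
base = 300 + 40 * np.sin(2 * np.pi * hours / 24)             # assumed diurnal swing
bursts = rng.choice([0.0, 120.0], size=8760, p=[0.9, 0.1])   # assumed burst blocks
load_mw = base + bursts

peak_mw = load_mw.max()
ramps = np.diff(load_mw)                                      # hour-to-hour change, MW
metrics = {
    "peak_mw": peak_mw,
    "hourly_std_mw": load_mw.std(),                           # hourly variability
    "max_up_ramp_mw_per_h": ramps.max(),
    "max_down_ramp_mw_per_h": -ramps.min(),
    "p99_abs_ramp_mw_per_h": np.percentile(np.abs(ramps), 99),
    "load_factor": load_mw.mean() / peak_mw,                  # how peaky the year is
}
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Two profiles with identical peaks can differ wildly on the ramp and load-factor rows, which is exactly the information a peak-only filing strips out.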
High-density compute can swing 40–50% in short periods. How should utilities translate that volatility into ramp-rate limits, contingency reserves, and protection settings? What diagnostics or telemetry have you found essential to validate those profiles in operation?
I start by negotiating explicit ramp envelopes tied to operating modes—training, inference, maintenance—so operators know when a 40–50% swing is plausible and how it will be bounded. Those envelopes become inputs to contingency reserves and short-term re-dispatch, with protective relays and feeder settings tuned to avoid nuisance trips during legitimate workload shifts. You don’t guess your way into this; you validate it with high-speed power quality metering at the point of interconnection, synchronized telemetry from UPS and cooling drives, and time-aligned IT workload logs so you can correlate the electrical story to the computational one. In the field, that stack of diagnostics uncovered a cooling control interaction that doubled an apparent ramp; after retuning, the facility met the agreed ramp envelope and the utility reduced conservative reserves that had been holding back other connections. The operational calm that follows is palpable—fewer alarms, steadier voltage, and a control room that isn’t bracing for the next spike.
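Here is a hedged sketch of what envelope checking can look like in code, assuming per-mode limits and time-aligned telemetry; the mode names, limits, and samples are illustrative, not any site's actual agreement.

```python
from dataclasses import dataclass

# Assumed per-mode ramp limits (MW/min), standing in for negotiated envelopes.
RAMP_LIMITS_MW_PER_MIN = {"training": 8.0, "inference": 2.0, "maintenance": 1.0}

@dataclass
class Sample:
    minute: int      # timestamp from synchronized metering
    load_mw: float   # measured at the point of interconnection
    mode: str        # from time-aligned IT workload logs

telemetry = [
    Sample(0, 210.0, "inference"),
    Sample(1, 211.5, "inference"),  # 1.5 MW/min, inside the inference envelope
    Sample(2, 218.0, "training"),   # 6.5 MW/min, a legitimate training step-up
    Sample(3, 230.0, "training"),   # 12 MW/min, breaches the 8 MW/min envelope
]

for prev, cur in zip(telemetry, telemetry[1:]):
    ramp = abs(cur.load_mw - prev.load_mw) / (cur.minute - prev.minute)
    limit = RAMP_LIMITS_MW_PER_MIN[cur.mode]
    status = "OK" if ramp <= limit else "VIOLATION"
    print(f"t={cur.minute} mode={cur.mode:<11} ramp={ramp:5.1f} MW/min "
          f"(limit {limit}) -> {status}")
```

The same correlation logic is what lets you distinguish a permitted workload shift from the kind of event a relay should actually act on.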
Single projects often request 100–500 MW, with multi-phase sites targeting gigawatt-scale demand. How do you stage interconnection capacity across phases, and what milestones trigger each tranche? Can you share a case where staging avoided stranded assets?
We stage capacity in discrete tranches that mirror credible construction and commissioning sequences, each gated by milestones like completed shells, energized feeders, verified cooling readiness, and demonstrated load performance under defined profiles. For a campus targeting gigawatt-scale over multiple years, the first tranche served a 100–500 MW block, with the next tranche contingent on achieving verified utilization and adherence to ramp limits under summer and shoulder conditions. That discipline prevented the utility from installing upstream capacity that would have sat idle; when a shift in market priorities slowed the second phase, the right-sized first tranche kept assets earning and customers online. The avoided stranded cost was not just financial—it preserved optionality when the developer introduced a new workload mix that changed the timing and character of demand.
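One way to encode that gating discipline, sketched with assumed milestone names and tranche sizes:

```python
# Tranches unlock strictly in order, each gated on verifiable milestones.
# The milestone names and MW blocks below are illustrative assumptions.
TRANCHES = [
    {"mw": 150, "gates": ["shell_complete", "feeder_energized"]},
    {"mw": 200, "gates": ["cooling_verified", "ramp_compliance_summer"]},
    {"mw": 300, "gates": ["verified_utilization", "ramp_compliance_shoulder"]},
]

def approved_capacity_mw(completed: set) -> int:
    """Capacity released so far; later tranches stay locked until earlier gates clear."""
    total = 0
    for tranche in TRANCHES:
        if not all(gate in completed for gate in tranche["gates"]):
            break
        total += tranche["mw"]
    return total

done = {"shell_complete", "feeder_energized", "cooling_verified"}
print(approved_capacity_mw(done))  # -> 150: summer ramp compliance still pending
```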
Accelerated connection timelines are common. What are the practical steps to compress studies, upgrades, and commissioning without compromising reliability? Which tasks can run in parallel, and where do delays most often occur?
Parallelization starts with shared, standardized data packages so power flow, protection, and thermal studies kick off together instead of in sequence. On the utility side, we bundle procurement with preliminary engineering and use conditional approvals tied to operational guardrails, while the developer runs factory acceptance tests that pre-qualify control logic and telemetry before steel goes in the ground. Commissioning accelerates when you rehearse the first-year operating states in a digital environment, then execute site tests that mirror those stress points instead of generic checklists. Delays most often trace back to missing load-shape evidence, late protection coordination updates after design changes, and misaligned controls between IT and cooling. When you fix the data handoffs and lock the test plan to the real profile, weeks melt off the schedule without eroding reliability margins.
Interconnection queues in some regions exceed 2–3× current peak demand. How should prioritization criteria evolve for large loads, and what data should applicants provide upfront? Can you describe a queue-management practice that materially shortened cycle times?
Prioritization should reward readiness and transparency: complete demand profiles over a full year, demonstrated adherence to ramp envelopes, and clear phasing tied to realistic construction milestones. Applicants should submit standardized workload descriptions, cooling sequences across weather bands, and commitments to operational behavior, not just a maximum kW. One practical queue approach that cut cycle times was a milestone-based lane where projects with validated simulations and enforceable operating limits advanced faster; those that couldn’t meet the bar stayed in a general lane without clogging the path. By aligning queue position with the quality of data and accountability, the region moved high-confidence 100–500 MW projects sooner, which relieved pressure on both planning staff and the grid.
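A toy version of the lane assignment, with field names and criteria invented to mirror the answer rather than any region's actual rules:

```python
# Hypothetical readiness screen for a milestone-based fast lane.
def queue_lane(application: dict) -> str:
    ready = (
        application.get("full_year_profile", False)         # 8,760-hour demand data
        and application.get("validated_simulation", False)  # physics-based model on file
        and application.get("enforceable_ramp_envelope", False)
        and application.get("phased_milestones", False)     # staging tied to construction
    )
    return "milestone_lane" if ready else "general_lane"

print(queue_lane({"full_year_profile": True, "validated_simulation": True,
                  "enforceable_ramp_envelope": True, "phased_milestones": True}))
# -> milestone_lane; anything missing drops the project to the general lane
```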
Overestimating load risks overbuild; underestimating risks congestion and retrofits. How do you bound uncertainty with confidence intervals, scenario trees, or option-value thinking? What thresholds or triggers prompt a redesign versus temporary operating limits?
We bracket uncertainty with scenario trees anchored to realistic workload mixes, then wrap each branch in confidence intervals derived from operational analogs and full-year simulations. Option-value thinking shows up in staged infrastructure and modular cooling, so we “buy the right to expand” rather than paying for capacity we don’t use. Triggers for redesign include persistent deviations from the agreed 40–50% ramp envelope or sustained divergence between modeled and measured annual profiles that threaten upstream stability. Temporary operating limits—like stricter ramp caps or seasonal curtailment—make sense when deviations are bounded and fixable, but once they erode reliability or stall adjacent connections, it’s time to change the design, not just the playbook.
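A compact sketch of how those triggers could be codified, with placeholder thresholds (30 breach-days, 10% sustained divergence) standing in for negotiated values:

```python
import numpy as np

def recommend_action(measured_mw: np.ndarray, modeled_mw: np.ndarray,
                     envelope_breach_days: int) -> str:
    # Normalized mean divergence between the promised and observed year.
    divergence = np.abs(measured_mw - modeled_mw).mean() / modeled_mw.mean()
    if envelope_breach_days > 30 or divergence > 0.10:
        return "redesign"          # persistent, stability-threatening deviation
    if envelope_breach_days > 0 or divergence > 0.05:
        return "temporary_limits"  # bounded and fixable: ramp caps, curtailment
    return "no_action"

# Illustrative data: a flat modeled year with modest measurement scatter.
modeled = np.full(8760, 400.0)
measured = modeled * (1 + np.random.default_rng(1).normal(0, 0.03, 8760))
print(recommend_action(measured, modeled, envelope_breach_days=5))  # temporary_limits
```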
Developers are being asked for defensible demand profiles across real conditions. What inputs—IT workloads, cooling sequences, weather, utility constraints—must be standardized? How do you validate those inputs with pilots, digital twins, or measured baselines?
Standardization starts with a taxonomy of IT workloads that differentiates steady inference from bursty training, mapped to cooling sequences that respond predictably across weather. We pair that with normalized weather years and explicit utility constraints—such as permissible ramp windows and voltage bounds—so the model speaks the same language as the grid. Validation begins with small pilots that mimic the most demanding operating states, then a digital twin that runs the 8,760-hour profile using physics-based electrical, thermal, and airflow models. Finally, we reconcile the twin with measured baselines in the first months of operation, tightening parameters until the modeled and observed curves align within acceptable tolerance. That creates a living profile you can stand behind in front of a planner or a regulator.
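As a minimal illustration of that reconcile-and-tighten loop, assume a single scaling parameter on cooling power; a real twin tunes many physical parameters at once, but the pattern is the same:

```python
import numpy as np

rng = np.random.default_rng(7)
it_load = 250 + 50 * rng.random(8760)               # assumed IT electrical load, MW
measured = it_load * 1.28 + rng.normal(0, 2, 8760)  # "reality": true factor is 0.28

def twin(cooling_factor: float) -> np.ndarray:
    # Toy twin: facility load = IT load plus proportional cooling power.
    return it_load * (1 + cooling_factor)

factor = 0.20                                        # deliberately wrong first guess
for step in range(100):
    residual = twin(factor) - measured
    nmae = np.abs(residual).mean() / measured.mean() # normalized mean abs error
    if nmae <= 0.01:                                 # agreed tolerance reached
        break
    factor -= 0.002 * np.sign(residual.mean())       # nudge toward the data

print(f"calibrated factor ~{factor:.3f} after {step} steps, NMAE {nmae:.2%}")
```

The measured baseline keeps the model honest; the tolerance, here 1%, is what you can then stand behind in front of a planner.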
Physics-based simulation is gaining traction. Which models (electrical, thermal, airflow, grid-transient) matter most at each project stage, and how do you calibrate them? Can you share metrics showing simulation-to-reality accuracy and resulting cost or schedule savings?
Early siting leans on grid-transient and power flow to flag weak nodes; mid-design shifts to electrical selectivity, thermal plant dynamics, and airflow to capture the coupling that drives peaks and ramps. As you near commissioning, you integrate them into a single year-long simulator that tests seasonal behavior, part-load cooling, and free-cooling sequences against the actual workload cadence. Calibration is iterative: we tune against factory tests, pilot runs, and the first operational weeks until the simulated annual profile tracks the measured one in the hours that matter most—those with steep ramps or tight voltage. In one project, that fidelity let the utility accept a staged interconnection sooner and avoid over-installing capacity for the first phase, shaving months off the schedule and sidestepping stranded assets as the next tranche came online.
Cooling and IT interact tightly. How do you capture workload-driven heat flux, partial-load chiller performance, and free-cooling availability in annual profiles? What control strategies reduced peaks or ramps, and by how much?
We start with heat flux derived from the workload taxonomy, then pass it through chiller and pump curves that actually reflect partial-load behavior, not brochure points. Free-cooling hours are modeled with the same weather year driving the electrical side so you see when the plant can ride on ambient conditions and when it must lean on compressors. Control strategies that have worked include coordinating workload pacing with cooling setpoint glide to blunt the 40–50% swings during training bursts; that alignment turned ragged, spiky profiles into smoother steps the grid could handle. The human experience of this is real: fans stop howling, valve positions stop hunting, and operators breathe easier as alarms quiet down during the very hours that used to be the most stressful.
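A back-of-envelope version of that chain, with an assumed part-load efficiency curve and an assumed 18 °C free-cooling changeover; a real study would use manufacturer curves and site weather:

```python
import numpy as np

def cooling_power_mw(it_heat_mw, ambient_c, chiller_capacity_mw=60.0):
    plr = np.clip(it_heat_mw / chiller_capacity_mw, 0.1, 1.0)  # part-load ratio
    cop = 6.5 - 2.0 * (1.0 - plr)       # assumed: efficiency sags at part load
    mechanical = it_heat_mw / cop       # compressor power when chillers must run
    economizer = 0.03 * it_heat_mw      # fans and pumps only, on free-cooling hours
    return np.where(ambient_c < 18.0, economizer, mechanical)

hours = np.arange(8760)
ambient = 12 + 10 * np.sin(2 * np.pi * (hours - 2000) / 8760)  # toy weather year
it_heat = 40 + 15 * (np.sin(2 * np.pi * hours / 24) > 0.5)     # bursty heat, MW
facility = it_heat + cooling_power_mw(it_heat, ambient)
print(f"free-cooling hours: {(ambient < 18.0).sum()}, "
      f"peak facility load: {facility.max():.1f} MW")
```

Drive both the electrical and thermal sides with the same weather year and the worst hour often lands where compressors and workload bursts coincide, not at the nameplate IT peak.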
Reliability standards face new stressors from fast-changing loads. How should planning reserves, voltage control, and protection coordination adapt near large campuses? What field measurements or event analyses have reshaped your guidelines?
We recommend dynamic planning reserves that explicitly account for high-variance load segments rather than generic percentages, and local voltage control that pairs fast-responding devices with clear ramp envelopes. Protection must be rethought so legitimate workload transitions don’t masquerade as faults; that means settings and logic that recognize permitted ramps and the behavior of modern power electronics. Field data from events where training runs drove rapid reactive power shifts led us to refine voltage regulation setpoints and coordination zones around the campus, improving ride-through without sacrificing safety. Those analyses changed our guidelines from static thresholds to context-aware bands tied to documented operating states.
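To see why a generic percentage misleads, here is a toy comparison of a flat reserve rule against one sized to observed variability; the loads and the 3-sigma choice are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
steady_mw = np.full(8760, 900.0) + rng.normal(0, 5, 8760)          # conventional load
campus_mw = 300 + rng.choice([0.0, 130.0], 8760, p=[0.85, 0.15])   # AI campus swings
system_mw = steady_mw + campus_mw

flat_reserve = 0.06 * system_mw.max()             # generic 6%-of-peak rule
dynamic_reserve = 3 * np.diff(system_mw).std()    # sized to hour-to-hour variability
print(f"flat rule: {flat_reserve:.0f} MW, variance-aware: {dynamic_reserve:.0f} MW")
```

In this toy system the flat rule badly under-provisions for the volatile segment, which is the gap context-aware bands are meant to close.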
Policymakers are weighing cost allocation so large new loads fund a fair share of upgrades. What framing leads to equitable tariffs or network charges, and how do you guard against unintended ratepayer impacts? Any examples where incentives aligned behavior effectively?
The fairest framing connects charges to the cost drivers: not just peak kW, but variability, ramping, and the timing of demand relative to system stress. Tariffs that reflect those dimensions encourage designs and operations that lower real system costs, while guardrails—like caps tied to verified profiles—prevent windfalls or unintended burdens on other customers. Where incentives aligned behavior well, large sites earned more favorable terms by demonstrating full-year profiles, committing to ramp envelopes, and accepting staged capacity that matched proven utilization. The result was a cleaner cost signal, fewer stranded upgrades, and a healthier social compact with ratepayers who could see that growth was being managed, not subsidized in the dark.
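A stylized network charge along those lines, with invented weights and a 0-to-1 stress-coincidence factor, purely to show how the cost signal could separate a shaped load from a volatile one at the same peak:

```python
# All weights ($/unit-month) and inputs are illustrative, not an approved tariff.
def monthly_charge_usd(peak_kw, ramp_p99_kw_per_min, std_kw, stress_coincidence):
    demand = 12.0 * peak_kw                        # classic demand charge
    ramping = 4.0 * ramp_p99_kw_per_min            # charge for fast swings
    variability = 2.0 * std_kw                     # charge for hour-to-hour variance
    timing = 8.0 * peak_kw * stress_coincidence    # overlap with system stress hours
    return demand + ramping + variability + timing

# Same 200 MW peak, very different system cost:
print(monthly_charge_usd(200_000, 1_500, 20_000, 0.9))  # volatile, stress-coincident
print(monthly_charge_usd(200_000,   300,  6_000, 0.2))  # shaped, ramp-controlled
```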
Some regions require evidence of operational behavior, not just peak declarations. What performance guarantees, telemetry sharing, or curtailment agreements build trust? How do you structure enforceable service levels without stifling growth?
Trust grows when declarations become commitments with teeth: performance guarantees around ramp envelopes, telemetry that streams real-time load, and curtailment agreements that trigger under defined system conditions. We pair those with transparent reporting and periodic audits so operators can compare the promised 8,760-hour profile to lived reality. Enforceable service levels don’t have to be punitive; they can scale with phased capacity, easing as the site demonstrates good citizenship and tightening if deviations persist. That balanced approach gives planners confidence to connect the next 100–500 MW tranche while preserving the developer’s ability to grow responsibly.
What practical demand-flex options fit AI facilities—batch scheduling, thermal storage, UPS grid services, or on-site generation—and what are realistic MW and duration ranges? How do you prove deliverability to a system operator?
The most natural flex starts with batch scheduling to shave and shift the load shape, then adds thermal strategies—like pre-cooling and sensible storage—to decouple heat rejection from immediate electric demand. UPS and power electronics can provide short-duration support, while on-site generation and tighter workload orchestration cover longer system stress windows. To prove deliverability, we embed these options into the same physics-based annual model, test them in pilots, and then demonstrate performance against the operator’s telemetry and dispatch signals. The operator doesn’t want rhetoric; they want to see that when the system calls, the profile changes as promised and stays within the agreed bounds.
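A small sketch of deliverability for the batch-scheduling option, assuming a firm/deferrable split and an operator-declared evening stress window; the numbers are illustrative:

```python
import numpy as np

firm_mw = np.full(24, 180.0)              # assumed inference and cooling floor
batch_mw = np.full(24, 60.0)              # assumed deferrable training load
stress_hours = [17, 18, 19, 20]           # operator-declared window
night_hours = [0, 1, 2, 3]

shifted = batch_mw.copy()
moved = shifted[stress_hours].sum()
shifted[stress_hours] = 0.0               # pause batch work during the window
shifted[night_hours] += moved / len(night_hours)  # recover the energy overnight

before = firm_mw + batch_mw
after = firm_mw + shifted
print(f"relief delivered: {(before - after)[17:21].min():.0f} MW for 4 h; "
      f"daily energy unchanged: {bool(np.isclose(before.sum(), after.sum()))}")
```

The dispatch test is then simple: when the signal arrives, the measured profile must step down by the committed megawatts and hold for the committed duration.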
If you could change three planning artifacts today—load forecast templates, interconnection studies, or resource adequacy rules—what would you rewrite, and how? What near-term pilot would demonstrate the payoff within 12 months?
I’d rewrite load forecast templates to require full-year, operating-state-aware profiles tied to standardized workload and cooling taxonomies. Interconnection studies would move from static extremes to scenario-based assessments that test 8,760-hour behavior, with staged approvals linked to verified milestones. Resource adequacy rules would incorporate variability and ramping explicitly, so planning reserves reflect the character of demand rather than a single peak snapshot. For a fast pilot, pick a cluster of candidate sites, run physics-based annual studies with enforceable operating envelopes, and then measure first-year performance against those studies. Within 12 months, you’ll see shorter cycle times, fewer redesigns, and a clearer path to staging capacity without overbuild.
What is your forecast for AI-driven data center demand and grid planning over the next five years, and which leading indicators—queue composition, ramp-rate incidents, tariff adoption—will tell us if we’re on track?
Over the next five years, I expect data center electricity demand to at least double, possibly push toward tripling, with more regions feeling the pressure that a few hotspots experience today. The facilities won’t just be bigger; they’ll be more dynamic, with workload mixes that make the 40–50% swing a planning norm, not an outlier. We’ll know we’re on track if the share of applications with full-year, physics-based profiles rises, if interconnection queues—now running 2–3× current peak demand in some areas—start moving projects with staged capacity and enforceable operating limits to the front, and if tariff designs begin rewarding variability and ramp control rather than raw peak alone. A second set of signals will come from the field: fewer ramp-rate incidents during stressful hours, more sites meeting staged milestones for 100–500 MW tranches, and regulators publicly tying approvals to defensible operational behavior. If those indicators move in the right direction, we’ll absorb that growth without sacrificing reliability; if they don’t, we’ll see congestion, retrofits, and fraying trust long before the decade closes.
