March 16, 2026 · 4 min read

March 16, 2026 – Phaidra today announced a groundbreaking methodology to drastically improve the thermal stability of liquid-cooled AI data centers. This methodology is outlined in the joint white paper “AI Agents for Liquid-Cooled AI Factories.”
By leveraging AI-driven, feed-forward control systems on production NVIDIA Grace Blackwell platforms, the collaboration is paving the way for "DSX AI factories": a new operational paradigm in which power, cooling, and workload management are unified to maximize efficiency and computational throughput. Phaidra has integrated NVIDIA DSX Max-Q to run GPU clusters as efficiently as possible, so more of the available power can go toward running AI workloads.
Modern AI factories are fundamentally different from traditional data centers: they are defined by massive scale, extreme density, and highly synchronized workloads. Operators of large-scale AI factory campuses, such as Applied Digital, must manage increasingly complex interactions between power infrastructure, liquid cooling systems, and rapidly fluctuating GPU workloads on behalf of their partners. When massive AI training or inference jobs are dispatched, thousands of networked GPUs ramp up simultaneously, creating "peaky" power profiles that can jump from idle to maximum capacity within seconds.
Traditional liquid cooling relies on Proportional-Integral-Derivative (PID) controllers, which wait for a sensor to register a coolant temperature change before taking action. Because coolant has high thermal inertia, this reactive feedback loop suffers from a 3-to-5-minute delay, resulting in rapid heat spikes that force GPUs to throttle performance to protect themselves. To mitigate this, operators significantly over-cool their facilities to create a safety buffer—a strategy that wastes massive amounts of energy and limits overall compute capacity.
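The reactive loop described above can be sketched in a few lines. This is an illustrative textbook PID controller, not Phaidra's or any vendor's implementation; gain values and the sign convention (positive output = more cooling) are assumptions. The key point is that the controller produces no corrective action until the measured coolant temperature has already deviated from the setpoint.

```python
class PIDController:
    """Classic Proportional-Integral-Derivative controller (illustrative sketch)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured_temp, dt):
        """Return a cooling command from the measured coolant temperature.

        Positive output means "cool harder". Note the output is zero until
        the sensor actually registers a temperature above the setpoint,
        which is the source of the reactive lag described in the article.
        """
        error = measured_temp - self.setpoint
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

Because coolant temperature lags heat generation by minutes, by the time `error` becomes nonzero the GPUs may already be throttling.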
To close this latency gap, Phaidra developed a self-learning reinforcement learning (RL) AI Agent that fundamentally changes how cooling is managed. Instead of reacting after-the-fact to temperature changes, the AI Agent uses real-time rack power data as a leading indicator to predict and prevent thermal spikes. The agent seamlessly sends optimal setpoint commands to the Coolant Distribution Unit (CDU) before the heat fully registers in the fluid, reducing the effective response delay from minutes to under 10 seconds in validated production environments.
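The feed-forward idea can be illustrated with a minimal sketch. This is not Phaidra's agent (which is a learned RL policy); the function name, parameters, and linear power-to-setpoint mapping are assumptions chosen only to show the contrast with feedback control: the cooling setpoint moves as soon as rack power ramps, before any temperature change appears in the fluid.

```python
def feed_forward_setpoint(base_supply_temp_c, rack_power_kw, rated_power_kw,
                          max_drop_c=4.0):
    """Compute a CDU supply-temperature setpoint from current rack power.

    Rack power is used as a leading indicator of heat load: as utilization
    approaches rated power, the setpoint is lowered by up to `max_drop_c`
    degrees Celsius, pre-positioning cooling before heat reaches the
    coolant loop. The linear mapping is a stand-in for a learned policy.
    """
    utilization = min(max(rack_power_kw / rated_power_kw, 0.0), 1.0)
    return base_supply_temp_c - max_drop_c * utilization
```

For example, a rack idling at 0 kW keeps the baseline 32 °C supply temperature, while a rack ramping to its full 120 kW rating immediately drives the setpoint down to 28 °C, without waiting for a sensor to register the heat.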
The new methodology underwent rigorous joint A/B testing in live production environments, including an NVIDIA DGX SuperPOD cluster running LLM training workloads and CoreWeave's NVIDIA GB200 NVL72 environments. The results were transformative:
- Massive reduction in thermal overshoot: The AI Agent reduced the magnitude of thermal spike overshoots by 75% to 80% compared to optimally tuned PID baselines during sudden load ramps.
- Unprecedented scale: Following this successful validation, Phaidra and CoreWeave are scaling the deployment of these AI agents throughout CoreWeave's liquid-cooled fleet, bringing AI-driven thermal management to its next generation of data center capacity.
Phaidra has integrated NVIDIA DSX Max-Q to operate the entire AI factory as a single unit of compute, at scale. By deeply integrating Information Technology (IT) and Operational Technology (OT), this collaboration bridges the divide between white-space compute and facility operations.
With thermal stability secured by Phaidra's AI agents, facilities can safely raise their supply water temperatures, significantly reducing the burden on facility chillers. This provides the foundation for the next phase of the collaboration, where operators dynamically shift stranded power from the cooling system to revenue-generating IT compute. For a baseline 1GW AI factory, raising the coolant temperature safely could unlock billions in additional annual revenue.
"In a world where computational resources are limited by energy availability, every watt that isn’t being used for valuable token generation is a wasted watt," the joint white paper states.
By co-designing power, cooling, and workload management systems, Phaidra, CoreWeave, NVIDIA, and critical infrastructure partner Applied Digital are setting a new standard for reliability, end-user SLAs, and peak operational efficiency in the age of AI.

AI agents for liquid-cooled AI factories
Read Phaidra's white paper to learn how to prevent GPU throttling and safely increase facility temperatures to drive more revenue-generating compute.
Featured Expert
Learn more about one of our subject matter experts interviewed for this post

Jim Gao
Co-Founder, Chief Executive Officer
Jim is a co-founder and the CEO of Phaidra. He sets the strategic direction and leads the company in operational excellence. Prior to Phaidra, Jim led the DeepMind Energy Team and pioneered Google's use of AI controls on their hyperscale data center cooling systems. Prior to DeepMind, he spent a decade working as a Technical Lead for Google’s Data Centers.