
November 19, 2024


8 min read


Setup for Success: Future-Proofing Data Centers

Illustration of data halls managed with Phaidra's AI for future-ready cooling systems


If you're a data center operator today (or run any industrial controls operation), you're feeling the pressure. The landscape is changing fast, and expectations are higher than ever. With the rise of AI compute loads and high-performance, high-density computing, performance expectations will continue to grow at an incredible rate. And while there is an arms race to secure new land and power for future data center builds, a vast amount of existing infrastructure needs updating to handle the demand before those new builds become operational. It's not just about keeping the lights on anymore; it's about making sure your powertrain, cooling systems, and infrastructure are ready to handle what's coming.

Whether it’s preparing for AI integration, determining how to unlock stranded power, or ensuring effective communication between your tenants and your facility teams, future-proofing your data center means planning well beyond immediate demands. These challenges aren't something you can afford to ignore if you want to stay ahead.

In this article, you will learn:

  • Why assessing your powertrain is the first step to AI optimization.

  • Tips to move towards unlocking stranded power and avoid overloading your system.

  • The critical role of proactive communication between tenants and operators.

Push the Mundane to the Machine

We're beginning to see artificial intelligence applications revolutionize data center operations, bringing new efficiencies and capabilities but also new challenges. AI compute loads, especially those behind advanced applications like ChatGPT, with their vast datasets and equally immense compute demand, push data centers to their limits. As these demands continue to grow, operators need to rethink how their infrastructure is designed to handle them.

One of the clearest benefits of AI optimization is improved Power Usage Effectiveness (PUE). AI systems can monitor and adjust energy consumption in real time, ensuring that cooling is distributed efficiently across the facility and reducing the energy intensity of the broad, conservative cooling approach that is typical in the industry. This dynamic control approach significantly reduces wasted energy, helping data centers maintain optimal performance while keeping costs down. Additionally, AI-driven monitoring is key to extending the life of equipment. By continuously analyzing the performance of critical systems, AI can detect early warning signs and enable predictive maintenance, reducing downtime and extending that lifespan. But realizing these benefits takes preparation.
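PUE itself is a simple ratio, total facility power over IT power, and the gain from tighter cooling control shows up directly in it. A minimal sketch, with all numbers purely illustrative rather than taken from any particular facility:

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power.
    1.0 is the theoretical ideal; less overhead (cooling, losses) means lower PUE."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw

# Illustrative readings: load-aware cooling trims facility overhead.
baseline = pue(total_facility_kw=7500, it_load_kw=5000)   # 1.5
optimized = pue(total_facility_kw=6250, it_load_kw=5000)  # 1.25
```

The same IT load served with less cooling and distribution overhead drops the ratio toward 1.0, which is why dynamic cooling control shows up so directly in this metric.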

Leadership needs to assess the entire facility powertrain and determine whether it is engineered and designed to support high compute densities. Being proactive is essential as compute loads and energy demands increase.

Managing IT loads efficiently is also critical as AI workloads surge. AI-powered systems can dynamically allocate resources, ensuring that power and cooling are directed where they’re needed most. This prevents overloading and maximizes the efficiency of the entire data center. AI compute loads, such as those used in GPT, will significantly affect operations, making predictive maintenance and resource allocation all the more crucial.

Hence, preparing for AI in facilities requires a thorough assessment of the powertrain. Assessing the powertrain from the utility grid all the way down to the rack PDUs is the first step in AI optimization. Operators need to ensure their systems can handle current workloads efficiently while also scaling to meet future demands.

Powertrain Assessment and Unlocking Stranded Power

As mentioned, one of the most critical aspects of future-proofing data centers is ensuring that the powertrain—from the utility grid to the rack PDUs—is designed to handle the growing demands of AI and high-performance compute. When AI workloads increase, operators need to evaluate whether their infrastructure can support these higher densities without straining capacity or risking downtime.

A common challenge is determining whether the current infrastructure can handle future demands. Let's say, for example, that a data hall was designed for a 5-megawatt load. You need to gauge whether high-performance compute is going to push the load beyond that threshold. Operators therefore need to continuously monitor and compare their design load intent to their actual IT load, avoiding potential overloads that could compromise performance and safety.
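That design-intent check can be sketched as a simple monitoring hook that compares actual IT load against the hall's design load and flags shrinking headroom. The 90% warning threshold below is an assumption for illustration, not an industry standard:

```python
def load_headroom_alert(design_kw: float, actual_kw: float,
                        warn_fraction: float = 0.9) -> str:
    """Compare actual IT load against design intent and flag approaching limits."""
    utilization = actual_kw / design_kw
    if utilization >= 1.0:
        return "OVERLOAD"
    if utilization >= warn_fraction:
        return "WARNING"
    return "OK"

# A 5 MW data hall carrying 4.7 MW of IT load is already in warning territory.
print(load_headroom_alert(design_kw=5000, actual_kw=4700))  # WARNING
```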

Another key opportunity for data center optimization lies in stranded power, which typically resides in the battery plant of the Uninterruptible Power Supply (UPS) system. These UPS batteries often sit unused, serving as backup in case of emergencies or during monthly maintenance cycles. However, they represent untapped potential. By efficiently managing and intelligently utilizing this stranded power, operators can reduce unnecessary grid consumption and improve overall power usage efficiency. Because PUE records the total power consumed by the site, the energy stored in a battery has already been counted; discharging it at strategic moments can therefore help reduce overall PUE.

Stranded power is in the batteries of the UPS system. Batteries are constantly trickle-charging, and most of the time, the energy just sits there. Finding ways to purposefully use this power would not only add a layer of efficiency but also support better energy management, especially during peak loads or unexpected surges in demand.
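One way to picture "purposeful use" of that stranded power is peak shaving: discharge the UPS batteries only when grid demand crosses a peak threshold, and never below a state-of-charge floor reserved for backup duty. A hypothetical policy sketch, with all thresholds and limits invented for illustration:

```python
def battery_dispatch_kw(grid_demand_kw: float, peak_threshold_kw: float,
                        battery_soc: float, reserve_soc: float = 0.5,
                        max_discharge_kw: float = 500.0) -> float:
    """Discharge stranded UPS battery capacity only above a peak threshold,
    and never below the state of charge reserved for backup duty."""
    if battery_soc <= reserve_soc:
        return 0.0  # protect the backup mission first
    excess = grid_demand_kw - peak_threshold_kw
    return max(0.0, min(excess, max_discharge_kw))

# Shave 200 kW off a 6.2 MW peak while the battery is well above its reserve.
print(battery_dispatch_kw(grid_demand_kw=6200, peak_threshold_kw=6000,
                          battery_soc=0.8))  # 200.0
```

The key design choice is the reserve floor: the batteries' first job is still ride-through during an outage, so peak shaving only spends energy above that line.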

By addressing these key areas, data centers can unlock stranded power and ensure their powertrain is equipped for the demands of AI and high-performance computing.

Strategies for Unlocking Stranded Power

  • Assess the Entire Powertrain: Start by examining your powertrain from the grid to the rack PDUs. This holistic approach ensures that every component is functioning as intended and has the capacity to handle future AI-driven loads.

  • Increase Power Capacity: As high-compute workloads grow, data centers need to ensure they can scale. Operators should plan for increased power capacity by upgrading UPS systems, busways, and other power distribution components.

  • Battery Management: Optimizing UPS battery management is essential for unlocking stranded power. Intelligent systems can monitor battery health and usage, ensuring they are efficiently utilized and not simply sitting idle for long periods.

  • Planning for Future Demands: Future-proofing requires forward-thinking strategies. As AI compute loads continue to grow, it’s not enough to design for today’s needs. Operators must consider what the next 5 to 10 years will bring in terms of IT load growth and power demand. This means building in the flexibility to scale and ensuring that your power infrastructure can adapt.

Data Management

Effective data management is not just crucial, it can be a competitive advantage. As AI and high-performance computing increase demands on infrastructure, optimizing resource allocation, enhancing system performance, and ensuring quick disaster recovery allow operators to stay ahead of any challenges while maintaining efficiency and minimizing risks.

The only way to stay on top of all these challenges is to have a robust, accurate operational telemetry dataset to analyze. One of the key aspects of managing data center operations is ensuring your cooling systems are keeping up with the actual, changing demands on the floor.

It’s not enough to assume an even distribution of IT loads across the data hall. Certain areas heat up faster as workloads ramp up in specific racks. So if a tenant loads the data hall unevenly—say, concentrating servers in one corner—then you’re dealing with heat buildup in that one spot. Meanwhile, the cooling is still distributed evenly across the hall, providing unnecessary cooling and not properly handling the heat spike in that corner.

To effectively manage cooling, you need systems that can adjust dynamically based on real-time load data, rather than cooling the entire hall uniformly. This targeted approach prevents overstraining the cooling infrastructure and helps avoid inefficiencies and potential failures caused by uneven heating.
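A toy version of that targeted approach: distribute cooling capacity in proportion to each zone's measured heat load instead of evenly across the hall. Zone names and numbers below are made up for illustration:

```python
def allocate_cooling(zone_heat_kw: dict[str, float],
                     total_cooling_kw: float) -> dict[str, float]:
    """Split cooling capacity in proportion to each zone's measured heat load,
    instead of spreading it uniformly across the data hall."""
    total_heat = sum(zone_heat_kw.values())
    if total_heat == 0:
        return {zone: 0.0 for zone in zone_heat_kw}
    return {zone: total_cooling_kw * heat / total_heat
            for zone, heat in zone_heat_kw.items()}

# The heavily loaded NE corner draws most of the cooling; idle zones get little.
print(allocate_cooling({"NE": 300.0, "NW": 50.0, "SE": 50.0, "SW": 100.0}, 600.0))
```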

Staying proactive with cooling adjustments, especially as high-performance compute and AI loads keep growing, ensures you’re meeting the facility’s cooling needs where they’re most critical.

Continuous monitoring allows operators to avoid a mismatch between the designed capacity and actual usage, which is critical for avoiding bottlenecks and unnecessary strain on resources.

Leveraging AI to optimize data flow and cooling systems is another powerful strategy for future-proofing data centers. Industrial AI systems can dynamically adjust cooling and resource usage based on real-time data, ensuring that performance is maximized without wasting energy.

This kind of data-driven approach helps maintain peak efficiency, especially as workloads fluctuate. AI’s role in resource allocation becomes even more essential as compute loads increase, helping operators ensure that both power and cooling systems are optimized for performance.

Additionally, a comprehensive data management strategy also includes disaster recovery planning. Integrating data systems can enable faster responses to unexpected failures and reduce downtime. This involves having backup systems in place, ensuring that data is properly replicated, and automating failover processes—a backup mode that switches to a redundant system when a primary system fails. The goal of failover is to minimize downtime and data loss and ensure business continuity from a secondary disaster recovery (DR) site.
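An automated failover decision reduces to a small policy: switch to the DR site when the primary fails, but only if replication is fresh enough to bound data loss. The 30-second lag budget below is an illustrative assumption:

```python
def select_active_site(primary_healthy: bool, replica_lag_s: float,
                       max_lag_s: float = 30.0) -> str:
    """Automated failover policy: fail over to the DR site on primary failure,
    but only when the replica is fresh enough to bound data loss."""
    if primary_healthy:
        return "primary"
    if replica_lag_s <= max_lag_s:
        return "dr_site"
    return "hold"  # replica too stale: page a human rather than lose data

print(select_active_site(primary_healthy=False, replica_lag_s=10.0))  # dr_site
```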

With real-time monitoring and integrated recovery systems, data centers can respond quickly to disruptions and maintain continuity of service, even in the face of system-wide issues.

Incorporating these strategies optimizes daily operations and lays the groundwork for future scalability. Effective data management hinges on continuous monitoring and proactive adjustments, allowing operators to maintain the balance between IT loads and design capacity, improve system performance, and be prepared for unexpected disruptions.


AI Readiness Checklist: Operational Data Collection & Storage Best Practices

Download our checklist to improve your facility’s data habits. Whether you are preparing for an AI solution or not, these will help increase the value of your data collection strategies.

Building Management Systems (BMS)

Building Management Systems (BMS) are at the core of running a data center. They’re what you rely on to keep track of all the key infrastructure—whether it’s power, cooling, or security. As AI workloads ramp up and high-performance computing pushes demands even higher, it's crucial to make sure your BMS integrates with other systems. But just as important is improving communication between tenants and operators to stay ahead of any changes or challenges coming your way.

One of the biggest advantages of an advanced BMS is automation that anticipates and plans accordingly, especially when it comes to managing power and cooling. The fewer manual inputs, the better. Reducing these inputs frees operators to focus on valuable improvements across the facility rather than routine, reactive tasks.

AI integration also offers dynamic response capabilities, adjusting power and cooling levels based on real-time data and load shifts. However, AI needs effective communication between tenants and colocation providers to function optimally. Even a slight heads-up on anticipated load changes allows the AI to prepare accordingly, minimizing human error and enabling smoother, more efficient responses to unexpected demands.

Cooling is a big deal, especially with AI workloads generating more heat than ever. That’s where Phaidra’s AI control service can really shine as a complement to a BMS. While a BMS provides basic monitoring, our AI Agent uses real-time data to adjust the cooling based on what's happening at the moment and what’s expected to happen in the near-term future, not just some pre-set rules. The key is to maximize efficiency by operating near the hot aisle limit temperature without crossing it. This approach conserves energy by fine-tuning cooling to precise levels, ensuring that systems stay within safe limits without unnecessary cooling. Keeping an eye on cooling becomes even more important as high-compute loads keep growing.
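The "operate near the hot aisle limit without crossing it" idea can be caricatured as a proportional rule: trim cooling while there is temperature headroom, and ramp it back up as the hot aisle approaches the threshold. Every constant below (limit, margin, gain) is invented for illustration; a real AI agent learns far richer behavior than this:

```python
def cooling_setpoint_kw(hot_aisle_temp_c: float, limit_c: float = 38.0,
                        margin_c: float = 2.0, base_kw: float = 400.0,
                        gain_kw_per_c: float = 50.0) -> float:
    """Trim cooling while the hot aisle stays safely below its limit,
    ramping it back up as the temperature approaches the threshold."""
    headroom = (limit_c - margin_c) - hot_aisle_temp_c
    # Positive headroom -> reduce cooling; negative -> add cooling.
    return max(0.0, base_kw - gain_kw_per_c * headroom)

print(cooling_setpoint_kw(30.0))  # cool hot aisle: cut cooling to 100.0 kW
print(cooling_setpoint_kw(37.0))  # near the limit: boost cooling to 450.0 kW
```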

Future Proofing with Communication Systems

Now, it's not just about technology; it's also about communication. Change management is a responsibility that runs both ways: the data hall tenant needs to give facility management personnel a change management process as well. If the tenant and the facility team aren't on the same page, things can go sideways fast. Setting up proper communication channels is critical so that any load changes are flagged early and both sides can act quickly.

Future-proofing is more than just having the best hardware or AI-driven systems. It requires improved communication at every level, including tenant-to-operator systems that send some signal in advance of a change. Data center operations run most smoothly when tenants and facility teams are proactively communicating, and integrating automated, real-time notifications can help make that happen. These systems reduce the strain on operators, allowing for faster responses to load changes and avoiding potential crises.

AI systems can help with automating some of these notifications, but you can’t remove humans from the process. Someone has to initiate the system, and the tenant needs to provide accurate information to avoid surprises. It involves working together to ensure that load changes are managed smoothly with everyone involved in the process from the start.

Data Center Infrastructure Management (DCIM) and Integration

DCIM, or Data Center Infrastructure Management, is an essential tool in modern data centers. It’s how you keep a close eye on everything—tracking your powertrain, monitoring your IT load, and making sure all your systems are running smoothly. But DCIM does a lot more than just basic monitoring. It helps you manage energy, predict future demands, and even streamline maintenance when integrated with other systems.

One of the biggest advantages of DCIM is the ability to track everything in real time. Whether it’s the power your servers are pulling or how much cooling is being used, you can spot problems before they become big issues. You get a live view of your IT load and powertrain performance, which means you can make adjustments on the fly to prevent overloads and keep things running efficiently.

DCIM systems, especially when paired with AI, can help optimize energy usage. These systems can make automatic adjustments to keep your power usage in check, which reduces waste and cuts costs. And since Power Usage Effectiveness (PUE) is such a key metric, DCIM gives you the data to track it closely and make sure your energy management is dialed in.

Another big win with DCIM is its predictive capabilities. By looking at historical data, it can help you forecast future IT loads, power demands, and cooling needs. That’s crucial as AI-driven compute loads keep growing. The ability to predict and plan for future infrastructure needs is what’s going to keep your data center from running into problems down the line.
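The simplest possible version of that forecasting is a linear trend fit over monthly load history. Real DCIM tooling uses much richer models, but the shape of the prediction step is the same:

```python
def forecast_load_kw(history_kw: list[float], months_ahead: int) -> float:
    """Project future IT load with a least-squares linear trend over monthly
    history -- a deliberately simple stand-in for real capacity models."""
    n = len(history_kw)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history_kw) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history_kw))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + months_ahead)

# Steady ~10 kW/month growth projects to 150 kW two months out.
print(forecast_load_kw([100.0, 110.0, 120.0, 130.0], months_ahead=2))
```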

One of the most important things DCIM can do is integrate with CMMS (Computerized Maintenance Management Systems). Your building automation system and your computerized maintenance management system should have some capability to integrate with each other. When you connect DCIM with your maintenance systems, you get a wealth of useful telemetry data. This lets you plan your maintenance activities more efficiently and stay ahead of potential issues, meaning fewer unexpected breakdowns and smoother operations overall.

Integrating CMMS and BMS doesn't just generate more data; it creates enriched data that gives AI valuable context for decision-making. For instance, if a chiller is offline, an unconnected BMS might only register it as unavailable. But when BMS and CMMS systems are linked, an AI learns that it's out for scheduled maintenance, not due to an operational issue, which provides important context for the situation. This enriched data allows AI to interpret and respond to real-time conditions with greater accuracy, minimizing human intervention and supporting more efficient operations.
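The chiller example boils down to a small join between the two data sources. A hypothetical sketch, where the function and field names are invented for illustration rather than taken from any real BMS or CMMS API:

```python
def classify_chiller_state(bms_available: bool,
                           cmms_work_orders: set[str],
                           chiller_id: str) -> str:
    """Join BMS availability with open CMMS work orders so an offline chiller
    under scheduled maintenance is not treated as a fault."""
    if bms_available:
        return "running"
    if chiller_id in cmms_work_orders:
        return "scheduled_maintenance"
    return "fault"  # offline with no work order: worth investigating

# BMS alone would call CH-02 unavailable; CMMS context explains why.
print(classify_chiller_state(False, {"CH-02"}, "CH-02"))  # scheduled_maintenance
```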

Right now, a lot of data centers still run with siloed systems. You’ve got one system for monitoring power, another for maintenance, and a whole bunch of manual inputs in between. That’s where things can get messy. What we really need is more integration—tying DCIM, BMS, and CMMS together so everything works in sync. That way, you get the full picture of what’s happening in your data center, and you can make smarter decisions based on real-time data.

In the end, integrating DCIM with other systems like BMS and CMMS is the way forward. Whether it’s predicting future loads or optimizing maintenance schedules, these tools help you stay ahead of the curve and keep your data center running efficiently. That’s what future-proofing is all about.

Liquid Cooling as a Future Strategy

As AI compute loads continue to grow and high-performance computing becomes more demanding, data centers will need to look beyond traditional airside cooling. While airside cooling has worked well up to now, it only gets you so far. Eventually, it won’t be enough to keep up with the heat generated by these heavy workloads. That’s when liquid cooling starts becoming an attractive option.

Operators are going to start weighing how to convert their data halls to liquid cooling, because airside cooling only works well up to a certain point. Liquid cooling can handle those high-performance computing loads better, but it's not something you jump into lightly. There are a few things to keep in mind before you make the switch.

  • Cost and Labor: Converting a data hall to liquid cooling isn’t cheap. Installing liquid cooling systems takes time and expertise that not every facility is equipped for. You’ll need to make sure your team knows how to handle the new systems, so there’s a training aspect, too. You can’t just put liquid cooling in and expect it to run itself.

  • Structural Integrity: Next comes the question of whether your facility can handle the extra weight. Liquid cooling systems can add a significant load to your floors and racks, so you need to be sure your data hall is up to the task.

    It's like with electric vehicles and bridges: you've got to ask whether the structures were built to handle that extra weight. The same goes for data centers. You have to make sure the structural integrity is there before you start adding all that extra weight in pipes and water.

    Moreover, the risk of liquid cooling leaks needs to be considered. A leak can wreak havoc on an entire data center, shorting out equipment; that is detrimental to operations and adds the cost of repairing or replacing the affected equipment.

  • Long-Term Efficiency: Once it’s up and running, though, liquid cooling can be more efficient than traditional methods. Unlike airside cooling, where you’ve got filters to replace and regular maintenance to keep it going, liquid cooling systems can run with fewer interruptions. That said, it’s important to train your facility management staff on how to maintain these systems properly, since most teams aren’t used to dealing with water-cooling technology.

Liquid cooling might become a more common strategy for data centers that need to handle massive AI workloads and high-performance computing. But like anything else, you have to weigh the benefits against the costs and risks.

Now what?

When it comes to future-proofing data centers, it’s not just about keeping the lights on—it’s about making sure your infrastructure can handle what’s coming down the road. With AI workloads increasing and high-performance computing pushing your systems harder than ever, you’ve got to be ready. That starts with being proactive.

AI isn’t just a buzzword—it’s a tool that can make your data center more efficient, from energy use to predictive maintenance. But it only works if your infrastructure is prepared. There’s untapped power sitting idle in your UPS systems that could be put to better use. By assessing your entire powertrain and managing UPS batteries smarter, you can scale up and avoid hitting capacity limits.

Moreover, running everything in silos just doesn’t cut it anymore. Integrating BMS, DCIM, and maintenance systems gives you a real-time, full view of your operations so you can make informed decisions when it matters most. And let’s not forget the importance of communication. Tenants and operators need to stay on the same page—especially with load changes. Setting up proactive systems helps you stay ahead of problems, instead of scrambling to fix them.

In the end, adaptability and forward-thinking are what will keep your data center ahead of the curve. The operators who assess, optimize, and integrate their systems now are the ones who’ll be ready for whatever’s next.

Featured Expert

Learn more about one of our subject matter experts interviewed for this post

author-avatar

Dan Bishop

Sr AI Control Solutions Engineer

Dan serves as a Senior AI Controls Solutions Engineer at Phaidra. He’s responsible for developing autonomous control AI solutions for data center industry customers. Prior to Phaidra, Dan’s expertise in the building automation and controls industry was honed on multiple mission-critical facility projects, including numerous US Embassies/Consulates for Johnson Controls and Albireo Energy. Dan also spent nearly a decade as an Automation and Monitoring Specialist for T5 Data Centers. He’s a subject matter expert in the data center industry regarding construction and facility management. Dan’s mission at Phaidra is to make AI optimization a reality for the end user.


