Mastering the Heat: Cooling & Power Solutions for a 50kW Rack Density AI Data Center

by Sean Murphy on 2/27/24 11:42 AM

As artificial intelligence (AI) continues to reshape industries and drive innovation, the demand for high-performance computing in data centers has reached unprecedented levels. Managing the cooling and power requirements of a 50kW rack density AI data center presents a unique set of challenges. In this blog post, we will explore effective strategies and cutting-edge solutions to ensure optimal performance and efficiency in such a demanding environment. 

Precision Cooling Systems

The heart of any high-density data center is its cooling system. For a 50kW rack density AI data center, precision cooling is non-negotiable. Invest in advanced cooling solutions such as in-row or overhead cooling units that can precisely target and remove heat generated by high-density servers. These systems offer greater control and efficiency compared to traditional perimeter cooling methods.

Liquid Cooling Technologies

Liquid cooling has emerged as a game-changer for high-density computing environments. Immersion cooling systems or direct-to-chip solutions can effectively dissipate heat generated by AI processors, allowing for higher power densities without compromising on reliability. Explore liquid cooling options to optimize temperature control in your data center.
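
A rough sensible-heat estimate helps show why liquid copes with these densities more easily than air. The sketch below applies the standard relation P = rho * V_dot * cp * dT to a single 50kW rack. The fluid properties are textbook approximations, and the temperature rises (12 K for air, 10 K for water) are assumed values chosen for illustration, not figures from this post or from any vendor.

```python
# Rough coolant-flow estimate for one rack using P = rho * V_dot * cp * dT.
# Fluid properties are textbook approximations; the temperature rises are
# assumed values chosen only for illustration.

AIR = {"density_kg_m3": 1.2, "cp_j_per_kg_k": 1005.0}
WATER = {"density_kg_m3": 997.0, "cp_j_per_kg_k": 4186.0}


def volumetric_flow_m3_s(heat_w, fluid, delta_t_k):
    """Volume flow needed to carry heat_w watts at a delta_t_k temperature rise."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_w / (fluid["density_kg_m3"] * fluid["cp_j_per_kg_k"] * delta_t_k)


rack_w = 50_000.0  # 50kW rack, all of it assumed to end up as heat
air_flow = volumetric_flow_m3_s(rack_w, AIR, delta_t_k=12.0)
water_flow = volumetric_flow_m3_s(rack_w, WATER, delta_t_k=10.0)

print(f"Air:   {air_flow:.2f} m^3/s")
print(f"Water: {water_flow * 60_000:.0f} L/min")  # m^3/s -> liters per minute
```

Under these assumptions, air needs on the order of 3.5 cubic meters per second through the rack, while water needs roughly 72 liters per minute.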

High-Efficiency Power Distribution

To meet the power demands of a 50kW rack density, efficient power distribution is paramount. Implementing high-voltage power distribution systems and exploring alternative power architectures, such as busway systems, can enhance energy efficiency and reduce power losses. This not only ensures reliability but also contributes to sustainability efforts.
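
The effect of distribution voltage on current can be sketched with the balanced three-phase formula I = P / (sqrt(3) * V * PF). The two voltages below (208 V and 415 V line-to-line) and the unity power factor are assumptions picked for illustration; they are not recommendations from this post.

```python
import math


def three_phase_current_a(power_w, line_voltage_v, power_factor=1.0):
    """Line current of a balanced three-phase load: I = P / (sqrt(3) * V_LL * PF)."""
    return power_w / (math.sqrt(3) * line_voltage_v * power_factor)


rack_w = 50_000.0  # one 50kW rack

# Assumed example voltages; actual site voltages depend on the facility design.
for volts in (208.0, 415.0):
    amps = three_phase_current_a(rack_w, volts)
    print(f"{volts:.0f} V: {amps:.0f} A per phase")
```

Because resistive loss in a conductor scales with the square of the current, roughly halving the current for the same load cuts that loss to about a quarter in the same conductor.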

Redundancy and Resilience

A high-density AI data center demands a robust power and cooling infrastructure with built-in redundancy. Incorporate N+1 or 2N redundancy models for both cooling and power systems to mitigate the impact of potential failures. Redundancy not only enhances reliability but also allows for maintenance without disrupting critical operations.
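
As a simple illustration of what these redundancy models mean for equipment counts, the sketch below sizes a set of identical units for a given load. The 500kW load (ten 50kW racks) and the 150kW unit capacity are hypothetical numbers, and the function name is made up for this example.

```python
import math


def units_required(load_kw, unit_capacity_kw, scheme):
    """Count identical units needed for a load under N, N+1, or 2N redundancy."""
    n = math.ceil(load_kw / unit_capacity_kw)  # N: just enough to carry the load
    if scheme == "N":
        return n
    if scheme == "N+1":
        return n + 1  # one spare unit
    if scheme == "2N":
        return 2 * n  # a fully duplicated set
    raise ValueError(f"unknown redundancy scheme: {scheme}")


load_kw = 10 * 50.0       # hypothetical: ten 50kW racks
unit_capacity_kw = 150.0  # hypothetical capacity of one cooling or power unit

for scheme in ("N", "N+1", "2N"):
    print(scheme, units_required(load_kw, unit_capacity_kw, scheme))
```

With these example numbers, N is 4 units, N+1 is 5, and 2N is 8. That shows the trade-off between tolerance to failures and the amount of equipment to buy and maintain.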

Dynamic Thermal Management

Utilize intelligent thermal management systems that adapt to the dynamic workload of AI applications. These systems can adjust cooling resources in real-time, ensuring that the infrastructure is optimized for varying loads. Dynamic thermal management contributes to energy efficiency by only using the necessary resources when and where they are needed.
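
One very simple form of this idea is proportional control: cooling output rises as the measured temperature moves above a setpoint. The sketch below is only an illustration. The setpoint, gain, and speed limits are assumed values, and real thermal management systems use more sophisticated control than this.

```python
def fan_speed_percent(inlet_temp_c, setpoint_c=24.0, gain_pct_per_c=15.0,
                      min_pct=30.0, max_pct=100.0):
    """Proportional fan-speed command, clamped to [min_pct, max_pct].

    All default parameters are assumed values for illustration only.
    """
    error_c = inlet_temp_c - setpoint_c
    command = min_pct + gain_pct_per_c * error_c
    return max(min_pct, min(max_pct, command))


# Fan command at a few example rack inlet temperatures.
for temp_c in (20.0, 24.0, 26.0, 30.0):
    print(f"{temp_c:.0f} C -> {fan_speed_percent(temp_c):.0f}% fan speed")
```

At or below the setpoint the fans idle at the minimum speed, and they ramp up only as the load pushes the inlet temperature higher. That is the sense in which resources are used only when and where they are needed.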

Energy-Efficient Hardware

Opt for energy-efficient server hardware designed for high-density environments. AI-optimized processors often come with advanced power management features that can significantly reduce energy consumption. Choosing hardware that aligns with your data center's efficiency goals is a key factor in managing power and cooling requirements effectively.

Monitoring and Analytics

Implement comprehensive monitoring and analytics tools to gain insights into the performance of your AI data center. Real-time data on temperature, power consumption, and system health can help identify potential issues before they escalate. Proactive monitoring allows for predictive maintenance and ensures optimal conditions for your high-density racks.
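
At its simplest, this kind of monitoring compares live readings against limits. The sketch below uses a hypothetical `RackReading` record and made-up sample data. The 27 C inlet limit and 50kW power budget are assumed thresholds for illustration, and a real deployment would pull readings from its own monitoring platform.

```python
from dataclasses import dataclass


@dataclass
class RackReading:
    """Hypothetical snapshot of one rack's sensors."""
    rack_id: str
    inlet_temp_c: float
    power_kw: float


def find_alerts(readings, max_inlet_c=27.0, max_power_kw=50.0):
    """Return (rack_id, message) pairs for readings above the assumed limits."""
    alerts = []
    for reading in readings:
        if reading.inlet_temp_c > max_inlet_c:
            alerts.append((reading.rack_id, "inlet temperature high"))
        if reading.power_kw > max_power_kw:
            alerts.append((reading.rack_id, "power over rack budget"))
    return alerts


# Made-up sample data.
sample = [
    RackReading("rack-01", inlet_temp_c=24.5, power_kw=47.0),
    RackReading("rack-02", inlet_temp_c=28.1, power_kw=49.0),
    RackReading("rack-03", inlet_temp_c=25.0, power_kw=52.3),
]

for rack_id, message in find_alerts(sample):
    print(rack_id, message)
```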

Successfully cooling and powering a 50kW rack density AI data center requires a holistic and forward-thinking approach. By investing in precision cooling, liquid cooling technologies, high-efficiency power distribution, redundancy, dynamic thermal management, energy-efficient hardware, and robust monitoring tools, you can create a resilient and high-performing infrastructure. Embrace the technological advancements available in the market to not only meet the challenges posed by high-density AI computing, but to excel in this dynamic and transformative era of data center management.

Author's Note:

Not a bad blog post, right? I was tasked with writing a blog post on how to power and cool high-density racks for AI applications. So, I had ChatGPT write my blog post in 15 seconds, saving me a ton of time and allowing me to enjoy watching my kid’s athletic events this weekend. As end users embrace AI technology, it is imperative that we understand how to support the hardware and software that enable these time-saving technologies. Over the past 6 months, about 20% of my time has been spent discussing how to support customer rack densities of 35kW to 75kW.

Another key point to understand is the balance between AI and the end user’s ability to recognize its limitations and areas for improvement. AI taps into the database of information that is the Internet. That is powerful, but (at least currently) it does so in a fashion that makes it appear to be two years behind. For example, this blog post was written to reflect a 35kW rack density, and ChatGPT accordingly noted 35kW. Today, however, I’m regularly working with racks supporting AI that average 50kW, I have seen some go up to 75kW, and I know that applications can hit upwards of 300kW per rack. So, please note that anywhere the blog says 50kW, a human made that edit to replace AI's outdated "35kW".

Also, just for reference, a 75kW application requires 21 tons of cooling for one IT rack! So, these new high-density technologies require the equivalent of one traditional perimeter CRAC to cool one AI IT Rack. DVL is here to help provide engineering and manufacturing support to design your Cooling, PLC Switchgear, Busway Distribution, Rack Power Distribution, Cloud Monitoring, and other critical infrastructure to support your efficient AI Technology.
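
The 21-ton figure follows from the standard conversion of 1 ton of refrigeration to 12,000 BTU/h, or about 3.517kW. Here is a minimal sketch of that arithmetic, assuming all of the rack's IT power ends up as heat that the cooling system must remove:

```python
KW_PER_TON = 3.517  # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW


def cooling_tons(it_load_kw):
    """Tons of cooling needed, assuming all IT power becomes heat."""
    return it_load_kw / KW_PER_TON


# Rack densities mentioned in this post.
for load_kw in (35, 50, 75):
    print(f"{load_kw} kW -> {cooling_tons(load_kw):.1f} tons")
```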

Topics: Data Center, Thermal Management, Data Center efficiency, beyond the product, artificial intelligence

Gems of Wisdom from Beyond the Product

by Jodi Holland on 1/30/24 2:26 PM

“Experience is simply the name we give our mistakes.” — Oscar Wilde

When you’ve worked in the deep trenches of critical infrastructure long enough, like quite a few of our longest-tenured employee owners, you know that the most valuable lessons don’t come from a textbook or a policy manual. Rather, we learn the most right in the field, from those real-world mistakes and mishaps, or “experiences,” as Oscar Wilde would say.

So, we asked our Associates about their most valuable gems of wisdom that they could pass on to colleagues in the industry. Here are some of the top answers we received. We're sharing them here, in hopes that we'll be able to save others from learning some unfortunate lessons the hard way.

SERVICES

  • Spare parts are an on-site technician’s best friend.
  • A single-point-of-failure is NOT an end user’s best friend.
  • A hungry mouse can be disastrous to critical wires.
  • The smallest of unchecked details can be the source of a project’s biggest (and most expensive) problem.
  • Arc-flash is dangerous and not anything you ever want to see.
  • Weekly exercises are the surest way to know your GenSet will work when needed.
  • The UPS battery system is like a loaf of bread…you only get so many slices (or discharges)…and the bigger, and more often, the discharge, the quicker you’ll need a new loaf.

Let us know what you think of these nuggets. Have you had the unfortunate opportunity to learn any of these lessons on your own? Or do you have an important one to add to the list? Email us at Marketing@DVLnet.com. If we get enough responses, we’ll post a Part 2.

Topics: Data Center, service, optimized performance, top trends

Achieving Excellence in Data Center Operations

by Robert Leake on 1/9/24 11:41 AM

Data centers are the beating hearts of modern businesses. They house critical infrastructure and sensitive data that are vital to all departments across an organization. In this fast-paced digital landscape, keeping your data center in top operational shape shouldn’t be just a goal. It is an absolute necessity, because on any given day someone will need to access pivotal data at the click of a mouse.

And, as you know quite well, running a data center pulls you in multiple directions at once. That’s why, to ensure you’re never offline, it’s important to always have a real-time pulse on the areas outlined below. 

Security: Building Fortresses for Data

Imagine a data center as a fortress with a hard outer shell and multiple layers within, each with its own security measures. Strict access management ensures that only those who require entry to each of these levels can actually get in. This goes beyond the front door and is a physical concern throughout the entire data center. Because non-company staff must access the grounds for daily demands or periodic maintenance, it’s a must to manage the who, why, and where of every person entering your facility in order to minimize security risks.

Preparation is Key

The COVID-19 pandemic brought many unexpected challenges for those leading data center operations at the time. Companies have long developed various types of disaster recovery plans accounting for a variety of scenarios. However, the pandemic tested those plans. And, when we found ourselves in a situation that hadn’t been experienced in 100 years, many failed the test. Fortunately, lessons learned strengthened disaster recovery going forward. Such lessons include the delicate nature of supply chain management, the importance of procuring inventory when available, and being able to execute “on a dime” during even the most chaotic of times. For these reasons, establishing thorough disaster recovery plans and being able to quickly adapt to unknowns have become indispensable.

Safety: A Cultural Requirement

Prioritizing the well-being of employees working under extreme conditions is crucial and should never be a question. That is why, for very good reasons, safety has become a cultural requirement for all businesses. Main concerns within data center environments include managing worksites where employees from multiple companies are working in tandem, ensuring the safety of workers who are working alone, taking precautions when working with high-voltage power infrastructure, and having efficient response processes in place in case of emergencies. It’s not enough just to have these processes in place; you must also ensure that no one is cutting corners, especially organizational leaders, as values are ingrained from the very top. If you get everyone home safely at the end of the day, you’ve got yourself a strong culture and a safe data center.

Continuous Improvement

Even top-tier organizations have room for improvement, whether driven by the need to optimize efficiency or to find new ways to stay on budget. Repetitive tasks can be improved by identifying process enhancements and design strategies. Challenging the status quo can have significant results when driven by the employees who are closest to the challenges. Buy-in at all levels is needed for improvements and long-term success, as support from leadership helps to ensure this evolution occurs.

Nurturing Future Leaders

As the most experienced data center professionals continue to retire, there is a greater need for fresh faces. But to accomplish this, the industry needs to make sure students at all levels are being properly introduced to the concept of data centers, how they work, and why they must work for society to function. For example, younger generations are the largest consumers and creators of data. Their broadband requirements are ever increasing, yet the workhorse behind this data isn’t even a thought; they may not recognize the connection between data centers and their iCloud folders unless it is demonstrated to them. Furthermore, tomorrow’s professionals stand to benefit from learning more about our industry, as it opens a new door of career potential, and even lucrative compensation, for them.

Exposing younger generations to the industry, whether through professional forums and societies or internships, providing guidance on required skills, and mentoring them as they mature, are essential to properly pass the torch. These future leaders will shape the industry's evolution and will more immediately allow you to sleep soundly at night knowing the lights are being properly kept on, and equipment is up and running.

Finding the Right Fit

Attitude and aptitude are definite requirements for an employee to succeed in data center operations. When recruiting for the best possible fit, you’re ultimately going to need someone who can handle the stress of working in such an unpredictable environment. Being resilient during challenging times makes for outstanding professionals in any field. Additionally, communication skills are vital. Being able to identify and resolve problems is great, but being able to turn those problems into learning opportunities for an entire team is invaluable, especially in high-stress moments.

By making these items a priority, and by constantly reevaluating your organization’s needs, you are positioning your organization for great success. One data center operations team that has figured this out quite well is the EdgeCore Data Centers team of operations leaders, led by Therese Kerfoot, SVP Operations. In December, Kerfoot and her team, Harrison Stoll (VP Operations), Matt Silvers (VP Operations Programs), and Sarah Kasper (Sr. Director, Environmental Health & Safety), joined us on the DVL Power Hour, “Data Center Excellence: Operations & Safety,” where the four shared their experiences in these areas and more. To learn about the extremely valuable insights they brought to the table, please check out the On-Demand webinar, or listen to the adapted podcast version available below and on iTunes and Spotify.

WATCH THE WEBINAR LISTEN TO THE PODCAST

Topics: Data Center, Safety, beyond the product, operations

Available On-Demand: DVL Power Hour Webinars

by Jodi Holland on 7/6/23 2:15 PM

Since we began our DVL Power Hour webinar series a few years ago, we've been able to bring you more than 40 live episodes. We’ve hosted many discussions about a variety of topics related to critical infrastructure and data centers. Thermal Management. Batteries. E-Rates. Green Data. Pandemics. Service. We’ve talked about all of this and more, as we welcomed guests from some of our partners, such as representatives from Vertiv, Generac, Critical Labs, Packet Power, and more, as well as some of our customers, and even scientists who have helped explain some of the latest technologies and trends.

If you haven’t had the chance to tune in for any of these webinars, or haven't in a while, we hope you’ll make your way over to our list of past webinars, as all our previously broadcast episodes can be accessed on-demand via our website. We invite you to browse topics and titles to find any that may interest you.

Some of our most popular episodes include:

  • "How to Choose the Right Cooling System"
  • "The Importance of Indoor Air Quality"
  • "Research & Development: Advanced Methods of Cooling Electronics"
  • "Power Distribution in Critical Facilities"
  • "Expanding the Monitoring Equation: Alert Management to Risk Mitigation"
  • "NFPA Standards & Generator UL Listings with Generac"

As far as new webinar episodes go, we are currently on a break for the summer, but check back soon for more information. We will continue to bring you new episodes on a monthly basis. In the meantime, if you'd prefer, all our webinars are also available in a podcast format. Episodes have been edited down, so you won't be able to see video or slides, but you will still get to enjoy some interesting conversations and insights into the critical infrastructure world while on the go. We hope you'll tune in. And if you have any questions or comments, please reach us at Marketing@DVLnet.com.

 

Topics: Data Center, Data Center efficiency, mission-critical, webinar

The Science of Cooling

by Jodi Holland on 2/1/21 1:06 PM

You know that electronic and industrial equipment produces unwanted heat, and that heat output continues to rise to dangerous levels. This presents the problem of removing the heat generated before damage can occur to sensitive parts of critical IT, communications, and networking gear. Some cases allow for a simple ventilation solution, but you need more than an oscillating dime-store fan in the world of IT applications.

Most IT applications exist in an environment where the available ambient air is contaminated or too warm to be used for the safe dissipation of unwanted heat. You want to keep your equipment life expectancy high and avoid adversely affecting sensitive components, which can cause equipment malfunctions, slowdowns, or failures. To create the optimum environment for the application, an evaluation of the anticipated operating conditions and thermal requirements of the equipment (or system) must be completed.

Many organizations are taking a more scientific approach to cooling. The goal is to understand the science and techniques of effective data center cooling management. This includes the ability to quantify the changes necessary, to identify the appropriate best practice, and to implement the airflow management strategy in the computer room.

By approaching next-generation cooling solutions as a science, you can:

  • identify isolated airflow issues negatively affecting IT reliability,
  • increase cooling capacity to allow for installation of more IT equipment, and
  • learn how to defer capital expenditures on computer center cooling equipment.

So, when considering a variety of cooling technologies, what questions should you ask to get the information you need? Check out our IT Cooling Technology Guide to get started.

Topics: Data Center, cooling, vertiv
