The Nvidia AI supercomputer costs three million dollars, and it needs switches with three kilometers of cable to work

When Nvidia presented its new AI chips, the B200 with Blackwell architecture, it took the opportunity to present an AI accelerator called the GB200. By joining 36 of those accelerators it created its AI server, the monstrous DGX GB200 NVL72, which holds some spectacular surprises.

Each node is a beast. Each of those GB200 accelerators combines an Nvidia Grace CPU with 72 Arm Neoverse V2 cores and two B200 GPUs. Combining their power, we end up with a sort of giant aggregate GPU delivering 1.44 exaflops of FP4 precision.

A cabinet that weighs over a ton. The DGX GB200 NVL72 looks like a small, narrow cabinet that is above all very dense: the rack weighs 1.36 tons. Inside there are 18 Bianca compute nodes in 1U format, and each of them holds two GB200s, or in other words four B200 GPUs (hence 18 × 4 = 72). The estimated cost of this AI server is about three million dollars.
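The topology arithmetic above can be sanity-checked in a few lines (a minimal sketch using only the counts and the 1.44-exaflop figure quoted in the article; the per-GPU number is derived, not an independent spec):

```python
# Sanity-check the DGX GB200 NVL72 topology and aggregate FP4 compute
# using the figures quoted in the article.
NODES = 18              # Bianca compute nodes per rack
GB200_PER_NODE = 2      # each node holds two GB200 accelerators
B200_PER_GB200 = 2      # each GB200 pairs one Grace CPU with two B200 GPUs

total_gb200 = NODES * GB200_PER_NODE       # 36 accelerators
total_b200 = total_gb200 * B200_PER_GB200  # 72 GPUs

TOTAL_FP4_FLOPS = 1.44e18                  # 1.44 exaflops FP4 (article figure)
per_gpu_pflops = TOTAL_FP4_FLOPS / total_b200 / 1e15

print(total_gb200)     # 36
print(total_b200)      # 72
print(per_gpu_pflops)  # 20.0 (petaflops FP4 per B200)
```

Dividing the rack total by the 72 GPUs gives 20 petaflops of FP4 per B200, which is consistent with the "18 × 4 = 72" breakdown in the text.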

Liquid cooling is key. These components dissipate a remarkable amount of heat, which makes liquid cooling the best option in this case. The system is applied not only to the Grace CPUs and the B200 GPUs but also to the NVLink chips in the switches, which can also heat up considerably due to the massive data transfer between the accelerators.

Interconnections everywhere. For all these GPUs to work together, each of the 36 GB200s has specialized network cards with fifth-generation NVLink support that allow each compute node to connect to the others. Nine switches provide that huge number of interconnections.

3 km of cable. The system delivers 1.8 TB/s of bidirectional bandwidth between the 72 GPUs in the server. But as The Register points out, the really surprising thing is that inside that "cabinet" there are a total of 3.2 kilometers of copper cable. The switch module alone weighs more than 30 kilograms, due both to these components and to the more than 5,000 cables used so that all the Nvidia GPUs work together in perfect sync.
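If each of the 72 GPUs gets 1.8 TB/s of bidirectional NVLink bandwidth, the aggregate traffic those copper cables must carry is easy to estimate (a rough back-of-the-envelope sketch based only on the article's two figures):

```python
GPUS = 72
PER_GPU_TBPS = 1.8  # bidirectional NVLink bandwidth per GPU (TB/s)

# Aggregate bandwidth across the whole rack's NVLink fabric.
aggregate_tbps = GPUS * PER_GPU_TBPS
print(round(aggregate_tbps, 1))  # 129.6 (TB/s through the rack's copper)
```

Roughly 130 TB/s in aggregate, which helps explain why the fabric needs more than 5,000 individual cables.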

Why copper? Opting for copper cable may seem strange, especially considering the bandwidth requirements this machine imposes. However, fiber optic cabling posed clear problems: it would have required electronic components to stabilize and convert the optical signals. That would have increased not only the cost but also the power consumption of the final system.

Can it run Crysis? The performance of each B200 chip is already brutal on its own: it is three times as powerful as the GeForce RTX 5090, and the server includes 72 of these GPUs specialized for AI, which demonstrates the computing capacity this machine possesses. It also has fourth-generation RT (Ray Tracing) cores, which would theoretically allow these AI chips to be used to play video games, although of course that is not their purpose. In fact, its performance in this area would probably be almost as poor as that of the Nvidia H100.

Sky-high consumption. Although the new chips are much more efficient than the H100 (25 times less consumption, says Nvidia), this AI server has an estimated TDP of 140 kW. Since the average consumption of a Spanish home is around 3,000 kWh per year, one hour of use of the Nvidia server consumes the same as an average Spanish home does in 17 days. Keeping it running all year adds up to a consumption similar to that of 415 average Spanish homes over a year.
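The household comparison works out like this (round numbers taken straight from the text; by this arithmetic the yearly figure lands slightly below the article's 415 homes, with the small gap down to rounding in the source):

```python
SERVER_KW = 140           # estimated TDP of the DGX GB200 NVL72
HOME_KWH_PER_YEAR = 3000  # average yearly consumption of a Spanish home

# How many days of home consumption one server-hour equals.
home_kwh_per_day = HOME_KWH_PER_YEAR / 365
days_per_server_hour = SERVER_KW / home_kwh_per_day
print(round(days_per_server_hour))  # 17

# How many homes a full year of server operation equals.
server_kwh_per_year = SERVER_KW * 24 * 365
homes_equivalent = server_kwh_per_year / HOME_KWH_PER_YEAR
print(round(homes_equivalent))  # 409
```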

In Xataka | AMD has a splendid roadmap for its AI chips. The problem is still its software
