There will be a before and after DeepSeek. We now know why it is so efficient

The publication of DeepSeek's V3 artificial intelligence (AI) model as open source is a blessing, because little by little we are learning in detail the strategy this Chinese company's engineers followed to produce such an efficient model. Before going any further with this article, it is important to keep in mind that DeepSeek says it trained its model using only 2,048 NVIDIA H800 chips.

Some analysts argue that its infrastructure actually comprises 50,000 H100 GPUs bought through intermediaries, but for now that is just conjecture. The H100 is more powerful than the H800, yet it is perfectly plausible that DeepSeek was forced to settle for the latter, because US government sanctions have blocked Chinese companies from accessing the H100. In fact, since November 2023 NVIDIA has not even been able to deliver the H800 to its Chinese customers.

One of DeepSeek's keys is called PTX

NVIDIA's GPUs are not the only ingredient in the recipe for the thrilling growth the company has experienced over the last five years; its CUDA (Compute Unified Device Architecture) technology also plays an essential role in its business. Most of the AI projects currently under development are implemented on CUDA. This technology brings together the compiler and development tools that programmers use to write software for NVIDIA GPUs, and replacing it with another option in projects that are already underway is a problem.

Huawei, which aspires to an important share of this market in China, has CANN (Compute Architecture for Neural Networks), its alternative to CUDA, but for now CUDA dominates the market. Moreover, this NVIDIA tool puts in programmers' hands a high-level language that lets them access the GPU hardware in an approachable way. Even so, and here we reach the heart of this article, DeepSeek's engineers did not use CUDA to develop their AI: they used PTX (Parallel Thread Execution).

DeepSeek's engineers decided to use PTX to squeeze as much as possible out of the H800 GPUs

This language is similar to assembly. In fact, it is in a sense the assembly language that NVIDIA offers to developers who use its GPUs and need to implement low-level optimizations in their code. Programming in PTX is harder and more laborious than programming in CUDA, but it has the advantage of letting developers write more efficient code, and therefore code capable of taking better advantage of the resources the GPU hardware offers.
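To make the difference concrete, here is a minimal sketch of what dropping below CUDA C++ looks like. This is purely illustrative and is not DeepSeek's code: it embeds a single hand-written PTX instruction (a 32-bit add) inside a kernel via NVIDIA's inline-PTX `asm` syntax, whereas real low-level optimizations of this kind target instruction scheduling, register use, and memory operations.

```cuda
#include <cstdio>

// Kernel that computes out[i] = a[i] + b[i], but performs the addition
// with an inline PTX instruction instead of plain C++ arithmetic.
__global__ void add_with_ptx(const unsigned *a, const unsigned *b,
                             unsigned *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned result;
        // Inline PTX: one add.u32 instruction. "=r" binds a 32-bit output
        // register, "r" binds the two 32-bit inputs.
        asm("add.u32 %0, %1, %2;" : "=r"(result) : "r"(a[i]), "r"(b[i]));
        out[i] = result;
    }
}

int main() {
    const int n = 4;
    unsigned *a, *b, *out;
    cudaMallocManaged(&a, n * sizeof(unsigned));
    cudaMallocManaged(&b, n * sizeof(unsigned));
    cudaMallocManaged(&out, n * sizeof(unsigned));
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 10 * i; }
    add_with_ptx<<<1, n>>>(a, b, out, n);
    cudaDeviceSynchronize();
    for (int i = 0; i < n; ++i) printf("%u\n", out[i]);
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```

Compiled with `nvcc`, the inline string is passed straight through to the PTX the compiler emits, which is exactly the level of control that makes this approach more laborious than ordinary CUDA.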

Presumably DeepSeek's engineers decided to use PTX to get the most out of the H800 GPUs they had. One of the stratagems they devised was to assign only 20 SMs (streaming multiprocessors) of each GPU to server-to-server communication, which allowed them to dedicate the remaining 112 SMs of each chip to computation. In essence, DeepSeek was built from scratch by resorting to this kind of optimization, which largely explains why this AI model is so efficient.
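The accounting behind that split can be sketched with the CUDA runtime API. This is a hypothetical illustration, not DeepSeek's method: CUDA exposes no public API for pinning work to specific SMs (the reported partitioning relies on low-level techniques), so the snippet only queries how many SMs the device exposes and shows the 20/112 arithmetic, which adds up on a chip with 132 SMs.

```cuda
#include <cstdio>

int main() {
    int total_sms = 0;
    // Ask the CUDA runtime how many streaming multiprocessors device 0 has.
    cudaDeviceGetAttribute(&total_sms, cudaDevAttrMultiProcessorCount, 0);

    const int comm_sms = 20;                      // reserved for inter-server communication
    const int compute_sms = total_sms - comm_sms; // everything left goes to computation

    // On a 132-SM part, this prints total=132 comm=20 compute=112.
    printf("total=%d comm=%d compute=%d\n", total_sms, comm_sms, compute_sms);
    return 0;
}
```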

This Chinese company's programmers have achieved a genuine engineering feat, one that will in all likelihood have a deep impact on how AI model developers approach their projects in the future. It is palpable evidence that China is successfully adapting to the GPU shortage that US sanctions have inflicted on its companies.

Image | Nvidia

More information | Mirae Asset Securities Korea

In Xataka | We can forget about an AI without hallucinations, for now. Nvidia's CEO explains why
