Meta has shared the building blocks of its Catalina AI system, which is based on NVIDIA's GB200 NVL72 solution with Open Rack v3 & Liquid Cooling.
Back in 2022, Meta mainly focused on clusters that were around 6,000 GPUs in terms of size. These were mainly designed for traditional ranking and recommendation models, so essentially running workloads that spanned 128-512 GPUs.
A year later, thanks to the advent of GenAI and LLMs, clusters grew to 16-24K GPUs (a 4x increase), and just last year Meta was running 100,000 GPUs and continues to add more. Meta is also a software enabler with models such as Llama, and anticipates a 10x increase in cluster sizes within the next few years.
Meta states that they started on the Catalina project very early with NVIDIA, using the NVL72 GPU solution as the baseline. Meta also worked with NVIDIA to customize the system to meet their needs, and both contributed the reference designs for MGX and NVL72 to open source, with Catalina available on the Open Compute website.
Turning to Meta's Catalina, this is what is being deployed in their data centers. Meta calls each system a pod, and essentially replicates it as the unit of scale.
One difference between the standard NVL72 and Meta's custom version is that Meta uses two IT racks that together form a single 72-GPU scale-up domain. Each IT rack has the same configuration: 18 compute trays split between the top and bottom of the rack, and nine NV switches. Between the two racks runs a big, thick bundle of cables.
This cable bundle is what allows all of the GPUs across the two racks to be combined, connecting through the NV switches to create a single 72-GPU scale-up domain. On the left and right of the racks, you can see large ALCs, or air-assisted liquid cooling units. These allow Meta to deploy liquid-cooled, high-power-density racks into its existing data centers across the US and around the world.
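The two-rack layout described above can be sketched as a small data model. The article gives the tray and switch counts; the 2-GPUs-per-compute-tray figure is an assumption, chosen so the totals reach the stated 72-GPU scale-up domain.

```python
# Sketch of Meta's two-rack Catalina scale-up domain, as described above.
# Assumption (not stated in the article): 2 GPUs per compute tray, so that
# 2 racks x 18 trays x 2 GPUs = 72 GPUs in one NVLink domain.

GPUS_PER_TRAY = 2          # assumed
TRAYS_PER_RACK = 18        # from the article
NVSWITCHES_PER_RACK = 9    # from the article
RACKS_PER_POD = 2          # from the article

def pod_summary() -> dict:
    """Total GPU and NV switch counts for one two-rack pod."""
    gpus = RACKS_PER_POD * TRAYS_PER_RACK * GPUS_PER_TRAY
    switches = RACKS_PER_POD * NVSWITCHES_PER_RACK
    return {"gpus": gpus, "nvswitches": switches}

print(pod_summary())  # {'gpus': 72, 'nvswitches': 18}
```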
Meta states that with two racks they can essentially double the number of CPUs and the total memory within a rack, going from 17 TB to 34 TB of LPDDR memory, which helps them reach 48 TB of total cache-coherent memory shared between the GPUs and CPUs within a rack.
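The 48 TB figure can be roughly sanity-checked. The per-GPU HBM capacity used below (~192 GB for a Blackwell B200) comes from NVIDIA's public GB200 specs, not from this article, so treat it as an assumption.

```python
# Rough check of the memory figures quoted above. The ~192 GB HBM per GPU
# is assumed from NVIDIA's public GB200 specs, not stated in the article.

LPDDR_TB = 34               # CPU-attached memory across both racks (article)
GPUS = 72
HBM_GB_PER_GPU = 192        # assumed

hbm_tb = GPUS * HBM_GB_PER_GPU / 1000   # decimal TB
total_tb = LPDDR_TB + hbm_tb
print(f"HBM: {hbm_tb:.1f} TB, total coherent: {total_tb:.1f} TB")
# HBM: 13.8 TB, total coherent: 47.8 TB -- roughly the 48 TB the article cites
```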
The PSU takes 480-volt or 277-volt single-phase AC and converts it to 48 V DC, which is distributed through the busbar in the back; that is what powers all of the individual server blades, NV switches, and networking devices within the rack.
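With a 48 V DC busbar, the current each device draws follows directly from Ohm's law (I = P / V). A minimal sketch, with an illustrative tray power figure that is not from the article:

```python
# Sanity check on 48 V DC busbar distribution: the current a given DC load
# draws at 48 V. The example load power is illustrative, not from the article.

BUSBAR_VOLTS = 48.0

def busbar_current_amps(load_watts: float) -> float:
    """Current drawn from the 48 V busbar for a given DC load (I = P / V)."""
    return load_watts / BUSBAR_VOLTS

# e.g. a hypothetical 5.4 kW compute tray draws 112.5 A from the busbar
print(busbar_current_amps(5400))  # 112.5
```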
At the top and bottom of the rack you can see one power supply shelf, with two more below each. Meta also has its own fiber patch panel, to which all of the in-rack fiber cabling for the back-end network connects; from there it runs out into the data center to the networking switches that sit at the end of the row.
There is the rack management controller, the Wedge 400 front-end network switch, and then several IT and switch trays.
To support all of this, Meta required a range of new technologies, some of which are already part of NVIDIA's NVL72 GB200 Blackwell system. Unique to Meta are a few additions, such as the high-power version of its open racks, with essentially higher-capacity power supplies and CPUs.
They also needed liquid cooling, specifically the air-assisted liquid cooling required to support these racks in traditional data centers.
The rack management controller is a safety and orchestration device that enables and disables cooling and monitors the racks for leaks. Finally, there is Meta's network topology, the disaggregated scheduled fabric, which allows them to connect multiple pods into larger clusters.
This is also the first deployment of Meta's high-power version of Open Rack v3. It raises the power available to each rack to 94 kW over the busbar (600 A), and it supports newer buildings with facility liquid cooling, where liquid runs straight to the rack.
To manage the liquid, Meta uses the RMC, or Rack Management Controller. It sits within the rack and constantly monitors a number of different components for leaks.
It sits safely at the top of the rack so that, if there is a leak, nothing drips onto it and shuts it off. It connects to the ALCs to shut them down, and to the facility-level valve train, closing the valves that feed liquid from the building into the affected rack.
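The RMC's leak response described above can be sketched as a small state machine. The sensor and actuator interfaces here are hypothetical; only the behavior (shut off the ALCs and close the facility valves on a detected leak) comes from the article.

```python
# Hedged sketch of the RMC's leak-response behavior as described above: on a
# detected leak it shuts off the ALCs and closes the facility valve train.
# The sensor/actuator interfaces are hypothetical, not Meta's actual firmware.

from dataclasses import dataclass, field

@dataclass
class RackManagementController:
    alc_running: bool = True
    facility_valves_open: bool = True
    events: list = field(default_factory=list)

    def on_sensor_reading(self, leak_detected: bool) -> None:
        if leak_detected:
            self.alc_running = False           # stop the air-assisted liquid coolers
            self.facility_valves_open = False  # isolate the facility liquid supply
            self.events.append("leak: cooling shut off, valves closed")

rmc = RackManagementController()
rmc.on_sensor_reading(leak_detected=False)  # normal operation, no action
rmc.on_sensor_reading(leak_detected=True)   # trip: isolate the rack
print(rmc.alc_running, rmc.facility_valves_open)  # False False
```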
Meta is also using its own disaggregated scheduled fabric for Catalina. This lets them connect multiple pods within a single data center building or suite, link multiple buildings together, and potentially go even larger to build truly large-scale clusters. The fabric is tuned for AI and provides flexibility and speed; it is essentially how all the GPUs talk to each other.
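Since the pod is the unit of replication over this fabric, the back-of-envelope scale-out math is simple: divide the target cluster size by 72 GPUs per pod. This is illustrative arithmetic only, not Meta's actual deployment plan.

```python
# Back-of-envelope scale-out math for connecting Catalina pods over the
# disaggregated scheduled fabric: how many 72-GPU pods a target cluster
# size implies. Purely illustrative.

import math

GPUS_PER_POD = 72  # one two-rack scale-up domain

def pods_needed(target_gpus: int) -> int:
    """Minimum number of pods required to reach a target GPU count."""
    return math.ceil(target_gpus / GPUS_PER_POD)

print(pods_needed(100_000))  # 1389 pods for the ~100k-GPU scale cited above
```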