At SC25, the non-Nvidia AI stack finally moved from theory to steel. MiTAC rolled out a production-ready, liquid-cooled AMD Instinct MI355X platform while Gigabyte highlighted Arm-based AmpereOne M servers, all wired up next to Arista’s Broadcom-powered fabric. For enterprises and Tier-2 clouds, the message was blunt: you can now order a full, high-end AI rack that does not rely on Nvidia silicon or hyperscaler-only platforms.
Why SC25 Marked a Turning Point for the Non-Nvidia AI Stack
For most of the last two years, “alternative AI hardware” meant chasing stray accelerator cards or waiting for roadmaps to harden. SC25 was different. ODMs such as MiTAC and Gigabyte showed complete, liquid-cooled systems in realistic rack layouts, demonstrating that design, validation, and manufacturing for non-Nvidia platforms have caught up to demand (see MiTAC’s G4826Z5 coverage). To many buyers, the non-Nvidia AI stack at SC25 finally looked production-grade rather than experimental, with liquid-cooled AMD Instinct MI355X servers, AmpereOne M control planes, and Arista/Broadcom Ethernet fabrics forming a coherent alternative path for large-scale AI deployments.
In 2023 and the early part of 2024, the constraint was simple: raw accelerator procurement. Many enterprises could not secure Nvidia parts at any price, and Tier-2 clouds found themselves pushed toward managed services on hyperscaler platforms rather than building their own fleets. The SC25 systems show a different phase of the cycle. MiTAC’s G4826Z5 arrives as an integrated 4U chassis with eight MI355X GPUs, dual EPYC CPUs, liquid-cooling plumbing, and multi-rack cluster examples. Gigabyte’s AmpereOne M server provides a complementary CPU-first building block for cloud and AI control planes. The critical shift is from “card availability” to “rack-ready SKUs” that can be booked, delivered, and deployed. For readers who have been tracking AMD Instinct and ROCm primarily on paper, this shift matters: the non-Nvidia AI stack is no longer confined to lab-only proofs of concept; it now extends to demonstrably rack-ready systems that can be ordered, racked, and supported within normal enterprise procurement cycles.
For enterprises and regional clouds, that matters more than any single benchmark. A complete, validated alternative stack creates a credible path to diversify away from Nvidia allocation cycles and hyperscaler lock-in. It also restores some predictability to CAPEX planning: clusters can be budgeted, ordered, and delivered on timelines defined by ODMs and AMD/Ampere roadmaps rather than by Nvidia’s next tranche of capacity.
Inside MiTAC’s Liquid-Cooled AMD Instinct MI355X AI Server
MiTAC’s G4826Z5 is a concrete expression of a GPU-class AI server built around AMD’s accelerator and platform ecosystem. The 4U chassis is split physically: the top half holds eight liquid-cooled AMD Instinct MI355X accelerators; the lower half contains dual AMD EPYC 9005-series “Turin” CPUs, storage, power, and management infrastructure (SC25 hands-on).
Each MI355X is a CDNA 4-generation part paired with 288 GB of HBM3E memory and up to 8 TB/s of memory bandwidth, targeting dense AI and HPC deployments (AMD product page). With eight accelerators per server, a single G4826Z5 node exposes more than 2.3 TB of HBM3E directly to AI workloads. At SC25, MiTAC showed these servers in a 256-GPU configuration (32 nodes) and discussed a 512-GPU layout, with in-row coolant distribution to handle the aggregate thermal load. From an AI workload perspective, this configuration directly targets large language model training, recommendation systems, and high-throughput vision pipelines that previously defaulted to Nvidia H100-class deployments, anchoring them instead on the AMD Instinct-centric, non-Nvidia AI stack.
Interconnect topology in the G4826Z5 is designed for training-class scale. The MI355X implements high-bandwidth xGMI links for GPU–GPU communication, complemented by PCIe Gen5 connectivity to the host EPYC CPUs. MiTAC’s demo racks paired these nodes with high-radix Ethernet switches, enabling scale-out training and large-batch inference clusters tied together via RoCE.
On the CPU side, the dual EPYC 9005 configuration provides the familiar x86 environment that enterprise operators expect. The design leaves headroom for large DRAM footprints and fast NVMe storage, suitable for staging training data sets, model checkpoints, and high-throughput inference traffic.
How Liquid Cooling Enables Dense MI355X AI Deployments
The MI355X’s power envelope is the critical constraint. AMD rates the part at up to 1,400 W TDP, substantially higher than its air-cooled siblings, which already push conventional air cooling toward its limits (MI355X brief). MiTAC’s design makes that explicit: the tubing serving the GPUs is noticeably larger than the lines feeding the CPUs, underscoring how much heat must be moved out of an eight-GPU chassis.
In practice, this means liquid cooling is not a “nice-to-have” but the enabling technology for MI355X-class density. MiTAC used direct-to-chip cold plates, distribution manifolds, and a rack-integrated coolant loop in its SC25 cluster. The company also showed how an in-row cooling distribution unit can serve multiple racks, turning liquid cooling into a shared datacenter utility rather than a one-off science project. For operators evaluating a non-Nvidia AI stack, this makes liquid cooling part of the core design decision: power density and sustained performance on MI355X nodes are inseparable from facility-level cooling strategy, not an afterthought to bolt on once hardware has been chosen.
For brownfield sites with limited floor loading, constrained power delivery, and existing hot-aisle/cold-aisle layouts, retrofit questions become central. In greenfield builds, by contrast, MI355X-type hardware can set the design baseline: higher rack power budgets, explicit leak detection and maintenance workflows, and facility-level heat rejection hardware sized for sustained multi-hundred-kilowatt rows.
Software and Ecosystem Support Around AMD Instinct and ROCm
Hardware alone does not make a stack. AMD’s ROCm software platform now underpins the Instinct line, with PyTorch and TensorFlow builds targeting MI300-series GPUs and their siblings, and growing support from inference runtimes and compilers such as Triton and ONNX Runtime (see AMD’s ROCm documentation). SC25 demos leaned on this maturing ecosystem, running standard LLM and vision workloads across the MI355X cluster.
Porting from CUDA-centric environments remains a primary friction point. While high-level frameworks increasingly hide backend differences, teams that rely on hand-tuned CUDA kernels or Nvidia-specific libraries still face a migration tax. In practice, early adopters report a mix of direct framework portability for mainstream models and targeted kernel rewrites or vendor-assisted tuning for custom pipelines. Teams that plan early for ROCm support, CI pipelines targeting AMD Instinct, and observability across mixed GPU fleets will find the non-Nvidia AI stack easier to operationalize than those that treat MI355X clusters as one-off exceptions to Nvidia-centric standards.
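For mainstream framework code, the migration is often less dramatic than it sounds. Below is a minimal sketch, assuming a ROCm build of PyTorch on an MI355X node: ROCm builds expose AMD GPUs through the same torch.cuda namespace used on Nvidia hardware, so unmodified "cuda" device strings typically keep working. The model and tensor shapes are purely illustrative.

```python
import torch

# On ROCm builds of PyTorch, AMD Instinct GPUs surface through the same
# torch.cuda namespace that CUDA builds use, so existing "cuda" device
# strings generally work unchanged on MI355X nodes.
def describe_backend() -> str:
    if not torch.cuda.is_available():
        return "cpu"
    if torch.version.hip is not None:        # ROCm/HIP build (AMD Instinct)
        return f"rocm {torch.version.hip}"
    return f"cuda {torch.version.cuda}"      # CUDA build (Nvidia)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"backend: {describe_backend()}, device: {device}")

# A tiny, framework-level workload: the same code path runs on either backend.
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)
with torch.no_grad():
    y = model(x)
print(tuple(y.shape))
```

Code like this tends to port cleanly; the real migration tax shows up one layer down, in hand-written kernels and Nvidia-specific libraries.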
The significance of MiTAC’s platform is that it wraps this still-evolving software stack in a conventional ODM support model. Enterprises are no longer experimenting on development boards; they are dealing with a vendor that can ship a defined SKU, with firmware, ROCm versions, and baseboard management controllers integrated and validated as a system.
Gigabyte’s AmpereOne M Servers as CPU-First AI and Cloud Building Blocks
Where MiTAC’s G4826Z5 targets GPU-bound workloads, Gigabyte’s AmpereOne M platform highlights the other half of a heterogeneous non-Nvidia stack: dense, efficient Arm CPUs optimized for cloud-style compute. While detailed SC25 coverage of the specific R1A3-T40-AAV1 model is limited, Ampere has disclosed the broad contours of its AmpereOne M line: high core counts, per-core power gating, and a focus on predictable throughput per watt for scale-out workloads (see Ampere’s AmpereOne M product brief).
In Gigabyte’s implementation, a 1U or 2U chassis pairs many Arm cores with substantial memory bandwidth and high-speed network links, positioning the server as a front-end and orchestration tier for GPU clusters, as well as a standalone host for CPU-bound AI services (SC25 Gigabyte overview). AmpereOne M designs commonly expose large numbers of Armv8 cores per node, enabling fine-grained multi-tenant consolidation without the simultaneous multithreading commonly seen in x86 parts.
This class of server is well-aligned with three patterns that recur in AI deployments:
- Stateless microservices and API front-ends terminating user traffic and brokering calls to accelerator nodes.
- Data preparation, feature engineering, and control-plane logic that is CPU-intensive but not GPU-bound.
- Inference for small and medium-sized models where batch sizes are modest and latency sensitivity dominates.
In non-Nvidia AI stack designs, this often translates into a clear separation of concerns: AmpereOne M or similar Arm servers absorb the chatty, user-facing and data-wrangling workloads, while MI355X-class accelerators are reserved for the numerically intense parts of the pipeline.
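One way to make that separation concrete is a simple placement policy in the scheduling or MLOps layer. The sketch below is illustrative only: the pool names, job attributes, and the 7B-parameter threshold are assumptions made for the example, not vendor-defined conventions.

```python
from dataclasses import dataclass

# Hypothetical node pools in a heterogeneous cluster; the names are
# illustrative, not vendor-defined labels.
ARM_CPU_POOL = "ampereone-m-cpu"   # front-ends, data prep, small-model inference
AMD_GPU_POOL = "mi355x-gpu"        # training and large-model, high-throughput inference

@dataclass
class JobSpec:
    name: str
    needs_gpu: bool          # set by the submitting team or an MLOps layer
    model_params_b: float    # model size in billions of parameters
    latency_sensitive: bool

def place(job: JobSpec) -> str:
    """Route a job to a pool; the thresholds here are assumptions for illustration."""
    if not job.needs_gpu:
        return ARM_CPU_POOL
    # Small, latency-sensitive models can often be served on CPU nodes;
    # anything large or throughput-oriented goes to the accelerator pool.
    if job.latency_sensitive and job.model_params_b <= 7:
        return ARM_CPU_POOL
    return AMD_GPU_POOL

if __name__ == "__main__":
    jobs = [
        JobSpec("api-gateway", needs_gpu=False, model_params_b=0, latency_sensitive=True),
        JobSpec("feature-pipeline", needs_gpu=False, model_params_b=0, latency_sensitive=False),
        JobSpec("chat-7b-serving", needs_gpu=True, model_params_b=7, latency_sensitive=True),
        JobSpec("llm-70b-training", needs_gpu=True, model_params_b=70, latency_sensitive=False),
    ]
    for j in jobs:
        print(f"{j.name:>20} -> {place(j)}")
```

In practice the same logic would live in Kubernetes node selectors, Slurm partitions, or an internal job router rather than application code, but the decision boundary looks much the same.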
The economics follow naturally. Arm-based fleets designed for high perf/W can reduce rack-level power draw for always-on services and multi-tenant hosting, a material advantage for regional cloud providers that cannot amortize hyperscaler-class power contracts. When a mature ODM such as Gigabyte stands behind an Ampere platform, perceived risk drops: enterprises know they can source replacement nodes, access conventional warranty and support models, and integrate the servers into existing management and monitoring stacks.
Networking and Fabric for the Non-Nvidia AI Stack: Arista and Broadcom
Compute without a credible fabric does not make a training cluster. SC25 underlined that the non-Nvidia ecosystem now has a coherent networking story as well. Arista brought a range of switches based on Broadcom’s Jericho and Tomahawk families, pitching them explicitly as an Ethernet-based alternative to Nvidia’s Spectrum-X line for AI and HPC (Arista SC25 showcase).
One anchor system, the Arista 7280R4K-32DE, delivers thirty-two 800G QSFP-DD ports plus additional 25G ports in a 1U form factor, backed by 32 GB of packet buffer memory and encryption across all 800G links. It is built around Broadcom’s Jericho3 silicon, tuned for “scale-across” roles such as campus-wide AI fabrics and cross-datacenter interconnects. A sibling, the 7280R4K-64QC-10PE, mixes OSFP 800G ports with dozens of 100G ports and uses a 16 GB buffer, targeting networks that bridge AI backbones with more conventional datacenter gear.
Arista also highlighted platforms based on Broadcom Tomahawk, such as 51.2 Tb/s switches intended as leaf and spine building blocks in AI clusters. While these do not replicate all the end-to-end congestion-management features that Nvidia offers with combined Spectrum switches and NICs, they fit into an open RoCE-over-Ethernet fabric that AMD, Ampere, and many hyperscalers already know how to deploy. This Ethernet-first approach aligns with how most operators already run their datacenters, lowering the barrier to adopting a non-Nvidia AI stack compared with more vertically integrated, proprietary fabrics.
The result is a vendor-neutral fabric layer. Operators can design MI355X racks, AmpereOne M control planes, and Arista/Broadcom fabrics as modular components. They then rely on standard Ethernet, RoCE, and emerging congestion-control techniques to manage tail latency and throughput, rather than committing to a single-vendor networking stack with proprietary extensions.
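From the application side, pointing a distributed training job at such a fabric is largely a matter of environment configuration. The sketch below is a minimal example under stated assumptions: a ROCm build of PyTorch whose "nccl" backend is provided by RCCL, RCCL honoring the familiar NCCL-style environment variables, and a launcher such as torchrun supplying the usual rank variables. The interface name, RDMA device, and GID index are placeholders that depend entirely on the local host and switch configuration.

```python
import os
import torch
import torch.distributed as dist

# Placeholder values: the NIC, RDMA device name, and RoCE GID index depend on
# the local fabric and host setup. RCCL, like NCCL, is typically steered with
# NCCL_-prefixed environment variables; verify names against the RCCL docs.
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")   # bootstrap/control interface
os.environ.setdefault("NCCL_IB_HCA", "bnxt_re0")      # RDMA device carrying RoCE traffic
os.environ.setdefault("NCCL_IB_GID_INDEX", "3")       # GID entry commonly used for RoCEv2

def init_distributed() -> None:
    # On ROCm builds of PyTorch, the "nccl" backend name is backed by RCCL,
    # so the same initialization code runs on AMD and Nvidia fleets alike.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))

if __name__ == "__main__":
    # Expects the usual torchrun/launcher environment (RANK, WORLD_SIZE, etc.).
    init_distributed()
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t)  # exercises the RoCE fabric across all ranks
    print(f"rank {dist.get_rank()} sees sum {t.item()}")
```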
From Discrete Chips to a Coherent Non-Nvidia AI Stack
The most important takeaway from SC25 is not any one box, but the way the pieces fit together. MiTAC is packaging AMD CPUs and GPUs into dense, liquid-cooled servers and showing rack-scale deployments. Gigabyte is positioning AmpereOne M as a CPU building block that slots naturally into the same racks and row-level power and cooling assumptions. Arista is providing the fabric glue across servers and racks with Broadcom-based switches tuned for AI-class traffic.
What emerges is a vertically coherent non-Nvidia AI stack. ODMs are no longer pushing bare motherboards or reference cards; they are delivering rack designs with compute, network, power distribution, and cooling pre-validated. Management firmware, BMC integration, and monitoring hooks are exposed in ways that existing datacenter tools can consume. Multi-vendor interoperability is an explicit design target: AMD accelerators, Arm CPUs, and Ethernet fabrics are all expected to coexist. What distinguishes this moment from earlier attempts at diversification is that the non-Nvidia AI stack now spans silicon, systems, and software in a way that feels cohesive: from AMD Instinct accelerators and ROCm to AmpereOne M control planes and standards-based Ethernet fabrics.
On the software side, ROCm, Arm-optimized Linux distributions, and cloud-native orchestration frameworks such as Kubernetes are gaining better support for these platforms. Container images and operators tuned for AMD Instinct clusters are becoming easier to find. Gaps remain compared with Nvidia’s CUDA-centric software universe, but the vector is clear: each product cycle reduces the friction for developers who want to target non-Nvidia accelerators without rewriting entire codebases.
Why Market Pressure Makes the Non-Nvidia AI Stack Actionable Now
The timing of this shift is not accidental. Nvidia accelerators remain supply-constrained in many regions, with lead times and allocation processes that favor hyperscalers and the very largest buyers. Prices across the stack—from accelerator cards to complete DGX-class systems—reflect this scarcity. For enterprises and Tier-2 clouds, the opportunity cost of waiting has become visible: delayed AI projects translate directly into missed product features, slower automation gains, and weaker competitive positioning.
At the same time, regulatory and sovereignty pressures are pushing many organizations toward on-premises or sovereign-cloud AI deployments. Governments and regulated industries increasingly want guarantees about data locality, control over underlying hardware, and the ability to operate even if cross-border supply chains are disrupted. AMD, Ampere, and the ODM ecosystem can serve some of this demand through more diversified manufacturing footprints and a mix of regional integration partners, though claims of sovereignty must always be tied to specific jurisdictions and compliance regimes. For many of these organizations, a non-Nvidia AI stack based on AMD Instinct and Arm CPUs is as much about supply-chain and governance resilience as it is about raw performance per dollar.
For Tier-2 clouds and colocation providers, the calculus is straightforward. An MI355X-based rack, fronted by Arm or x86 control planes and wired with Arista/Broadcom switches, is now a differentiated product that can be sold to customers who either cannot secure Nvidia capacity or cannot justify its premium. SC25’s live racks lower the perceived risk: buyers can see full systems operating under realistic loads, not just concepts on a roadmap.
Deployment Patterns: Using AMD Instinct and AmpereOne in Real AI Clusters
The architectures on display at SC25 suggest a few pragmatic patterns for deployment. Training-heavy environments gravitate toward GPU-dense MI355X racks. Each 4U node offers eight accelerators; populating a rack with eight such nodes yields sixty-four GPUs and roughly 18 TB of HBM3E, a natural building block for large-scale model training and high-throughput inference.
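To make the rack-level arithmetic explicit, the short calculation below uses the per-GPU figures cited earlier (288 GB of HBM3E and up to 1,400 W per MI355X, eight GPUs per node); the eight-node rack density is an assumption that depends on a site's power and cooling budget, not a vendor specification.

```python
# Back-of-the-envelope sizing for an MI355X rack, using the per-GPU figures
# cited in this article. Nodes-per-rack is an assumption, not a vendor spec.
GPUS_PER_NODE = 8
HBM_PER_GPU_GB = 288          # MI355X HBM3E capacity
GPU_TDP_W = 1400              # MI355X peak TDP
NODES_PER_RACK = 8            # assumed; real density depends on power and cooling

gpus = GPUS_PER_NODE * NODES_PER_RACK
hbm_tb = gpus * HBM_PER_GPU_GB / 1000
gpu_power_kw = gpus * GPU_TDP_W / 1000

print(f"GPUs per rack:           {gpus}")
print(f"Aggregate HBM3E:         {hbm_tb:.1f} TB")
print(f"GPU power alone (peak):  {gpu_power_kw:.1f} kW (excludes CPUs, NICs, fans)")
# With these assumptions: 64 GPUs, ~18.4 TB of HBM3E, and ~90 kW of GPU power
# per rack -- numbers that explain why liquid cooling and row-level CDUs are
# treated as core design decisions rather than options.
```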
CPU-heavy workloads and control planes fall naturally onto AmpereOne M or conventional EPYC/Xeon racks. Here, the goal is to host microservices, schedulers, feature engineering jobs, and data-plane services that orchestrate traffic to and from the GPU islands. Heterogeneous clusters pair these roles: MI355X racks handle the numerically intense training runs, while Arm or x86 racks manage the REST APIs, batch queues, and data transformations feeding and consuming model outputs. Over time, as schedulers and MLOps platforms grow more hardware-aware, this heterogeneous non-Nvidia AI stack can be exposed to users as a single logical pool, with placement policies steering latency-sensitive or cost-sensitive jobs to the most appropriate nodes.
Networking architectures follow familiar leaf/spine or three-tier patterns, with Arista’s Tomahawk-based switches at the leaf and Jericho-based systems for scale-across roles that stretch across rows or datacenters. RoCE is the default overlay for training clusters; classic TCP/IP suffices for most control-plane and user-facing traffic.
Integration with existing Nvidia fleets will be the reality for most enterprises rather than an all-or-nothing migration. Practical strategies include assigning certain model families or business units to the AMD Instinct pool, while latency-critical workloads continue to run on Nvidia systems. Over time, schedulers and workload managers can be taught to dispatch jobs based on hardware affinity, cost, and latency constraints, turning heterogeneous hardware into a portfolio rather than a patchwork.
Operationally, SRE teams will need updated playbooks. Liquid-cooled racks demand new practices for monitoring coolant flow, inlet and outlet temperatures, and leak detection. ROCm introduces a different cadence of driver and runtime updates than Nvidia’s CUDA stack. Capacity planning, too, changes when racks can host both 1,400 W accelerators and lightweight CPU nodes; power and cooling envelopes must be modeled at the row and room level, not just per rack.
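As a simplified example of what such a playbook might include, the sketch below polls per-GPU temperatures on a ROCm host and flags outliers. It assumes rocm-smi is installed and that its --showtemp, --showpower, and --json flags behave as in recent ROCm releases; because the JSON field names vary between versions, the parser matches keys loosely, and the alert threshold is illustrative rather than a vendor limit.

```python
import json
import subprocess

# Illustrative threshold only; real alerting limits should come from the
# vendor's thermal specifications and the site's liquid-cooling design.
TEMP_ALERT_C = 85.0

def read_gpu_telemetry() -> dict:
    """Query per-GPU temperature and power via rocm-smi on a ROCm host.

    Assumes the --showtemp/--showpower/--json flags found in recent ROCm
    releases; field names in the JSON output differ between versions.
    """
    out = subprocess.run(
        ["rocm-smi", "--showtemp", "--showpower", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

def check_temps(telemetry: dict) -> list[str]:
    """Return human-readable alerts for any temperature field above threshold."""
    alerts = []
    for card, fields in telemetry.items():
        if not isinstance(fields, dict):
            continue
        for key, value in fields.items():
            if "temperature" not in key.lower():
                continue
            try:
                temp = float(value)
            except (TypeError, ValueError):
                continue
            if temp >= TEMP_ALERT_C:
                alerts.append(f"{card}: {key} = {temp} C")
    return alerts

if __name__ == "__main__":
    for alert in check_temps(read_gpu_telemetry()):
        print("ALERT:", alert)
```

A production version would feed the same readings into existing monitoring stacks alongside coolant flow and leak-detection telemetry from the rack itself.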
Risks, Gaps, and Open Questions for the Non-Nvidia AI Stack
The SC25 hardware makes it clear that an alternative stack exists, but it is not without risk. On the software side, ROCm and associated toolchains still lag CUDA in ecosystem depth. Many third-party libraries, MLOps tools, and monitoring systems treat Nvidia as the default target. While this is changing, teams adopting AMD accelerators should budget time and engineering effort for validation and, in some cases, contribution to upstream projects.
Vendor and roadmap risk also deserve attention. AMD has signaled strong commitment to its Instinct line, and Ampere continues to iterate on its Arm server roadmap, but both depend on sustained demand and competitive positioning against x86 incumbents and Nvidia’s expanding CPU ambitions. ODM SKUs can be discontinued or revised faster than enterprise refresh cycles, raising questions about long-term sparing and multi-year support. Buyers will need clear assurances on lifecycle support, including firmware updates and replacement part availability.
Networking, while more open, is not trivial either. Ethernet-based fabrics must still solve for congestion and incast under AI training loads. Arista and Broadcom are investing in features to address this, but few operators have the same level of battle-tested playbooks for 800G RoCE fabrics that hyperscalers have developed internally. Expect an iterative learning curve as early adopters push multi-thousand-GPU clusters over these open fabrics.
Mid-Term Outlook: Where the Non-Nvidia AI Stack Is Heading
Looking through the next few product and datacenter planning cycles, the SC25 systems should be seen as first-wave, production-grade entries rather than end states. Expect more ODMs to follow MiTAC’s lead with liquid-cooled MI355X and successor designs, including variants tuned for different rack depths, front-to-back airflow compatibility, and mixed accelerator configurations. As early pilot clusters complete their first rounds of training workloads, real perf/W and TCO data will start to circulate, sharpening the economic comparison with Nvidia-based stacks.
As these deployments mature and second-wave hardware ships, software will be the main swing factor. If ROCm, framework backends, and ecosystem tools continue their current trajectory, most mainstream LLM, recommendation, and vision workloads should be portable with modest engineering effort. That in turn will let procurement teams treat AMD Instinct and Nvidia accelerators more as interchangeable options for many classes of jobs, constrained more by price, availability, and power envelopes than by outright compatibility.
Beyond the first cycles of adoption, as comparative trials and case studies accumulate, enterprises are likely to move from opportunistic diversification (“take AMD where Nvidia is unavailable”) to deliberate portfolio strategies. A plausible outcome is that a meaningful minority share of new AI training capacity—especially outside the largest hyperscalers—lands on AMD-centric racks, with Ampere-class Arm servers and Ethernet fabrics filling in the control and networking layers. In that scenario, accelerators themselves become somewhat more commoditized, and differentiation shifts toward system-level design: liquid-cooling integration, rack-scale power delivery, and tightly tuned software stacks.
The forecast is not that Nvidia’s dominance disappears, but that the practical ceiling on non-Nvidia infrastructure rises sharply. SC25 showed that enterprises and Tier-2 clouds no longer have to choose between waiting in line for Nvidia or renting capacity from hyperscalers on hyperscaler terms. Over the mid term, provided AMD, Ampere, and Arista sustain their current investment pace and supply chains remain stable, the alternative stack is likely to entrench itself as a durable second pillar of the AI infrastructure market—large enough to matter, open enough to shape pricing and innovation, and mature enough to carry production workloads at scale.

