GPU (Graphics Processing Unit)
The core compute engine behind AI training and inference, specialized for massively parallel computation
What is a GPU?
A GPU (Graphics Processing Unit) was originally designed for graphics rendering, but it has become the core compute engine for AI training and inference. Thousands of small parallel cores execute large-scale linear algebra — especially matrix multiplication — tens to hundreds of times faster than CPUs.
NVIDIA's H100/A100 and AMD's MI300 are the representative AI GPUs, with Google's TPU as a comparable dedicated accelerator. Training a modern frontier LLM requires thousands to tens of thousands of GPUs running for weeks or months.
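A minimal PyTorch sketch of that CPU/GPU gap (this assumes PyTorch and a CUDA-capable GPU are installed; the matrix size, repeat count, and resulting speedup are illustrative and will vary by hardware):

```python
import time
import torch

def time_matmul(device: str, n: int = 4096, repeats: int = 10) -> float:
    """Average time for an n x n matrix multiplication on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)  # warm-up so one-time initialization doesn't skew the timing
    if device == "cuda":
        torch.cuda.synchronize()  # GPU kernel launches are asynchronous
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

cpu_t = time_matmul("cpu")
print(f"CPU: {cpu_t * 1e3:.1f} ms per 4096x4096 matmul")
if torch.cuda.is_available():
    gpu_t = time_matmul("cuda")
    print(f"GPU: {gpu_t * 1e3:.1f} ms per 4096x4096 matmul ({cpu_t / gpu_t:.0f}x faster)")
```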
How Does a GPU Accelerate AI?
The overwhelming majority of AI compute is matrix multiplication (GEMM). Modern GPUs ship dedicated matrix units tuned for this operation (NVIDIA's Tensor Cores, AMD's Matrix Cores), delivering far higher throughput per watt and per square millimeter than general-purpose CPUs.
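A rough PyTorch sketch of how code reaches those matrix units (it assumes a CUDA GPU; the shapes are placeholders and the exact speedup depends on the part):

```python
import torch

# Placeholder shapes; any sufficiently large GEMM benefits from the matrix units.
a = torch.randn(8192, 8192, device="cuda")
b = torch.randn(8192, 8192, device="cuda")

# FP32 matmul: on recent NVIDIA GPUs PyTorch may already route this
# through TF32 Tensor Cores, depending on the allow_tf32 setting.
c_fp32 = a @ b

# Casting to half precision (FP16/BF16) targets the matrix units directly,
# roughly doubling throughput and halving memory traffic.
c_fp16 = a.half() @ b.half()

# Training code typically uses automatic mixed precision instead of manual
# casts: autocast picks a low-precision dtype per op where it is numerically safe.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c_amp = a @ b
```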
- Training: model parameters, gradients, and optimizer state (hundreds of GB to several TB for frontier models) live in GPU HBM while backpropagation runs in parallel
- Inference: user requests are batched together so that one GPU serves many concurrently (batching, throughput optimization; see the sketch after this list)
- Distributed training: ultra-fast interconnects like NVLink and InfiniBand turn hundreds to thousands of GPUs into a single giant cluster
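To make the batching point concrete, here is a toy PyTorch sketch (the model, sizes, and request count are placeholders, not a real serving stack):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in model; a real LLM is far larger, but the principle is the same.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
model.eval()

requests = [torch.randn(1024, device=device) for _ in range(64)]  # 64 user requests

with torch.no_grad():
    # Naive: one forward pass per request -> the GPU sits mostly idle.
    outputs_serial = [model(r.unsqueeze(0)) for r in requests]

    # Batched: stack the requests into one tensor and run a single forward pass,
    # so each kernel launch is amortized over all 64 requests.
    batch = torch.stack(requests)      # shape (64, 1024)
    outputs_batched = model(batch)     # shape (64, 1024)
```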
A single H100 reaches roughly 1 PFLOPS of dense FP16 Tensor Core throughput, with 80 GB of HBM3 at about 3 TB/s of bandwidth. The next generation (B200 and MI350) delivers roughly 2–3× those figures.
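A back-of-envelope calculation with those figures shows why bandwidth matters as much as raw FLOPS (the 70B-parameter model is a hypothetical workload, and the peak numbers are idealized):

```python
# Rough single-H100 figures from above.
peak_flops = 1e15          # ~1 PFLOPS of dense FP16 Tensor Core throughput
hbm_bytes_per_s = 3e12     # ~3 TB/s of HBM3 bandwidth
hbm_capacity_gb = 80

# Hypothetical workload: a 70B-parameter model stored in FP16 (2 bytes per weight).
params = 70e9
weight_bytes = params * 2
print(f"Weights: {weight_bytes / 1e9:.0f} GB "
      f"(~{weight_bytes / 1e9 / hbm_capacity_gb:.1f}x one H100's HBM)")

# Generating one token streams every weight once: a bandwidth-bound lower limit.
print(f"Min time per token (bandwidth-bound): {weight_bytes / hbm_bytes_per_s * 1e3:.0f} ms")

# The matching compute (~2 FLOPs per parameter per token) is far cheaper,
# which is one reason inference servers batch requests to keep the matrix units busy.
print(f"Compute time per token: {2 * params / peak_flops * 1e3:.2f} ms")
```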
Why Does It Matter?
GPU supply is both the bottleneck and the strategic asset of the AI industry. Training a large LLM presupposes thousands of GPUs and tens of millions of dollars; without GPU availability, power, and cooling, frontier model development is simply not possible. That supply chokehold is a major reason NVIDIA rose to the top of global market capitalization between 2023 and 2025. On the inference side, latency, cost, and energy efficiency depend directly on GPU generation — making GPUs a critical variable in product pricing, SLAs, and margin.