This is a summary of my understanding of the AMD EPYC 7002 processors, and more specifically the AMD EPYC 7742 processor:
Chip names always puzzle me, so let us decipher this AMD naming first:
EPYC is the current name for AMD’s server processor line - essentially processors that go into local or cloud rack servers. Ryzen is the current name for their desktop and mobile processors.
7002 or 7XX2: 7 is the series number and 2 is the generation (these chips use the Zen2 microarchitecture). Generation 2 chips are also called the Rome generation. The two XX digits in the middle indicate relative performance: the higher, the better. They range from 25 to 74, so the 7742 is the most performant processor in this numbering scheme.
These processors are designed for dual-socket use, that is, 2 processors in one server, which is the common rack server scenario. Processors designed for single-socket use only have a P suffix: 7702P for example.
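The naming scheme above can be sketched as a small parser. This is only an illustration of the digit positions described here: the function name `decode_epyc_7xx2` and the field names are made up for this example, and it deliberately rejects anything that is not four digits with an optional P suffix.

```python
import re

def decode_epyc_7xx2(model: str) -> dict:
    """Split a model string such as '7742' or '7702P' into its naming fields."""
    m = re.fullmatch(r"(\d)(\d\d)(\d)(P?)", model.strip().upper())
    if m is None:
        raise ValueError(f"not a 4-digit EPYC model number: {model!r}")
    series, performance, generation, suffix = m.groups()
    return {
        "series": int(series),            # 7 for the 7002 family
        "performance": int(performance),  # the two middle digits, higher is faster
        "generation": int(generation),    # 2 for Zen2 / Rome
        "single_socket_only": suffix == "P",
    }

print(decode_epyc_7xx2("7742"))
print(decode_epyc_7xx2("7702P"))
```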
- The CPU is actually an MCM (Multi-Chip Module) of 9 separate chips (called chiplets) on one package.
- 8 of the chiplets are each called CCD (Core Complex Die) and the chiplet in the center is called the IOD (I/O Die).
- The CCDs are made on a 7nm process and the IOD is made on a 14nm process.
- Each CCD is composed of 2 CCX (Core CompleX).
- Each CCX has 4 Zen2 cores and 16MB of L3 cache shared between the 4 cores.
- Each Zen2 core has 32KB 8-way L1 I-cache, 32KB 8-way L1 D-cache and 512KB 8-way L2 cache.
- Each Zen2 core supports SMT (Simultaneous Multi-Threading, AMD's counterpart to Intel's Hyper-Threading) and appears to the operating system as 2 logical cores (hardware threads).
- All inter-CCX communication, whether inside a CCD or between CCDs, has to go through the IOD.
- IOD handles all the communication with memory, other devices and with the other CPU (in dual-socket configuration).
- CCDs talk to the IOD through the IF (Infinity Fabric) interconnect.
- The IOD also has DDR memory channels to talk to RAM and PCIe connections to talk to GPUs, CPUs and other devices.
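The per-socket totals for the 7742 follow directly from the chiplet hierarchy above. Here is a minimal sketch of that arithmetic, using only the counts given in the list; the constant and function names are made up for the example.

```python
# Counts taken from the description above (EPYC 7742, one socket).
CCDS_PER_SOCKET = 8
CCX_PER_CCD = 2
CORES_PER_CCX = 4
THREADS_PER_CORE = 2   # SMT
L3_MB_PER_CCX = 16
L2_KB_PER_CORE = 512

def socket_totals() -> dict:
    """Multiply out the hierarchy: CCD -> CCX -> core -> thread."""
    ccx = CCDS_PER_SOCKET * CCX_PER_CCD
    cores = ccx * CORES_PER_CCX
    return {
        "ccx": ccx,
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "l3_mb": ccx * L3_MB_PER_CCX,
        "l2_mb": cores * L2_KB_PER_CORE // 1024,
    }

print(socket_totals())
# {'ccx': 16, 'cores': 64, 'threads': 128, 'l3_mb': 256, 'l2_mb': 32}
```

Note that the 256MB of L3 is not one shared pool: each 16MB slice is shared only by the 4 cores of its own CCX.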
- Up to 4TB of memory is supported per socket.
- In the BIOS, set the maximum memory bus frequency to 2667 MT/s for optimum memory access latency.
- The IOD of each CPU supports 8 memory channels. The 8 memory channels are named ABCDEFGH.
- Each memory channel supports up to 2 DIMMs. The 2 DIMMs are connected serially on the channel like this:
Channel G -- DIMM 1 -- DIMM 2
- A CPU therefore supports up to 16 DIMMs, and a dual-socket server up to 32 DIMMs.
- For optimum performance, AMD recommends that all channels be populated with DIMMs of equal capacity. So, on a dual-socket server it is recommended to fill either all 32 DIMM slots or 16 of them (one DIMM per channel).
- When there is 1 DIMM per channel, the system is said to be in 1 DPC mode. This is the most performant configuration.
- When there are 2 DIMMs per channel, the system is said to be in 2 DPC mode. This configuration runs at a lower memory speed than 1 DPC.
- All other configurations (12 DIMMs for example) are suboptimal compared to 1 DPC or 2 DPC.
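The population rules above can be turned into a small check. This is a rough sketch: the function names are made up, it only counts DIMMs per socket (it does not check that capacities match or which slots are used), and the bandwidth helper assumes the standard 64-bit (8-byte) DDR4 data bus per channel, which is not stated in the list above.

```python
CHANNELS_PER_SOCKET = 8
MAX_DIMMS_PER_CHANNEL = 2
BYTES_PER_TRANSFER = 8  # assumption: 64-bit DDR4 data bus per channel

def population_mode(dimms_per_socket: int) -> str:
    """Classify a per-socket DIMM count as 1 DPC, 2 DPC or suboptimal."""
    max_dimms = CHANNELS_PER_SOCKET * MAX_DIMMS_PER_CHANNEL
    if not 0 < dimms_per_socket <= max_dimms:
        raise ValueError(f"expected 1..{max_dimms} DIMMs, got {dimms_per_socket}")
    if dimms_per_socket == CHANNELS_PER_SOCKET:
        return "1 DPC"
    if dimms_per_socket == max_dimms:
        return "2 DPC"
    return "suboptimal"

def peak_bandwidth_gb_s(mt_per_s: int, channels: int = CHANNELS_PER_SOCKET) -> float:
    """Theoretical peak: channels x transfers per second x bytes per transfer."""
    return channels * mt_per_s * BYTES_PER_TRANSFER / 1000

# A dual-socket server with 12 DIMMs has 6 per socket, which leaves channels empty.
print(population_mode(8), population_mode(16), population_mode(6))
print(peak_bandwidth_gb_s(2667))  # theoretical peak per socket at 2667 MT/s, in GB/s
```

With 4TB per socket spread over 16 slots, the largest supported DIMM works out to 256GB.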
- These CPUs support PCIe 4.0. An x16 link (16 lanes) provides about 32 GB/s of bandwidth in each direction, or about 64 GB/s bidirectional. This is double the PCIe 3.0 figures of 16 GB/s and 32 GB/s.
- The IOD of each CPU supports 8 of these x16 PCIe links. That is 128 lanes in total, for about 256 GB/s of unidirectional and 512 GB/s of bidirectional bandwidth.
- 4 of the x16 channels are used to talk to the other socket in a dual-socket configuration.
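PCIe bandwidth can be worked out from the per-lane signalling rate. The sketch below assumes the PCIe 4.0 figures of 16 GT/s per lane with 128b/130b line encoding; those two numbers come from the PCIe specification rather than from the list above, and the function name is made up. Protocol overhead (packet headers, flow control) is ignored, so real throughput is lower.

```python
GT_PER_S_PER_LANE = 16.0          # PCIe 4.0 raw signalling rate per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding

def pcie4_gb_s(lanes: int, bidirectional: bool = False) -> float:
    """Theoretical PCIe 4.0 bandwidth in GB/s for a given lane count."""
    per_lane_one_way = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY / 8  # bits -> bytes
    total = per_lane_one_way * lanes
    return total * 2 if bidirectional else total

print(round(pcie4_gb_s(16), 1), round(pcie4_gb_s(16, bidirectional=True), 1))
print(round(pcie4_gb_s(128), 1), round(pcie4_gb_s(128, bidirectional=True), 1))

# Lanes left for devices when 4 of the 8 x16 links go to the other socket.
lanes_left_per_socket = (8 - 4) * 16
print(lanes_left_per_socket, 2 * lanes_left_per_socket)  # per socket, per 2-socket server
```

The commonly quoted 32 GB/s and 64 GB/s figures for an x16 link are these values rounded up, before encoding overhead.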
- For optimum performance, pin workloads to the socket that is directly connected to the GPU or NIC the workload uses.
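On Linux, one way to do this pinning is to ask sysfs which NUMA node a PCIe device hangs off and restrict the process to that node's CPUs. The following is a minimal sketch, not a tuning recipe: the helper names are made up, it relies on the `numa_node` and `cpulist` files that Linux exposes under `/sys`, and it uses `os.sched_setaffinity`, which is only available on Linux. Tools such as `numactl` do the same job from the command line.

```python
import os
from pathlib import Path

def parse_cpulist(text: str) -> set:
    """Parse a Linux cpulist string such as '0-15,128-143' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus

def cpus_local_to_device(sysfs_device: str) -> set:
    """CPUs on the NUMA node that a PCI device is attached to."""
    node = int(Path(sysfs_device, "numa_node").read_text())
    if node < 0:
        # -1 means the kernel reports no NUMA affinity; keep the current CPU set.
        return set(os.sched_getaffinity(0))
    cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    return parse_cpulist(cpulist)

def pin_to_device(sysfs_device: str) -> None:
    """Restrict the current process (pid 0 = self) to the device's local CPUs."""
    os.sched_setaffinity(0, cpus_local_to_device(sysfs_device))

# The cpulist format, shown with an illustrative (not measured) value:
print(sorted(parse_cpulist("0-3,128-131")))
```

A workload using a NIC would call something like `pin_to_device("/sys/class/net/eth0/device")` before starting its worker threads, where `eth0` is a placeholder interface name.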
- NUMA is configured by setting the NPS (NUMA Nodes Per Socket) value in BIOS. Possible values are 0, 1, 2 and 4.
NPS=0: Entire 2-socket system is 1 NUMA node. 16-channel interleaving is used for memory.
NPS=1: Each socket is 1 NUMA node. 8-channel interleaving for memory.
NPS=2: Each socket is 2 NUMA nodes. 4-channel interleaving for memory.
NPS=4: Each socket is 4 NUMA nodes. 2-channel interleaving for memory.
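The NPS settings above follow a simple pattern: the 8 memory channels and 64 cores of a socket are divided evenly among its NUMA nodes. Here is a small sketch of that relationship for the 7742; the function name and dictionary keys are made up, and treating NPS=0 as a dual-socket-only setting is an assumption based on how it is described above.

```python
CHANNELS_PER_SOCKET = 8
CORES_PER_SOCKET = 64  # EPYC 7742

def numa_layout(nps: int, sockets: int = 2) -> dict:
    """System-wide NUMA layout implied by an NPS (NUMA Nodes Per Socket) value."""
    if nps not in (0, 1, 2, 4):
        raise ValueError(f"unsupported NPS value: {nps}")
    if nps == 0:
        if sockets != 2:
            raise ValueError("NPS=0 is described here only for 2-socket systems")
        # The whole machine is one node, interleaved across every channel.
        return {
            "numa_nodes": 1,
            "channels_interleaved": CHANNELS_PER_SOCKET * sockets,
            "cores_per_node": CORES_PER_SOCKET * sockets,
        }
    return {
        "numa_nodes": nps * sockets,
        "channels_interleaved": CHANNELS_PER_SOCKET // nps,
        "cores_per_node": CORES_PER_SOCKET // nps,
    }

for nps in (0, 1, 2, 4):
    print(nps, numa_layout(nps))
```

With NPS=4 on a dual-socket 7742 server this gives 8 NUMA nodes of 16 cores each, every node interleaving across 2 memory channels.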
- Anandtech review of AMD EPYC 7002 processors
- Tuning Guide for AMD EPYC 7002 Series Processors
- Memory Population Guidelines for AMD EPYC 7002 Series Processors
- Socket SP3 Platform NUMA Topology for AMD Family 17h Models 30h-3Fh