Title: [Hardware] Contemporary CPU Architectures Compared
Author: qcmadness
Date: 2007-6-16 15:13
Introduction
Here are some architectural highlights of current and upcoming Intel / AMD CPUs. The following CPU architectures will be compared:
1. AMD K8 / Hammer (released in 2003) - Hammer
2. Intel Core Architecture (released in 2006) - Core Arch.
3. AMD K8L (?) / K10 (?), the next generation architecture (to be released in 2007) - NGA
4. Intel Core Architecture update, the Penryn / Wolfdale family (to be released in 2007 / 2008) - Penryn
Last updated: 26th January, 2008
Special thanks to Pippero and Clue69Less for corrections.
The architectural highlights:
1. Processor manufacturing technology
Hammer: 130nm / 90nm / 65nm SOI, 9 metal layers
Core Arch.: 65nm, 45nm in 2007 H2, 8 metal layers
NGA: 65nm SOI, 45nm SOI in mid-2008, 11 metal layers
Penryn: 45nm with high-k gate dielectric in 2007 H2, unknown number of metal layers
2. Cache system
Hammer:
L1 cache: 64KB data + 64KB instruction, 2-way, latency: 3 cycles
L2 cache: 512KB, 16-way, 128-bit (32GB/s at 2GHz), latency: 12 cycles (90nm version)
L3 cache: absent
Core Arch.:
L1 cache: 32KB data + 32KB instruction, 8-way, latency: 3 cycles
L2 cache: 2-4MB shared for 2 cores, 16-way, 256-bit (64GB/s at 2GHz), latency: 12-14 cycles
L3 cache: absent
NGA:
L1 cache: 64KB data + 64KB instruction, 2-way, latency: 3 cycles
L2 cache: 512KB, 16-way, 256-bit (64GB/s at 2GHz), latency: unknown
L3 cache: 2MB shared, 32-way, unknown width and latency
Penryn:
L1 cache: 32KB data + 32KB instruction, 8-way, latency: 3 cycles (expected to be the same as Core Arch.)
L2 cache: 3-6MB shared for 2 cores, 24-way (?), 256-bit (96GB/s at 3GHz), latency: unknown
L3 cache: absent
Special feature: "Split Load Cache Enhancement"
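The L2 bandwidth figures quoted in this section follow from bus width times clock speed. Here is a minimal Python sketch of that arithmetic, assuming one full-width transfer per core clock and decimal gigabytes (1GB = 10^9 bytes). The function name is made up for illustration, and real sustained bandwidth will be lower than this peak.

```python
def peak_bandwidth_gbs(bus_width_bits, clock_ghz):
    """Peak bandwidth in GB/s, assuming one full-width transfer per clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_ghz


# L2 interface figures as listed in the post
print(peak_bandwidth_gbs(128, 2.0))  # Hammer, 128-bit at 2GHz -> 32.0
print(peak_bandwidth_gbs(256, 2.0))  # Core Arch. / NGA, 256-bit at 2GHz -> 64.0
print(peak_bandwidth_gbs(256, 3.0))  # Penryn, 256-bit at 3GHz -> 96.0
```

Doubling the bus width at the same clock doubles the peak figure, which is why the 256-bit designs are listed at 64GB/s against Hammer's 32GB/s at 2GHz.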
3. x86 decoding ability
Hammer:
x86 decoders: 3 complex
Out-of-order execution buffer: 72 general instructions, 36 FP instructions and 24 integer instructions
Core Arch.:
x86 decoders: 3 simple + 1 complex (the complex decoder can decode 2 simple instructions in one pass)
Out-of-order execution buffer: 96 instructions
NGA:
x86 decoders: 3 complex
Out-of-order execution buffer: 72 general instructions, 36 FP instructions and 24 integer instructions
Penryn:
x86 decoders: 3 simple + 1 complex (the complex decoder can decode 2 simple instructions in one pass)
Out-of-order execution buffer: 96 instructions
(expected to be the same as Core Arch.)
4. ALU, FPU and SSE units
Hammer:
ALU units: 3
SSE units: 2 units, 64-bit
SSE versions supported: SSE, SSE2 (all Hammer versions), SSE3 (for Rev. E and later)
Core Arch.:
ALU units: 3
SSE units: 3 units, 128-bit
SSE versions supported: SSE, SSE2, SSE3, SSSE3 (Supplemental SSE3, a separate extension that preceded SSE4)
NGA:
ALU units: 3
SSE units: 2 units, 128-bit
SSE versions supported: SSE, SSE2, SSE3, SSE4A (an AMD-specific extension, not a subset of Intel's SSE4)
Penryn:
ALU units: 3
SSE units: 3 units, 128-bit
SSE versions supported: SSE, SSE2, SSE3, SSSE3, SSE4.1
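One rough way to compare the SSE unit counts and widths listed in this section is to multiply them into an upper bound on SIMD bits processed per cycle. The Python sketch below does that, and also counts how many passes a 128-bit operation needs on a unit of a given width. It assumes every unit can be busy every cycle, which real code rarely achieves because the units do not all handle the same operations. The table and function names are illustrative only.

```python
# (number of SSE units, unit width in bits), as listed in the post
SSE_UNITS = {
    "Hammer": (2, 64),
    "Core Arch.": (3, 128),
    "NGA": (2, 128),
    "Penryn": (3, 128),
}


def peak_simd_bits_per_cycle(units, width_bits):
    """Upper bound: every unit processes a full-width operand each cycle."""
    return units * width_bits


def passes_for_128bit_op(width_bits):
    """Passes a 128-bit SSE operation needs on a unit of this width (ceiling division)."""
    return -(-128 // width_bits)


for name, (units, width) in SSE_UNITS.items():
    print(name, peak_simd_bits_per_cycle(units, width), passes_for_128bit_op(width))
```

Under these assumptions Hammer's 64-bit units need two passes per 128-bit operation, while the 128-bit units in the other three designs need one.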
6. Memory controller
Hammer: 1x128-bit memory controller (1 operation per cycle)
Core Arch.: absent
NGA: 2x64-bit memory controller with NUMA (max 2 operations per cycle), can change back to 1x128-bit mode
Penryn: absent
7. Power management
Hammer: Cool'n'Quiet (min. x5 multiplier)
Core Arch.: EIST (min. x6 multiplier), switches off transistors when not in use
NGA: improved Cool'n'Quiet, two separate power planes for crossbar and cores, separate clocks for each core
Penryn: EIST (?), switches off transistors when not in use, C6 state, separate clocks for each core (the core frequency may exceed the rated frequency)
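The minimum multipliers listed for Cool'n'Quiet and EIST turn into a lowest idle clock once a reference clock is assumed. The reference clocks in this Python sketch are assumptions that the post does not state: a nominal 200MHz for Hammer, and nominal 266MHz or 333MHz front-side bus base clocks for Core Arch. parts. The function name is hypothetical.

```python
def min_core_clock_mhz(min_multiplier, reference_clock_mhz):
    """Lowest core clock reachable at the minimum multiplier."""
    return min_multiplier * reference_clock_mhz


# Assumed reference clocks (not from the post), in MHz
print(min_core_clock_mhz(5, 200))  # Hammer at x5 -> 1000
print(min_core_clock_mhz(6, 266))  # Core Arch. at x6, nominal 266MHz base clock
print(min_core_clock_mhz(6, 333))  # Core Arch. at x6, nominal 333MHz base clock
```

With these assumptions the idle floor on a Core Arch. part depends on its bus speed, while Hammer idles at about 1GHz regardless of model.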