
[Hardware] Zen core

Zen core

http://dresdenboy.blogspot.hk
https://patchwork.ozlabs.org/patch/524324/

tl;dr:

general
1. 4-wide decode, 4-way IEU, 2-way AG, 4-way FP
2. 32KB L1 DCache, 512KB L2 Cache
3. Neither the IEU nor the FPU appears to have move elimination. mov is still 1 cycle.
4. No issue queue sizes yet

fpu
1. 16 FLOPs/clk: 2* 128b FADD + 2* 128b FMUL
2. 3-cycle FADD & FMUL (SSE/AVX, not including x87)
3. 5-cycle FMA occupies two ports (FP0&3 or FP1&3).
4. Only one 128b FMAC per clock.
5. There is one more IADD unit, and it is faster too. It is now 1 cycle.
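
A quick sanity check of the 16 FLOPs/clk figure, as a minimal sketch. It assumes the number refers to single-precision (32-bit) lanes and that each pipe retires one operation per lane per clock. The function name and the "an FMA counts as 2 FLOPs" convention are illustrative choices, not something taken from the patch.

```python
def peak_flops_per_clock(pipes, vector_bits=128, element_bits=32, flops_per_lane=1):
    """Peak FLOPs per clock for `pipes` identical vector pipes.

    Assumption: every pipe starts one full-width vector op each clock.
    """
    lanes = vector_bits // element_bits  # e.g. 128b / 32b = 4 lanes
    return pipes * lanes * flops_per_lane


# 2x 128b FADD + 2x 128b FMUL, single precision (assumed)
sp_add_mul = peak_flops_per_clock(pipes=2 + 2)

# Same pipes with double-precision (64-bit) elements
dp_add_mul = peak_flops_per_clock(pipes=2 + 2, element_bits=64)

# One 128b FMAC per clock, counting a fused multiply-add as 2 FLOPs (convention)
sp_fmac = peak_flops_per_clock(pipes=1, flops_per_lane=2)

print(sp_add_mul, dp_add_mul, sp_fmac)  # 16 8 8
```

Under these assumptions, separate FADD and FMUL streams reach the 16 FLOPs/clk figure, while a pure FMAC stream limited to one 128b op per clock reaches only half of that.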

ieu
1. Four ALUs
2. All four ports have 1-cycle LEA (2&3-operand?...), Compare & Shift
3. Unknown which port(s) the branch unit(s?) sit on

Looks decent. Good luck to them.


[ Last edited by Puff on 2015-10-5 09:39 ]


QC come look


Cache and latency?


Quote:
Originally posted by qcmadness on 2015-10-3 17:12
Cache and latency?
4-cycle L1 load-use. SSE/AVX seems to take no extra cycle. No evidence on L2 latency.

[ Last edited by Puff on 2015-10-3 18:07 ]


Quote:
Originally posted by Puff on 2015-10-3 18:04

4-cycle L1 load-use. SSE/AVX seems to take no extra cycle. No evidence on L2 latency.
L2 and L3 cache latencies are crucial.


Per the patch, the Zen core can issue only one 128-bit FMAC per clock, since FMAC is modelled as `(fp0+fp3)|(fp1+fp3)`. It could be a typo though, since the latency of 256-bit FMAC is not modelled as +1 over its 128b variant.
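
To show why that reservation string implies one FMAC per clock, here is a minimal sketch. It assumes the usual GCC pipeline-description reading, where `a+b` reserves both units at once and `|` separates alternatives, and it looks at a single cycle only. The helper name `max_issue_per_cycle` and the second, "typo-free" reservation are hypothetical and only there for comparison.

```python
from itertools import product


def max_issue_per_cycle(alternatives, max_copies=4):
    """Most copies of one instruction that can start in the same cycle.

    alternatives: list of port sets; each copy picks one alternative and
    reserves every port in it. Ports cannot be shared within a cycle.
    """
    best = 0
    for n in range(1, max_copies + 1):
        for choice in product(alternatives, repeat=n):
            used = [port for alt in choice for port in alt]
            if len(used) == len(set(used)):  # no port reserved twice
                best = n
                break
        else:
            break  # n copies cannot fit, so n + 1 cannot fit either
    return best


# As modelled in the patch: (fp0+fp3)|(fp1+fp3)
fmac_as_modelled = [{"fp0", "fp3"}, {"fp1", "fp3"}]

# Hypothetical alternative, NOT from the patch: (fp0+fp2)|(fp1+fp3)
fmac_hypothetical = [{"fp0", "fp2"}, {"fp1", "fp3"}]

print(max_issue_per_cycle(fmac_as_modelled))   # 1
print(max_issue_per_cycle(fmac_hypothetical))  # 2
```

Both alternatives in the patch need fp3, so two FMACs can never start together. If the second port in one alternative were different, two per clock would fit, which is why a typo cannot be ruled out.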

[ Last edited by Puff on 2015-10-5 09:51 ]


Quote:
Originally posted by Puff on 2015-10-5 09:38
Per the patch, the Zen core can issue only one 128-bit FMAC per clock, since FMAC was modelled as `(fp0+fp3)|(fp1+fp3)`. It could be a typo though, since the latency of 256-bit FMAC is not modelled a ...
256-bit FMAC is not so relevant.

INT performance and cache penalties are more relevant.


At a glance, the AGUs don't look like enough.
