
Title: [Hardware] Zen core

Author: Puff    Time: 2015-10-3 15:30     Title: Zen core

http://dresdenboy.blogspot.hk
https://patchwork.ozlabs.org/patch/524324/

tl;dr:

general
1. 4-wide decode, 4-way IEU, 2-way AG, 4-way FP
2. 32KB L1 DCache, 512KB L2 Cache
3. Neither the IEU nor the FPU appears to have move elimination. mov is still 1 cycle.
4. Issue queue sizes not known yet

fpu
1. 16 FLOPs/clk: 2* 128b FADD + 2* 128b FMUL
2. 3-cycle FADD & FMUL (SSE/AVX, not including x87)
3. 5-cycle FMA takes up two ports (FP0&3 or FP1&3).
4. Only one 128b FMAC per clock.
5. There is an extra IADD unit and it got faster too. It is now 1 cycle.
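
A rough sketch of where the 16 FLOPs/clk figure comes from. It assumes 32-bit (single-precision) lanes, which the list does not state but which is the only lane width that makes 2 + 2 128b pipes add up to 16. The helper name `flops_per_clock` is made up for illustration, and counting an FMA as 2 FLOPs per lane is just the usual convention.

```python
def flops_per_clock(pipes, vector_bits=128, element_bits=32):
    """Peak FLOPs per clock for a list of (pipe_count, flops_per_lane_per_op).

    Assumes every pipe can start one full-width vector op per cycle.
    """
    lanes = vector_bits // element_bits
    return sum(count * lanes * flops for count, flops in pipes)


# 2x 128b FADD + 2x 128b FMUL, 1 FLOP per lane each (assumed single precision).
add_mul_peak = flops_per_clock([(2, 1), (2, 1)])

# One 128b FMAC per clock, counting a fused multiply-add as 2 FLOPs per lane.
fma_peak = flops_per_clock([(1, 2)])

# Same FADD + FMUL mix with 64-bit (double-precision) lanes.
add_mul_peak_dp = flops_per_clock([(2, 1), (2, 1)], element_bits=64)

print(add_mul_peak, fma_peak, add_mul_peak_dp)
```

Under these assumptions, a mix of separate FADDs and FMULs reaches a higher peak than FMA-only code, because only one FMAC can issue per clock.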

ieu
1. Four ALUs
2. All four ports have 1-cycle LEA (2- & 3-operand?...), compare & shift
3. Not known yet which port(s) the branch unit(s?) sit on

Looks pretty decent. Wishing them luck.


[ Last edited by Puff on 2015-10-5 09:39 ]
Author: dom    Time: 2015-10-3 17:11

QC, come take a look
Author: qcmadness    Time: 2015-10-3 17:12

Cache and latency?
Author: Puff    Time: 2015-10-3 18:04

Quote:
Originally posted by qcmadness on 2015-10-3 17:12
Cache and latency?
4-cycle L1 load-to-use. SSE/AVX loads seem to take no extra cycle. No evidence on L2 latency.

[ Last edited by Puff on 2015-10-3 18:07 ]
Author: qcmadness    Time: 2015-10-3 18:38

Quote:
Originally posted by Puff on 2015-10-3 18:04

4-cycle L1 load-to-use. SSE/AVX loads seem to take no extra cycle. No evidence on L2 latency.
L2 and L3 cache latencies are crucial.
Author: Puff    Time: 2015-10-5 09:38

Per the patch, the Zen core can issue only one 128-bit FMAC per clock, since FMAC is modelled as `(fp0+fp3)|(fp1+fp3)` and both alternatives reserve fp3. It could be a typo though, since the latency of the 256-bit FMAC is not modelled as +1 over its 128-bit variant.
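
To make the port argument concrete, here is a toy sketch. It is not taken from the patch. It reads the reservation string the way GCC pipeline descriptions do, with `+` meaning units reserved together in the same cycle and `|` meaning alternatives, then greedily counts how many identical ops fit in one cycle. The function name and the `fp0|fp1` contrast case are hypothetical, and greedy packing is only good enough for small cases like these.

```python
def max_issue_per_cycle(alternatives, units):
    """Greedily count how many identical ops can issue in a single cycle.

    alternatives: list of sets; each set is the units one op reserves
                  together ('+'), and the list entries are the '|' choices.
    units:        all units available in the cycle.
    """
    free = set(units)
    issued = 0
    while True:
        # Take the first alternative whose units are all still free.
        choice = next((alt for alt in alternatives if alt <= free), None)
        if choice is None:
            return issued
        free -= choice
        issued += 1


FP_UNITS = {"fp0", "fp1", "fp2", "fp3"}

# (fp0+fp3)|(fp1+fp3): both alternatives need fp3.
fmac = [{"fp0", "fp3"}, {"fp1", "fp3"}]
fmac_per_clock = max_issue_per_cycle(fmac, FP_UNITS)

# Hypothetical contrast: an op modelled as fp0|fp1 (not from the patch).
two_port_op = [{"fp0"}, {"fp1"}]
two_port_per_clock = max_issue_per_cycle(two_port_op, FP_UNITS)

print(fmac_per_clock, two_port_per_clock)
```

The first FMAC takes fp3, so a second one has no alternative left in that cycle, which is the one-per-clock limit described above.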

[ Last edited by Puff on 2015-10-5 09:51 ]
Author: qcmadness    Time: 2015-10-5 19:54

Quote:
Originally posted by Puff on 2015-10-5 09:38
Per the patch, the Zen core can issue only one 128-bit FMAC per clock, since FMAC was modelled as `(fp0+fp3)|(fp1+fp3)`. It could be a typo though, since the latency of 256-bit FMAC is not modelled a ...
256-bit FMAC is not so relevant.

INT performance and cache penalties are more relevant.
Author: qcmadness    Time: 2015-10-6 20:10

At a glance, it doesn't look like there are quite enough AGUs.



