http://dresdenboy.blogspot.hk
https://patchwork.ozlabs.org/patch/524324/
tl;dr:
general
1. 4-wide decode, 4-way IEU, 2-way AG, 4-way FP
2. 32KB L1 DCache, 512KB L2 Cache
3. Neither the IEU nor the FPU seems to have move elimination. mov is still 1 cycle.
4. No issue queue sizes yet.
fpu
1. 16 FLOPs/clk: 2* 128b FADD + 2* 128b FMUL
2. 3-cycle FADD & FMUL (SSE/AVX, not counting x87)
3. 5-cycle FMA occupies two ports (FP0&3 or FP1&3).
4. One 128b FMAC per clock only.
5. One more IADD unit, and it's faster too. Now 1 cycle.
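
Quick sanity check on the 16 FLOPs/clk number, as a back-of-envelope sketch only. It assumes single precision (32-bit lanes), which the patch notes do not state outright, and counts one FMA as 2 flops per lane. The function names are made up for this post.

```python
# Back-of-envelope peak FLOPs/clk from the pipe counts in the fpu list.
# Assumption: single precision (32-bit lanes). Names here are hypothetical.

def lanes(width_bits, elem_bits=32):
    """Number of elements that fit in one vector of the given width."""
    return width_bits // elem_bits

def peak_flops_per_clk(pipes, width_bits, flops_per_lane=1, elem_bits=32):
    """Peak flops per clock for `pipes` identical pipes."""
    return pipes * lanes(width_bits, elem_bits) * flops_per_lane

fadd = peak_flops_per_clk(pipes=2, width_bits=128)   # 2 x 128b FADD
fmul = peak_flops_per_clk(pipes=2, width_bits=128)   # 2 x 128b FMUL
separate = fadd + fmul                               # add + mul issued together

# One 128b FMAC per clock, each lane doing a multiply and an add.
fma = peak_flops_per_clk(pipes=1, width_bits=128, flops_per_lane=2)

print(separate)  # 16
print(fma)       # 8
```

So under these assumptions the 16 figure comes from mixing separate FADD and FMUL, while a pure FMA stream would top out at 8 per clock.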
ieu
1. Four ALUs
2. All four ports have 1-cycle LEA (2- & 3-operand?...), Compare & Shift
3. Still unknown which port(s) the branch unit(s?) sit on
Looks decent. Good luck to them.
[ Last edited by Puff on 2015-10-5 09:39 ]