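Originally posted by cheungmanhoi at 2015-4-23 08:45
http://www.tweaktown.com/news/44 ... rebrands/index.html
If it really is mostly renames,
what on earth have they been doing from May 2012 until today?
After all these years they are only releasing the one top-end card?
Did they fire the whole engineering team? ...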
Originally posted by dom at 2015-4-29 15:47
AMD ZEN News
http://www.techpowerup.com/21213 ... o-be-quad-core.html
Highlights
- Back to 8MB L3?? (the last generation to have it was the "Star"-based Phenom II)
- More front-end resources?
- True-core ...
Originally posted by Puff at 2015-4-29 23:14
4+1? uop fusion? Bulldozer has that. 32B/clk fetch as well.
Whether there is a uop cache I don't know, but SR (Steamroller) has a uop loop buffer.
Originally posted by Puff at 2015-4-29 23:24
My mistake: AMD's branch fusion is more basic than Intel's.
But branch fusion still takes 5 instructions and decodes them into 4 complex ops.
However AMD decided to introduce this kind of fusion in Bulldozer later in the decoding pipeline than Intel, where x86 branch fusion is already present in the predecoding phases. The result is that the decoding bandwidth of all Intel CPUs since Nehalem has been up to five (!) x86-64 instructions, while x86 branch fusion does not increase the maximum decode rate of a Bulldozer module.
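To make the point of that quote concrete, here is a toy model and nothing more. It assumes a 4-slot decoder and at most one fused compare-and-branch pair per cycle; both numbers, the function name and the instruction mix are my own simplifications for illustration, not a description of either vendor's real front end. It only shows why fusing before the decoders lifts the peak from 4 to 5 x86 instructions per cycle, while fusing after them leaves the peak at 4.

```python
# Toy model only: slot count, fusion limit and mnemonics are assumptions.
CMP_LIKE = {"cmp", "test"}
JCC = {"je", "jne", "jl", "jge", "jb", "jae"}


def cycles_to_decode(instrs, width=4, fuse_before_decode=True):
    """Count cycles a toy decoder needs to consume a list of mnemonics.

    width: decoder slots available per cycle.
    fuse_before_decode: if True, a compare followed by a conditional
        jump shares one slot (at most one such pair per cycle).
        If False, every x86 instruction occupies its own slot, which
        models fusion that happens only after the decoders.
    """
    i = 0
    cycles = 0
    while i < len(instrs):
        slots = width
        fused_this_cycle = False
        while slots > 0 and i < len(instrs):
            is_pair = (
                fuse_before_decode
                and not fused_this_cycle
                and i + 1 < len(instrs)
                and instrs[i] in CMP_LIKE
                and instrs[i + 1] in JCC
            )
            if is_pair:
                i += 2  # two x86 instructions, one decoder slot
                fused_this_cycle = True
            else:
                i += 1
            slots -= 1
        cycles += 1
    return cycles


# Hypothetical stream: 20 instructions, one compare-and-branch in every 5.
stream = ["add", "mov", "sub", "cmp", "jne"] * 4
early = cycles_to_decode(stream, fuse_before_decode=True)
late = cycles_to_decode(stream, fuse_before_decode=False)
print(f"fusion before decode: {early} cycles, {len(stream) / early:.1f} instr/clk")
print(f"fusion after decode:  {late} cycles, {len(stream) / late:.1f} instr/clk")
```

With this particular stream the early-fusion case averages 5 instructions per cycle and the late-fusion case 4. Real code has far fewer fusible pairs, so the average gain would be smaller than this best case.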
Originally posted by qcmadness at 2015-4-29 23:26
wrong
http://www.anandtech.com/show/50 ... lving-even-deeper/2
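Originally posted by dom at 2015-4-30 02:41
I wonder whether AMD's L3 and inter-core bandwidth have improved at all... Phenom II was already lacking here, and Faildozer was even worse.
Add the L3 back but make it a slow one, and it will just drag everything down again.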
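Originally posted by Puff at 2015-4-30 02:41
Put it next to Apple's Cyclone and you could call it a carbon copy too.
Sandy Bridge and Ivy Bridge could also be called somewhat similar, the "only" difference being that the FPU has its own issue queue.
From a PR diagram this high-level I can't make out anything.
Not even load/store queue size, instruction window ...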
Originally posted by Puff at 2015-4-29 21:31
A new round:
2016
Summit Ridge 14nm, 8C Zen CPU, FM3
Bristol Ridge 14nm, 4C Zen APU
Basilisk 14nm, 2C Zen APU
Styx 14nm, 2C K12 APU
"placement of boxes intended to represent first year of production shi ...
Originally posted by qcmadness at 2015-4-30 12:41
http://www.anandtech.com/bench/product/287?vs=836
3.4 / 3.8GHz vs 3.5 / 3.9GHz
General IPC differs by 10-20%
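For anyone who wants to redo that kind of estimate, a rough sketch follows. The clocks are the ones quoted above. The scores 100 and 118 are made-up placeholders, not numbers from the linked bench page, and the function names are hypothetical. It also assumes each chip actually ran at the stated clock and that the benchmark scales linearly with frequency, which real workloads only approximate.

```python
def clock_gap(clock_a_ghz, clock_b_ghz):
    """Relative clock advantage of chip B over chip A, as a fraction."""
    return clock_b_ghz / clock_a_ghz - 1.0


def per_clock_gap(score_a, clock_a_ghz, score_b, clock_b_ghz):
    """Relative per-clock (IPC-like) advantage of B over A, as a fraction.

    Assumes score scales linearly with clock speed.
    """
    return (score_b / clock_b_ghz) / (score_a / clock_a_ghz) - 1.0


# Clocks from the post: 3.4 / 3.8 GHz versus 3.5 / 3.9 GHz.
print(f"base clock gap:  {clock_gap(3.4, 3.5):.1%}")
print(f"turbo clock gap: {clock_gap(3.8, 3.9):.1%}")

# Placeholder scores, purely illustrative.
gap = per_clock_gap(score_a=100.0, clock_a_ghz=3.4, score_b=118.0, clock_b_ghz=3.5)
print(f"per-clock gap for the placeholder scores: {gap:.1%}")
```

The clock difference between the two parts is under 3% at both base and turbo, so almost all of a benchmark gap in the 10-20% range has to come from per-clock throughput rather than frequency.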
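Originally posted by dom at 2015-4-30 20:24
From Phenom II to the revised Faildozer (Piledriver/XV)
it went backwards... efficiency is low and latency went up sharply. Intel's current cores also come with an iGPU, and that iGPU shares the L3...
If they want to push iGPU performance and don't beef this part up, even DDR4 still won't feed it fast enough ...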
Originally posted by qcmadness at 2015-4-30 19:39
Unless Zen's 6 integer pipelines are full pipelines (execution + load / store), Haswell definitely has more resources.
But if they really are 6 full in ...
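Originally posted by Puff at 2015-5-1 00:13
At most there is one more port for the ALUs (3+3), and traditionally the store data bus has no separate issue port...
But what the stack on each port looks like is a real question mark.
Spelling out something like 256-bit FMAC x2 is what would count as detail.
With 6 pipelines, as you say, there is a lot of room for interpretation:
3+3 / 4+2, split/ ...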
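Originally posted by dom at 2015-5-1 01:13
AMD's inter-core bandwidth and efficiency still can't get close to Intel's even today (and not the latest generation; I'm comparing against Sandy Bridge).
Latency is high too (an inherent flaw of the Faildozer architecture in particular).
The biggest problem is that the APU (iGPU) AMD itself is pushing is exactly what needs bandwi ...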
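Originally posted by Puff at 2015-5-1 15:48
The FPU is already not very symmetric.
No idea, but 3 AGUs is a safe bet; at minimum that matches 2 loads + 1 store per cycle (which it surely won't lack).
Maybe Zen is 3/3 and K12 is 4/4.
After all, K12 has no 256-bit SIMD and is the "wider engine".
...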
Originally posted by Puff at 2015-5-1 16:39
16B/clk per L2 bank isn't exactly small, is it? And it's not as if it can't be upgraded to a full cache line per clock per bank.
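Some quick arithmetic on what those two figures mean, as a sketch only. The 3.5 GHz clock is a hypothetical number picked for illustration, not a known spec of any part discussed here; the 64-byte line is the usual x86 cache line size. Function names are my own.

```python
CACHE_LINE_BYTES = 64  # usual x86 cache line size


def bank_bandwidth_gbs(bytes_per_clk, clock_ghz):
    """Peak bandwidth of one bank in GB/s (bytes per clock times GHz)."""
    return bytes_per_clk * clock_ghz


def clocks_per_line(bytes_per_clk, line_bytes=CACHE_LINE_BYTES):
    """Clocks needed to move one full cache line, rounded up."""
    return -(-line_bytes // bytes_per_clk)  # ceiling division


CLOCK_GHZ = 3.5  # hypothetical clock, for illustration only

for width in (16, CACHE_LINE_BYTES):
    bw = bank_bandwidth_gbs(width, CLOCK_GHZ)
    clks = clocks_per_line(width)
    print(f"{width:2d} B/clk per bank: {bw:.0f} GB/s peak, {clks} clk per line")
```

At that assumed clock, 16B/clk works out to 56 GB/s per bank with 4 clocks per line, and a full line per clock would be 224 GB/s per bank. These are peak figures; sustained bandwidth depends on bank conflicts and queueing that this does not model.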
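Originally posted by Puff at 2015-5-1 16:27
Which chip has 4 L/S? Even BDW (Broadwell) is only 2 loads + 1 store.
Only Power8 does up to 4 loads / clk.
I almost missed this:
the diagram says "Integer Scheduler", non-plural form.
That means the AG scheduler is not split off.
With a unified scheduler there is a lot more to play with ...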
Originally posted by Puff at 2015-5-5 11:18
Alert: the recent slides might be fake... per AMD.
The truth comes out at midnight on Thursday.
Welcome to HKSpot (https://bbs.hk-spot.com/) | Powered by Discuz! 6.0 Lite |