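Originally posted by dom on 2015-4-23 13:10: Even the GPU side has ended up like this. Looks like Rory Read went way overboard with the layoffs... AMD cards clearly have advantages and are competitive........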
Originally posted by Puff on 2015-4-29 23:14: 4+1? uop fusion? Bulldozer has that. 32B/clk fetch as well. I don't know whether there is a uop cache, but SR (Steamroller) has a uop loop buffer.
Originally posted by Puff on 2015-4-29 23:19: It has branch fusion.
Originally posted by Puff on 2015-4-29 23:24: I was wrong. AMD's branch fusion is more bare-bones than Intel's, but branch fusion still takes 5 instructions and decodes them into 4 complex ops.
However AMD decided to introduce this kind of fusion in Bulldozer later in the decoding pipeline than Intel, where x86 branch fusion is already present in the predecoding phases. The result is that the decoding bandwidth of all Intel CPUs since Nehalem has been up to five (!) x86-64 instructions, while x86 branch fusion does not increase the maximum decode rate of a Bulldozer module.
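To make the quoted point concrete, here is a rough toy model of why the stage at which fusion happens matters for decode throughput. It is only a sketch under assumptions that are not in the thread: a 4-slot decoder, at most one fusion per cycle, made-up mnemonics, and no modelling of fetch bandwidth or instruction length. The function names are hypothetical.

```python
from math import ceil

# Assumption: only cmp/test followed by a conditional jump can fuse.
FUSIBLE_FIRST = {"cmp", "test"}


def is_fusible(first, second):
    """Return True if the pair looks like a compare + conditional branch."""
    return first in FUSIBLE_FIRST and second.startswith("j")


def cycles_predecode_fusion(stream, width=4, max_fusions_per_cycle=1):
    """Fusion BEFORE the decoders: a fused pair occupies one decode slot."""
    i = 0
    cycles = 0
    while i < len(stream):
        slots = 0
        fusions = 0
        while slots < width and i < len(stream):
            can_fuse = (
                fusions < max_fusions_per_cycle
                and i + 1 < len(stream)
                and is_fusible(stream[i], stream[i + 1])
            )
            if can_fuse:
                i += 2  # two x86 instructions consumed by one slot
                fusions += 1
            else:
                i += 1
            slots += 1
        cycles += 1
    return cycles


def cycles_late_fusion(stream, width=4):
    """Fusion AFTER the decoders: every x86 instruction still needs a slot."""
    return ceil(len(stream) / width)


# 20 instructions: three ALU ops plus a compare-and-branch, repeated 4 times.
stream = ["add", "add", "add", "cmp", "jne"] * 4
early = cycles_predecode_fusion(stream)  # 5 instructions per cycle
late = cycles_late_fusion(stream)        # 4 instructions per cycle
```

In this model the early-fusion front end gets through five x86 instructions per cycle on this stream, while the late-fusion one stays at four even though both end up issuing four ops per group of five instructions.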
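Originally posted by Puff on 2015-4-30 02:41: Put it next to Apple's Cyclone and it's a spitting image too. Sandy Bridge and Ivy Bridge could also be called a bit alike; the "only" difference is that the FPU has its own issue queue. From just this one high-level PR diagram I can't make out much. Not even load/store queue size, instruction window ...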
Originally posted by cheungmanhoi on 2015-4-30 12:34: Did IPC actually go up that much from SNB to Broadwell?
Originally posted by Puff on 2015-4-30 13:41: What details are there to speak of, apart from the cache hierarchy?
Originally posted by XT on 2015-4-30 18:55: It's an outright regression.
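Originally posted by cheungmanhoi on 2015-5-1 02:19: However slow an L3 is, it will still be faster than RAM. If your L3 is even slower than RAM, you really deserve to go out of business.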
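Originally posted by Puff on 2015-5-1 00:13: At most there is one more port for the ALUs (3+3), and, as is traditional, the store data bus has no separate issue port... but what the stack behind each port looks like really is a question mark. If it spelled out something like 256-bit FMAC x2, that would count as detail. With 6 pipelines, as you say, there is a lot of room: 3+3 / 4+2, split/ ...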
Originally posted by Puff on 2015-5-1 15:59: Jaguar's half-speed, quad-banked L2 is "also" 25 clk.
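Originally posted by Puff on 2015-5-1 15:48: The FPU is already not all that symmetric, so who knows, but 3 AGUs is a safe bet; at the very least that matches (it surely won't be missing) 2 loads + 1 store per cycle. Maybe Zen is 3/3 and K12 is 4/4. K12 has no 256-bit SIMD and is a "wider engine" too, after all ...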