
[Industry News] AMD might as well pack it in

qcmadness

Administrator

Rank: 10

吹水部屋

#16, posted 2015-4-29 22:41

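One look at this diagram and you can tell AMD isn't ambitious enough.

By the look of it, it fetches 3-4 instructions per clock at most.
The competition is already at 4+1. Do you really think the execution power here is that much stronger?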

http://bbs.hk-spot.com


Puff

水王 (forum title, roughly "top chatter")

Rank: 4

#17, posted 2015-4-29 23:14

4+1? You mean uop fusion? Bulldozer has that, and a 32B/clk fetch as well.
No idea whether there's a uop cache, but SR (Steamroller) has a uop loop buffer.

[ Last edited by Puff on 2015-4-29 23:16 ]

qcmadness

#18, posted 2015-4-29 23:17

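Quote:

Originally posted by Puff on 2015-4-29 23:14
4+1? You mean uop fusion? Bulldozer has that, and a 32B/clk fetch as well.
No idea whether there's a uop cache, but SR (Steamroller) has a uop loop buffer.

It doesn't have the +1.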

Puff

#19, posted 2015-4-29 23:19

Quote:

Originally posted by qcmadness on 2015-4-29 23:17

It doesn't have the +1.

It has branch fusion.

qcmadness

#20, posted 2015-4-29 23:22

Quote:

Originally posted by Puff on 2015-4-29 23:19

It has branch fusion.

Branch fusion doesn't let you handle one more instruction; at most it means slightly fewer branch misses.

Puff

#21, posted 2015-4-29 23:24

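Quote:

Originally posted by qcmadness on 2015-4-29 23:22

Branch fusion doesn't let you handle one more instruction; at most it means slightly fewer branch misses.

My mistake. AMD's branch fusion is more basic than Intel's.
But branch fusion still takes 5 instructions and decodes them into 4 complex ops,
so effectively it still counts as 5 instructions.

On the Intel side, apart from the uop cache at 6(?) uops/clk, I really don't know where the 4+1 comes from.

[ Last edited by Puff on 2015-4-29 23:26 ]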

qcmadness

#22, posted 2015-4-29 23:26

Quote:

Originally posted by Puff on 2015-4-29 23:24

My mistake. AMD's branch fusion is more basic than Intel's.
But branch fusion still takes 5 instructions and decodes them into 4 complex ops,

Wrong.

http://www.anandtech.com/show/50 ... lving-even-deeper/2

Quote:

However AMD decided to introduce this kind of fusion in Bulldozer later in the decoding pipeline than Intel, where x86 branch fusion is already present in the predecoding phases. The result is that the decoding bandwidth of all Intel CPUs since Nehalem has been up to five (!) x86-64 instructions, while x86 branch fusion does not increase the maximum decode rate of a Bulldozer module.
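
To make the quoted point concrete, here is a toy decode model in Python. It is only a sketch under stated assumptions, not a description of either real decoder: the function names are hypothetical, the decoder is assumed to be 4 slots wide, a fusable pair is assumed to be a `cmp`/`test` immediately followed by a conditional jump, and at most one fused pair per cycle is assumed. When fusion happens before decode, the pair shares one slot, so up to 5 instructions go through per clock. When fusion happens later, each instruction still takes its own decode slot.

```python
# Toy model only: widths, fusion rules and names are illustrative assumptions.
CMP_LIKE = {"cmp", "test"}


def is_cond_jump(mnemonic):
    """Treat any j* mnemonic except the unconditional jmp as a conditional jump."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def decode_cycles(instrs, width=4, fuse_before_decode=True, max_fused_per_cycle=1):
    """Count the cycles needed to push `instrs` through a `width`-slot decoder.

    fuse_before_decode=True : a cmp/test + conditional jump pair shares one slot.
    fuse_before_decode=False: fusion happens after decode, so it saves no slots.
    """
    cycles = 0
    i = 0
    n = len(instrs)
    while i < n:
        slots = 0
        fused = 0
        while i < n and slots < width:
            can_fuse = (
                fuse_before_decode
                and fused < max_fused_per_cycle
                and i + 1 < n
                and instrs[i] in CMP_LIKE
                and is_cond_jump(instrs[i + 1])
            )
            if can_fuse:
                i += 2  # two x86 instructions consumed by a single slot
                fused += 1
            else:
                i += 1
            slots += 1
        cycles += 1
    return cycles


# 20 instructions: four iterations of a small loop body ending in cmp + jne.
stream = ["add", "add", "add", "cmp", "jne"] * 4
early = decode_cycles(stream, fuse_before_decode=True)
late = decode_cycles(stream, fuse_before_decode=False)
print(len(stream) / early, len(stream) / late)  # instructions decoded per clock
```

On this stream the early-fusion case decodes 5 instructions per clock and the late-fusion case 4, which is the "4+1" being argued about. Real front ends have many more constraints (fetch alignment, instruction length, which pairs are fusable), so treat the numbers as an illustration of the mechanism only.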

Puff

#23, posted 2015-4-29 23:28

Quote:

Originally posted by qcmadness on 2015-4-29 23:26

Wrong.

http://www.anandtech.com/show/50 ... lving-even-deeper/2

Really? Interesting. Let me check the SOG (Software Optimization Guide) first.
P.S. SR has a 40-entry post-decode uop buffer.

qcmadness

#24, posted 2015-4-30 01:41

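Doesn't it look similar?

I can't see any way for AMD to get out of having lower IPC than Intel.
Take K10: even if I grant it another 50% IPC, that only just catches up with Haswell.
With the performance of AMD's current integer pipeline, that is very hard to do.
FP actually has some chance (two full 256-bit FMA FPUs).

http://www.anandtech.com/bench/product/435?vs=1368

The fully inclusive cache is the more interesting part, though.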

Puff

#25, posted 2015-4-30 02:41

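Quote:

Originally posted by qcmadness on 2015-4-30 01:41
Doesn't it look similar?

I can't see any way for AMD to get out of having lower IPC than Intel.
Take K10: even if I grant it another 50% IPC, that only just catches up with ...

Put it next to Apple's Cyclone and it's a carbon copy too.
You could say Sandy Bridge and Ivy Bridge look a bit similar as well; the "only" difference is that the FPU has its own issue queue.

I can't read anything out of a PR diagram this high-level.
It doesn't even give the load/store queue size or the instruction window size.
The most I can offer are the usual safe guesses: it won't be worse than Phenom (肥龍), and it won't fail to beat even the four-year-old SNB.

[ Last edited by Puff on 2015-4-30 02:53 ]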


dom

吹水部屋 OC Team

Rank: 7

#26, posted 2015-4-30 02:41

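I wonder whether AMD's L3 and inter-core bandwidth have improved... Phenom II was already lacking there, and Faildozer was even worse.

Adding the L3 back won't help if it's a slow one; it will just drag everything down again.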

Natural-airhead girls with long hair and glasses are the best
Lucky Star alliance - Miyuki
Kancolle - Ooyodo, Shokaku (my wife), Chokai, Zuikaku

Puff

#27, posted 2015-4-30 02:48

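Quote:

Originally posted by dom on 2015-4-30 02:41
I wonder whether AMD's L3 and inter-core bandwidth have improved... Phenom II was already lacking there, and Faildozer was even worse.
Adding the L3 back won't help if it's a slow one; it will just drag everything down again.

An inclusive L3 should be fine; it is a snoop filter by nature.
And it's not as if AMD has no experience with a fast shared banked cache (25 clk, half-speed data array).
It's just that it was on the low-power Jaguar.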
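
For anyone wondering why an inclusive L3 doubles as a snoop filter, here is a minimal Python sketch. Everything in it is an assumption for illustration: the class and method names are made up, and the per-line set of holder cores is just one possible way to track sharers, not something the PR slide states. The key property is that every line in a private cache must also be in the L3, so an L3 miss proves that no other core holds the line and no snoop is needed.

```python
# Toy model only: names and the per-line holder set are illustrative assumptions.
class InclusiveL3:
    def __init__(self):
        # address -> set of core ids that may hold the line in a private cache
        self.lines = {}

    def fill(self, core, addr):
        """A core brings a line in; inclusion means the L3 records it too."""
        self.lines.setdefault(addr, set()).add(core)

    def evict(self, addr):
        """Evicting from the L3 must back-invalidate every private copy.

        Returns the cores that have to drop the line to preserve inclusion.
        """
        return self.lines.pop(addr, set())

    def cores_to_snoop(self, requester, addr):
        """Return the cores that must be snooped for this request."""
        holders = self.lines.get(addr)
        if holders is None:
            return set()  # L3 miss: no private cache can hold the line
        return holders - {requester}


l3 = InclusiveL3()
l3.fill(0, 0x40)
l3.fill(1, 0x40)
print(l3.cores_to_snoop(2, 0x40))  # only the recorded holders get snooped
print(l3.cores_to_snoop(2, 0x80))  # empty set: the miss filters the snoop
```

Without inclusion, a miss in the shared cache says nothing about the private caches, so the request would have to probe every other core. The cost of inclusion is the back-invalidation on eviction shown in `evict`.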

Puff

#28, posted 2015-4-30 05:26

Fiji, The World’s First Graphics Processor With 2.5D High Bandwidth Memory
HotChips 27 Conf. Day 2

qcmadness

#29, posted 2015-4-30 12:21

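Quote:

Originally posted by Puff on 2015-4-30 02:41

Put it next to Apple's Cyclone and it's a carbon copy too.
You could say Sandy Bridge and Ivy Bridge look a bit similar as well; the "only" difference is that the FPU has its own issue queue.

I can't read anything out of a PR diagram this high-level.
It doesn't even give the load/store queue size or the instruction window ...

You obviously didn't look at the details.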
