[Industry News] AMD might as well shut down

qcmadness

Administrator

Rank: 10

Chit-chat Room

#1 Posted on 2015-4-23 22:20

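Quote:

Originally posted by dom on 2015-4-23 13:10
If even the GPU side has ended up like this, it looks like Rory Read went way overboard with the layoffs...
AMD cards clearly had advantages and were competitive...

I told you not to buy AMD stock.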

http://bbs.hk-spot.com

qcmadness

#2 Posted on 2015-4-29 22:41

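One look at this diagram and you can tell AMD isn't ambitious enough.

From the look of it, it fetches 3-4 instructions per clock at most.
The other side is talking about 4+1, and you think this has much more execution power?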

qcmadness

#3 Posted on 2015-4-29 23:17

Quote:

Originally posted by Puff on 2015-4-29 23:14
4+1? uop fusion? Bulldozer has that, plus 32B/clk fetch.
No idea whether there is a uop cache, but SR (Steamroller) has a uop loop buffer.

There's no +1.

qcmadness

#4 Posted on 2015-4-29 23:22

Quote:

Originally posted by Puff on 2015-4-29 23:19

It has branch fusion.

Branch fusion doesn't let you process one more instruction; at most it means slightly fewer branch misses.

qcmadness

#5 Posted on 2015-4-29 23:26

Quote:

Originally posted by Puff on 2015-4-29 23:24

My bad, AMD's branch fusion is more bare-bones than Intel's.
But branch fusion still takes 5 instructions and decodes them into 4 complex ops.

Wrong.

http://www.anandtech.com/show/50 ... lving-even-deeper/2

Quote:

However AMD decided to introduce this kind of fusion in Bulldozer later in the decoding pipeline than Intel, where x86 branch fusion is already present in the predecoding phases. The result is that the decoding bandwidth of all Intel CPUs since Nehalem has been up to five (!) x86-64 instructions, while x86 branch fusion does not increase the maximum decode rate of a Bulldozer module.
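
The difference the AnandTech passage describes can be sketched with a toy model. This is only an illustration under stated assumptions, not how either decoder actually works: the function name `consumed_in_one_cycle`, the 4 decode slots, the limit of one fusion per cycle, and the rule that only `cmp`/`test` followed by a conditional jump fuses are all simplifications picked for the example.

```python
# Toy model: how many x86 instructions one decode cycle can consume.
# All names and limits here are hypothetical simplifications.

FUSIBLE_FIRST = {"cmp", "test"}


def is_cond_jump(mnemonic):
    """Treat any 'j..' mnemonic other than an unconditional 'jmp' as a conditional jump."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def consumed_in_one_cycle(stream, decode_slots=4, fuse_before_decode=True, max_fusions=1):
    """Return the number of x86 instructions taken from `stream` in one cycle.

    fuse_before_decode=True  -> a fused pair occupies a single decode slot,
                                so 4 slots can swallow up to 5 instructions.
    fuse_before_decode=False -> fusion happens after decode, so every
                                instruction still costs one slot.
    """
    slots_used = 0
    consumed = 0
    fusions = 0
    i = 0
    while i < len(stream) and slots_used < decode_slots:
        pair = (
            fuse_before_decode
            and fusions < max_fusions
            and i + 1 < len(stream)
            and stream[i] in FUSIBLE_FIRST
            and is_cond_jump(stream[i + 1])
        )
        step = 2 if pair else 1
        if pair:
            fusions += 1
        consumed += step
        i += step
        slots_used += 1
    return consumed


if __name__ == "__main__":
    code = ["add", "mov", "sub", "cmp", "jne", "add"]
    print("fusion before decode:", consumed_in_one_cycle(code, fuse_before_decode=True))
    print("fusion after decode: ", consumed_in_one_cycle(code, fuse_before_decode=False))
```

In this model the "+1" only shows up when the pair is merged before it reaches the decode slots; merging afterwards saves a uop downstream but does not raise the number of instructions decoded per cycle.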

qcmadness

#6 Posted on 2015-4-30 01:41

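Doesn't it look similar?

I can't see any way for AMD to escape having lower IPC than Intel.
Take K10: even if I assume its IPC goes up another 50%, that only just catches up with Haswell.
Given the performance of AMD's current integer pipeline, that's very hard to pull off.
FP actually has some chance (two full 256-bit FMA FPUs).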

http://www.anandtech.com/bench/product/435?vs=1368

The fully inclusive cache is the more interesting part, though.
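
As a back-of-the-envelope check on the "two full 256-bit FMA FPUs" figure in this post, peak floating-point throughput per cycle can be worked out as below. It is a rough sketch: the function name `peak_flops_per_cycle` is made up for the example, and it assumes each unit can issue one full-width FMA every cycle and counts an FMA as two operations (a multiply and an add). Whether the chip really has that configuration is the post's guess, not something the code confirms.

```python
def peak_flops_per_cycle(fma_units, vector_bits, element_bits):
    """Peak FLOPs per cycle, assuming one full-width FMA per unit per cycle."""
    lanes = vector_bits // element_bits  # elements processed per vector op
    return fma_units * lanes * 2         # FMA = multiply + add = 2 FLOPs


if __name__ == "__main__":
    # Two 256-bit FMA units, single and double precision.
    print("FP32:", peak_flops_per_cycle(2, 256, 32))
    print("FP64:", peak_flops_per_cycle(2, 256, 64))
```

Under those assumptions the configuration works out to 32 single-precision or 16 double-precision FLOPs per cycle per core.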

qcmadness

#7 Posted on 2015-4-30 12:21

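Quote:

Originally posted by Puff on 2015-4-30 02:41

Put Apple's Cyclone next to it and that's a carbon copy too.
Sandy Bridge and Ivy Bridge could also be called somewhat similar; the "only" difference is that the FPU has its own issue queue.

I can't tell anything from a PR diagram this high-level.
Not even the load/store queue size, instruction window ...

Clearly you didn't look at the details.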

qcmadness

#8 Posted on 2015-4-30 12:41

Quote:

Originally posted by cheungmanhoi on 2015-4-30 12:34
Has IPC actually gone up that much from SNB to Broadwell?

http://www.anandtech.com/bench/product/287?vs=836

3.4 / 3.8GHz vs 3.5 / 3.9GHz

General IPC differs by 10-20%.
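
The comparison above amounts to dividing each benchmark score by the clock it ran at and comparing the results. A minimal sketch of that arithmetic follows. The helper name `per_clock_ratio` and the two scores are hypothetical, chosen only to show the calculation (they are not taken from the linked benchmark page), and using the higher clock of each pair as the running clock is also an assumption.

```python
def per_clock_ratio(score_a, ghz_a, score_b, ghz_b):
    """Per-clock performance of chip B relative to chip A (1.0 = equal)."""
    return (score_b / ghz_b) / (score_a / ghz_a)


if __name__ == "__main__":
    # Hypothetical scores: A scores 100 at 3.8GHz, B scores 118 at 3.9GHz.
    ratio = per_clock_ratio(100.0, 3.8, 118.0, 3.9)
    print(f"B does {100 * (ratio - 1):.1f}% more work per clock than A")
```

With these made-up numbers an 18% higher score shrinks to roughly 15% per clock once the 0.1GHz clock advantage is factored out. Single-threaded results are the ones to normalise this way, since multi-threaded scores also depend on core count and turbo behaviour.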

qcmadness

#9 Posted on 2015-4-30 14:04

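Quote:

Originally posted by Puff on 2015-4-30 13:41

What details are there to speak of, apart from the cache hierarchy?

The diagram says a lot.

How many FPUs / ALUs there are is clear at a glance.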

qcmadness

#10 Posted on 2015-4-30 19:24

Quote:

Originally posted by XT on 2015-4-30 18:55

It's an outright regression.

Piledriver has roughly K8's IPC.
Steamroller has roughly K10's (not K10.5's) IPC.

qcmadness

#11 Posted on 2015-4-30 19:39

Unless Zen's 6 integer pipelines are full pipelines (execution + load/store), Haswell definitely has more resources.

But if they really are 6 full integer pipelines, Haswell would be left far behind.

qcmadness

#12 Posted on 2015-5-1 02:20

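Quote:

Originally posted by cheungmanhoi on 2015-5-1 02:19
However slow L3 is, it will still be faster than RAM.
If your L3 is even slower than RAM, then you really do deserve to go out of business.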

slower

qcmadness

#13 Posted on 2015-5-1 02:20

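Quote:

Originally posted by Puff on 2015-5-1 00:13

At most there is one more port for the ALUs (3+3), and by tradition the store data bus has no separate issue port...
But what the stack behind each port looks like is a real question mark.
Now if it had spelled out 256-bit FMAC x2, that you could call detail.

With 6 pipelines, as you say, there is a lot of leeway:
3+3 / 4+2, split/ ...

We're talking about AMD here, a company that likes symmetric pipelines.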

qcmadness

#14 Posted on 2015-5-1 16:23

Quote:

Originally posted by Puff on 2015-5-1 15:59

Jaguar's half-speed, quad-banked L2 is "also" 25 clk.

Its bandwidth is lower, though.

qcmadness

#15 Posted on 2015-5-1 16:24

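Quote:

Originally posted by Puff on 2015-5-1 15:48

The FPU is already not very symmetric.
Who knows, but 3 AGUs is a safe bet; at minimum that matches the (surely present) 2 loads + 1 store per cycle.

Maybe Zen is 3/3 and K12 is 4/4.
K12 has no 256-bit SIMD and is the "wider engine", after all.
...

Then it's slower.
How do you not get that? Intel is talking about 4 L/S.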

Current time zone: GMT+8. Current time: 2026-3-14 11:14
