[Hardware] First Details of Steamroller

Puff

Title: 水王 (prolific poster)

Rank: 4

#1 Posted 2012-8-30 01:53

Whether it's really 4-way, I honestly don't know. Apart from AnandTech, hardly anyone has said it's 4-way.

Puff

#2 Posted 2012-8-30 02:00

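Quote:

Originally posted by qcmadness on 2012-8-30 01:55

2-way is definitely too few.
It could be 3 or it could be 4; going by AMD's usual practice, 4 seems more likely than 3 (see K7/K8/K10).

Actually 3 is better than 4, though; AMD just isn't used to designing that way.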

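From a power-efficiency standpoint, that points to 3-wide. And there's also the decoded micro-op queue...
How the FPU works is another good question, though, and the FPU appears to have shrunk, going from 4 pipes to 3. Personally, I'd still guess the SIMD ALUs go from two sets to three.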

Puff

#3 Posted 2012-8-30 02:03

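Quote:

Originally posted by qcmadness on 2012-8-30 02:02

Counting everything, that's roughly 6 ports.

Not putting so many resources into MMX is a good thing; at the very least, MMX/3DNow!/x87 can be fully replaced by SSE2.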

The MMX unit is the MAL pipe, i.e. vector integer arithmetic, logical and bitwise ops... Cut MMX and you cut SSE integer too.

[ Last edited by Puff on 2012-8-30 02:06 ]

Puff

#4 Posted 2012-8-30 02:10

Quote:

Originally posted by qcmadness on 2012-8-30 02:06

No, the FMA unit does all the SSEx work.

Nononono.

P0 has three execution units: FMA, FCVT and IMAC. P1 has two: FMA and XBAR. P2 and P3 have only the MAL execution unit.
FMA handles all floating-point arithmetic and logical operations, e.g. CMPPS, ADDPS, MULPS, but not bitwise operations, e.g. ORPS, ANDPS.
MAL handles all integer arithmetic and logical operations, and also the floating-point bitwise operations. Integer MUL/MAC, however, is IMAC's job.

I'm not making this up: that's what the Optimization Guide says, and the well-known Agner Fog says the same.
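A minimal sketch of the pipe layout described above, written as a lookup table. The pipe and unit names (P0-P3, FMA, FCVT, IMAC, XBAR, MAL) come from this post; the per-instruction assignments are a hypothetical simplification for illustration (PADDD and PMULLW are added examples, and `pipes_for` is a made-up helper), not a copy of the Optimization Guide's latency table.

```python
# Hypothetical model of the FPU pipe layout as described in the post.
# Unit names come from the post; the instruction classification is a simplification.
PIPE_UNITS = {
    "P0": {"FMA", "FCVT", "IMAC"},
    "P1": {"FMA", "XBAR"},
    "P2": {"MAL"},
    "P3": {"MAL"},
}

# Illustrative instruction -> execution unit assignments.
UNIT_FOR_INSTRUCTION = {
    # floating-point arithmetic / logical (compare) ops -> FMA
    "ADDPS": "FMA",
    "MULPS": "FMA",
    "CMPPS": "FMA",
    # floating-point bitwise ops -> MAL, not FMA
    "ANDPS": "MAL",
    "ORPS": "MAL",
    # packed-integer arithmetic / bitwise ops -> MAL
    "PADDD": "MAL",
    "PAND": "MAL",
    "POR": "MAL",
    # packed-integer multiply -> IMAC
    "PMULLW": "IMAC",
}


def pipes_for(instruction: str) -> list[str]:
    """Return the pipes that contain the unit executing `instruction`."""
    unit = UNIT_FOR_INSTRUCTION[instruction]
    return sorted(pipe for pipe, units in PIPE_UNITS.items() if unit in units)


if __name__ == "__main__":
    for ins in ("ADDPS", "ANDPS", "PADDD", "PMULLW"):
        print(ins, "->", pipes_for(ins))
```

Under this model ADDPS can go to P0 or P1, while ANDPS and PADDD both land on P2/P3, which is the point being argued: cutting the MAL pipes would also cut FP bitwise and SSE integer throughput.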

Puff

#5 Posted 2012-8-30 02:13

Come on, if you're going to cite the SoG, look at its instruction latency table. It's not perfectly accurate, but the vast majority of the P***D/Q/W/SW/B integer arithmetic instructions are assigned to MAL0 and MAL1.
The description in the very section you quoted even says: "In addition to the two FMACs, the FPU also contains two 128-bit integer units which perform arithmetic and logical operations on AVX, MMX and SSE packed integer data."
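To make "arithmetic on packed integer data" concrete: a packed add such as PADDB adds each 8-bit lane independently, and carries never cross lane boundaries. The sketch below emulates that on four 8-bit lanes packed into one 32-bit integer using the classic SWAR masking trick. It only illustrates the semantics, not how the MAL hardware is built, and `packed_add_u8x4` and `lanes` are hypothetical names.

```python
# Emulate a wrapping packed 8-bit add (PADDB-style) on four lanes held in one
# 32-bit integer, using SWAR masking so carries never cross lane boundaries.
LOW7 = 0x7F7F7F7F  # low 7 bits of every 8-bit lane
HIGH = 0x80808080  # top bit of every 8-bit lane


def packed_add_u8x4(a: int, b: int) -> int:
    """Lane-wise (a + b) mod 256 for four 8-bit lanes packed into 32 bits."""
    # Adding only the low 7 bits of each lane gives at most 0x7F + 0x7F = 0xFE,
    # which fits in 8 bits, so no carry spills into the neighbouring lane.
    partial = (a & LOW7) + (b & LOW7)
    # Each lane's top bit is the XOR of both inputs' top bits and the carry
    # that arrived from bit 6; any carry out of bit 7 is simply dropped.
    return partial ^ ((a ^ b) & HIGH)


def lanes(x: int) -> list[int]:
    """Split a 32-bit word into its four 8-bit lanes, most significant first."""
    return [(x >> shift) & 0xFF for shift in (24, 16, 8, 0)]


if __name__ == "__main__":
    r = packed_add_u8x4(0x01FF7F80, 0x01017F80)
    print(hex(r), lanes(r))  # 0x200fe00 [2, 0, 254, 0]
```

Note how the 0xFF + 0x01 and 0x80 + 0x80 lanes wrap to 0 without disturbing their neighbours, which is exactly the behaviour the packed-integer units have to provide on every lane of a 128-bit register.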

[ Last edited by Puff on 2012-8-30 02:17 ]

Puff

#6 Posted 2012-8-30 02:20

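Quote:

Originally posted by qcmadness on 2012-8-30 02:17

Reading it now. You're probably right.

But since Intel also reuses its 128-bit units for 256-bit work,
AMD might likewise be reusing the FMAC for ALU work.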

Whether Intel reuses the integer vector unit for 256-bit work, I don't know; I only know it reuses the integer datapath to deliver the operand's MSBs.
And all of its FP units are 256 bits wide.

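A CPU isn't like a GPU: because of subword parallelism and the explicit vector ISA, a CPU probably isn't in a position to play the reuse game.
Right now it's the floating-point bitwise ops that run on the integer unit, e.g. ANDPS/ORPS, which are the same bit operations as PAND/POR.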
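Why FP bitwise ops can live on an integer unit: instructions like ANDPS and XORPS never treat the value as a float, they only AND/XOR its bit pattern, which is exactly what an integer bitwise unit does. Below is a minimal single-lane sketch of the common abs/negate idioms, using Python's standard `struct` module to reinterpret float32 bits; the function names are hypothetical.

```python
import struct

ABS_MASK = 0x7FFFFFFF   # every bit except the float32 sign bit
SIGN_MASK = 0x80000000  # only the float32 sign bit


def f32_bits(x: float) -> int:
    """Reinterpret a float32 value as its raw 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits: int) -> float:
    """Reinterpret a raw 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def abs_via_and(x: float) -> float:
    """ANDPS-style abs: clear the sign bit with a plain integer AND."""
    return bits_f32(f32_bits(x) & ABS_MASK)


def neg_via_xor(x: float) -> float:
    """XORPS-style negate: flip the sign bit with a plain integer XOR."""
    return bits_f32(f32_bits(x) ^ SIGN_MASK)


if __name__ == "__main__":
    print(abs_via_and(-1.5), neg_via_xor(2.0))  # 1.5 -2.0
```

No floating-point arithmetic happens anywhere in these two functions, so there is no reason for such ops to occupy an FMA pipe.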

[ Last edited by Puff on 2012-8-30 02:22 ]

Puff

#7 Posted 2012-8-30 02:29

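Quote:

Originally posted by qcmadness on 2012-8-30 02:23

For now, 256-bit width isn't needed yet.

Back to SR: if P2 and P3 really do end up merged into one, it would be because they saw that software utilization was too low (the last time that happened was VLIW-5 > VLIW-4);
only then would they cut it that way, because whenever it is used the penalty probably isn't small, even though it's a deep pipelined F ...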

Honestly, I don't think it's low. If anything, I think it has too few integer pipes.

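x264 running at IPC = 2.0 with an instruction latency of 2, and you call that low utilization? That's a joke.
Diagrams are there to mislead you; that diagram doesn't even tell you there's a crossbar unit.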
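A back-of-envelope sketch of why those figures don't look like low utilization. It treats the quoted IPC as if it were all packed-integer ops issued to the MAL pipes, which is an assumption the post does not spell out, and it assumes each pipe is fully pipelined with no dependency stalls. The helper names are hypothetical; `ops_in_flight` is just Little's law (throughput x latency).

```python
# Back-of-envelope model. Hypothetical assumptions, not AMD data: every op counted
# is a packed-integer op, each MAL pipe is fully pipelined (accepts one op per
# cycle), and there are no dependency stalls.


def pipe_utilization(ops_per_cycle: float, pipes: int) -> float:
    """Share of issue slots used; above 1.0 means the pipes cannot keep up."""
    return ops_per_cycle / pipes


def sustainable_rate(demanded_ops_per_cycle: float, pipes: int) -> float:
    """Each pipe starts at most one op per cycle, so throughput is capped at `pipes`."""
    return min(demanded_ops_per_cycle, float(pipes))


def ops_in_flight(ops_per_cycle: float, latency_cycles: int) -> float:
    """Little's law: average ops in flight = throughput x latency."""
    return ops_per_cycle * latency_cycles


if __name__ == "__main__":
    ipc, latency = 2.0, 2  # figures quoted in the post for x264
    print("ops in flight:", ops_in_flight(ipc, latency))
    for pipes in (2, 1):  # two MAL pipes vs. a hypothetical merged P2/P3
        print(f"{pipes} pipe(s): utilization={pipe_utilization(ipc, pipes):.1f}, "
              f"sustainable={sustainable_rate(ipc, pipes):.1f} ops/cycle")
```

Under these assumptions two pipes would already be fully busy, and merging them into one would cap the same workload at half its rate.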

[ Last edited by Puff on 2012-8-30 02:32 ]

Puff

#8 Posted 2012-8-30 02:35

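Quote:

Originally posted by qcmadness on 2012-8-30 02:32

If it really were high, they wouldn't have proposed cutting the MMX pipeline.
Or else it stays P0-P3, but P3 keeps only FMISC/FSTO while P2 keeps MMX.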

Quote:

There’s no change in the execution capabilities of the FPU, but there’s a reduction in overall area. The MMX unit now shares some hardware with the 128-bit FMAC pipes.

Was it cut? There's no definite answer until the pipe mapping is published.
"Shares some hardware" could just mean sharing the issue port.
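One way to read "shares some hardware" as shared issue ports: the execution units still exist, but MMX and FMAC ops compete for the same issue slots. The sketch below is a toy model under loudly hypothetical assumptions (independent ops, fully pipelined units, one op per port per cycle, two ports in each case) and is not Steamroller's actual pipe mapping, which, as noted above, had not been published.

```python
import math

# Hypothetical issue-port model (an assumption for illustration): ops are
# independent, every pipe is fully pipelined, and each issue port starts at
# most one op per cycle.


def cycles_separate_ports(n_fmac: int, n_mmx: int,
                          fmac_ports: int = 2, mmx_ports: int = 2) -> int:
    """FMAC and MMX ops each have their own ports and issue in parallel."""
    return max(math.ceil(n_fmac / fmac_ports), math.ceil(n_mmx / mmx_ports))


def cycles_shared_ports(n_fmac: int, n_mmx: int, shared_ports: int = 2) -> int:
    """MMX units sit behind the FMAC pipes' ports, so both op types compete for them."""
    return math.ceil((n_fmac + n_mmx) / shared_ports)


if __name__ == "__main__":
    for n_fmac, n_mmx in [(100, 0), (0, 100), (100, 100)]:
        print(n_fmac, n_mmx,
              cycles_separate_ports(n_fmac, n_mmx),
              cycles_shared_ports(n_fmac, n_mmx))
```

Under this toy model, sharing costs nothing when only one kind of op is present and only shows up when FMAC and MMX work are both heavy.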

[ Last edited by Puff on 2012-8-30 02:36 ]