[Hardware] A look back at the Netburst Architecture

http://www.intel.com/Assets/en_US/PDF/manual/248966.pdf

This is the fatal flaw:
Quote:
Port 0. In the first half of the cycle, port 0 can dispatch either one floating-point
move μop (a floating-point stack move, floating-point exchange or floating-point
store data) or one arithmetic logical unit (ALU) μop (arithmetic, logic, branch or store
data). In the second half of the cycle, it can dispatch one similar ALU μop.

Port 1. In the first half of the cycle, port 1 can dispatch either one floating-point
execution (all floating-point operations except moves, all SIMD operations) μop or
one normal-speed integer (multiply, shift and rotate) μop or one ALU (arithmetic)
μop. In the second half of the cycle, it can dispatch one similar ALU μop.
What can be done in one clock:
a. 2x 2 similar ADD/SUB
b. 2 similar ADD/SUB + 2 similar ADD/SUB/Logic Ops/LS Ops
c. 1 FP/SIMD Op + 1 FP LS/Move Op
d. 2 similar ADD/SUB/Logic Ops/LS Ops + Shift/Rotate

No wonder it was so slow... With code that mixes different types of operations, you get only 2 operations per clock in total...
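
To make the list above concrete, here is a rough sketch of the two dispatch ports as the quoted text describes them. The class names (`alu`, `logic`, `fp_move`, `fp_exec`, `int_slow`) and both functions are made up for illustration. It also assumes, following the reading in this post, that the second half-cycle of a port is only usable when the first half held an ALU-class μop, and it treats "similar" loosely as "any ALU-class μop that port accepts". Dependencies, the load/store ports, and the rest of the pipeline are ignored.

```python
from collections import Counter
from itertools import product

# μop classes each port accepts in the FIRST half of a clock, per the quoted text.
# "logic" stands in for logic / branch / store-data μops;
# "int_slow" stands in for multiply, shift and rotate.
FIRST_HALF = {
    0: ("fp_move", "alu", "logic"),
    1: ("fp_exec", "int_slow", "alu"),
}
# μop classes each port's double-speed ALU accepts in the SECOND half.
SECOND_HALF = {
    0: ("alu", "logic"),
    1: ("alu",),
}


def port_options(port):
    """List every tuple of μops a single port could dispatch in one clock."""
    options = [()]  # the port may also sit idle
    for first in FIRST_HALF[port]:
        options.append((first,))
        # Assumption: the second half-cycle is only usable after an ALU-class μop.
        if first in SECOND_HALF[port]:
            options.extend((first, second) for second in SECOND_HALF[port])
    return options


def best_dispatch(pending):
    """Return the largest group of pending μops that fits in one clock."""
    pending = Counter(pending)
    best = ()
    for p0, p1 in product(port_options(0), port_options(1)):
        chosen = p0 + p1
        need = Counter(chosen)
        fits = all(pending[kind] >= n for kind, n in need.items())
        if fits and len(chosen) > len(best):
            best = chosen
    return best


print(len(best_dispatch({"alu": 8})))                    # all-ALU code
print(len(best_dispatch({"fp_exec": 4, "fp_move": 4})))  # all-FP code
print(len(best_dispatch({"alu": 2, "int_slow": 1})))     # ALU plus a shift
```

Under these assumptions the three calls give 4, 2 and 3 μops per clock, which lines up with cases a, c and d in the list.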

Intel's optimization guide even says outright to use ADD in place of IMUL...
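
For anyone wondering what replacing IMUL with ADD looks like, here is a small sketch of the idea: multiplying by a known constant using nothing but additions. The function name is hypothetical and this is plain Python, not what a compiler would emit. The shift on `k` only walks the bits of the constant, which a compiler would do ahead of time; the data path itself uses additions only. Whether this actually beats a multiply depends on the constant and on the chip's latencies.

```python
def mul_const_adds_only(x, k):
    """Compute x * k for a non-negative integer constant k using only additions on x."""
    if k < 0:
        raise ValueError("k must be non-negative")
    result = 0
    addend = x  # holds x * 2**i for the bit position i being examined
    while k:
        if k & 1:
            result = result + addend  # one ADD per set bit of the constant
        addend = addend + addend      # doubling done with ADD rather than a shift
        k >>= 1                       # walks the constant's bits ("compile time" work)
    return result


print(mul_const_adds_only(7, 10))  # same value as 7 * 10
```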

Quote:
Originally posted by qcmadness on 2011-9-2 14:29
http://www.intel.com/Assets/en_US/PDF/manual/248966.pdf

This is the fatal flaw

What can be done in one clock:
a. 2x 2 similar ADD/SUB
b. 2 similar ADD/SUB + 2 similar ADD/SUB/Logic Ops/LS Ops
c. 1 FP/SIM ...
But Nehalem is, to some extent, Netburst-based.
ロストックで風を攫うや思い出す

Quote:
Originally posted by Henry on 2011-9-2 15:57

But Nehalem is, to some extent, Netburst-based.
Obviously not...
Its main execution engine follows Core.

http://www.realworldtech.com/pag ... 40208182719&p=6



You can see it is almost identical to Core.

Quote:
Originally posted by qcmadness on 2011-9-2 15:58

Obviously not...

Its main execution engine follows Core.
What about Atom, then?

Quote:
Originally posted by Henry on 2011-9-2 16:01

What about Atom, then?
Bonnell is an all-new design.

Quote:
Originally posted by qcmadness on 2011-9-2 15:58

Obviously not...
Its main execution engine follows Core.

http://www.realworldtech.com/pag ... 40208182719&p=6



You can see it is almost identical to Core ...
Hmm... it looks like nothing more than an evolved Core 2.

Bonnell


Bobcat


Clearly, Bonnell and Bobcat both have only 2-way wide execution.

Quote:
Originally posted by qcmadness on 2011-9-2 16:09
Bonnell


Bobcat


Clearly, Bonnell and Bobcat both have only 2-way wide ...
2-way wide = 2x FP + 2x INT?
So Barcelona/Core 2/Nehalem = 3-way?

Quote:
Originally posted by Henry on 2011-9-2 16:15

2-way wide = 2x FP + 2x INT?
So Barcelona/Core 2/Nehalem = 3-way?
no...

Bonnell and Bobcat can each issue only 2 instructions at a time.
K10 can issue 3.
Core / Nehalem are both 3 simple + 1 complex (the complex one can be split into 2 simple).
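
As a back-of-the-envelope illustration of what those widths mean, the sketch below computes the minimum number of cycles needed just to issue a block of independent instructions. The function name is hypothetical, and counting Core / Nehalem's "3 simple + 1 complex" as a width of 4 is an assumption. Real code rarely reaches this bound because of dependencies, cache misses and decode restrictions.

```python
import math

# Per-cycle issue width as described in the post above.
# Assumption: Core / Nehalem's "3 simple + 1 complex" is counted as 4.
ISSUE_WIDTH = {"Bonnell": 2, "Bobcat": 2, "K10": 3, "Core/Nehalem": 4}


def min_cycles(instructions, width):
    """Lower bound on the cycles needed to issue `instructions` at `width` per cycle."""
    return math.ceil(instructions / width)


for name, width in ISSUE_WIDTH.items():
    print(name, min_cycles(12, width))
```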

Quote:
Originally posted by qcmadness on 2011-9-2 16:16

no...

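Bonnell and Bobcat can each issue only 2 instructions at a time.
K10 can issue 3.
Core / Nehalem are both 3 simple + 1 complex (the complex one can be split into 2 simple).
By "instruction" here, do you mean uOps?
K10 => 3 INT + 3 FP, in parallel
Core/Nehalem => 3 INT/FP mixed (not sure whether one port can do two things at once; probably not)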

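Quote:
Originally posted by Henry on 2011-9-2 16:26

By "instruction" here, do you mean uOps?
K10 => 3 INT + 3 FP, in parallel
Core/Nehalem => 3 INT/FP mixed (not sure whether one port can do two things at once; probably not)
No... ordinary instructions such as ADD / MUL.

K7/K8/K10 all have 3 parallel groups,
but that actually wastes a lot; even Bulldozer no longer does it that way.

Core/Nehalem/SB are all 4+1; after all, some instructions are loads/stores.
On K7/K8/K10/Bulldozer the load/store units sit inside the integer cluster,
whereas on Core/Nehalem/SB they are separate from the integer cluster (ports 2/3/4, which are not shown in the diagram).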

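Quote:
Originally posted by qcmadness on 2011-9-2 16:30

No... ordinary instructions such as ADD / MUL.

K7/K8/K10 all have 3 parallel groups,
but that actually wastes a lot; even Bulldozer no longer does it that way.

Core/Nehalem/SB are all 4+1; after all, some instructions are loads/stores.
On K7/K8/K10/Bulldozer the load/store units sit inside the integer ...
I see that Nehalem and Core both queue everything in a single RS, with all INT/FP together.
So ports 2/3/4 do exist; they just were not shown...
Three for computation, three for data.

Does Bulldozer's FP queue up the way Intel's does, while INT has 3 separate queues?

[ Last edited by Henry on 2011-9-2 16:47 ]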

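Quote:
Originally posted by Henry on 2011-9-2 16:44

I see that Nehalem and Core both queue everything in a single RS, with all INT/FP together.
So ports 2/3/4 do exist; they just were not shown...

Does Bulldozer's FP queue up the way Intel's does, while INT has 3 separate queues?
FP is queued and INT is queued as well, but it is symmetric: 2 execution / L&S pipes per core/thread.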

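Quote:
Originally posted by qcmadness on 2011-9-2 16:47

FP is queued and INT is queued as well, but it is symmetric: 2 execution / L&S pipes per core/thread.

Can a single Intel port do two things at once?
I see that only three of the six are for computation.

For BD, what I meant by queuing is: does one queue feed all three FP units, or is there one queue per FP unit?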

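Quote:
Originally posted by Henry on 2011-9-2 16:56

Can a single Intel port do two things at once?
I see that only three of the six are for computation.

For BD, what I meant by queuing is: does one queue feed all three FP units, or is there one queue per FP unit?
No... one port can only do one thing at a time.
That is why Intel CPUs from Core onward have reasonably good memory performance.

BD has a unified FP scheduler: a single queue.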
