[Hardware] What comes after Piledriver?

Quote:
Originally posted by Puff on 2012-4-15 21:55

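It's not a question of timing. It's that your starting premise is that the CPU and GPU will eventually be tightly fused together.
But in fact nobody has ever said that.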

I mean the core-level integration kind.

Quote:
Originally posted by qcmadness on 2012-4-15 22:01

http://www.xbitlabs.com/news/cpu ... ue_in_2015_AMD.html

Marketing.
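Whatever AMD does with Fusion is its own business, and I have never objected to the possibility of the CPU integrating a GPU "controller" (or an ACE kind of thing, as you like). AMD even has a patent describing a similar implementation. What I object to is replacing the FPU with a GPU.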

Quote:
Originally posted by Puff on 2012-4-15 22:04

Marketing.
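Whatever AMD does with Fusion is its own business, and I have never objected to the possibility of the CPU integrating a GPU "controller" (or an ACE kind of thing, as you like). AMD even has a patent describing a similar implementation. What I object to is replacing ...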
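Integrating a GPU to do Fusion is already possible today.

The value of buying ATi only comes when you can use the CPU+GPU through plain x86, without needing an external ISA.

If all you need is a chipset, even SiS can do that.
If all you need is graphics expertise, even VIA can do that.

Why would you need a company as big as ATi?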

Quote:
Originally posted by qcmadness on 2012-4-15 22:04

*pics*
I've seen them, so what? The point is: what part of your claim do these slides actually support?
Heterogeneous computing means many different kinds of cores working together, not fusing all the cores together into one.

Quote:
Originally posted by qcmadness on 2012-4-15 22:05

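Integrating a GPU to do Fusion is already possible today.

The value of buying ATi only comes when you can use the CPU+GPU through plain x86, without needing an external ISA.

If all you need is a chipset, even SiS can do that.
If all you need is graphics expertise, even VIA can do that.

Why would you need a company as big as ATi? ...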
Then what is the HSA IL for? Why would you run the GPU with x86? That's just your personal opinion.

Quote:
Originally posted by Puff on 2012-4-15 22:06

I've seen them, so what? The point is: what part of your claim do these slides actually support?
Heterogeneous computing means many different kinds of cores working together, not fusing all the cores together into one. ...
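That's already the case today (Llano / Sandy Bridge), so why evolve any further?
You have to understand that the GPU needs to be used effectively too. You can already see that OpenCL won't become very popular, and if the two aren't merged and used together, most of that silicon is wasted.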

Quote:
Originally posted by Puff on 2012-4-15 22:07

Then what is the HSA IL for? Why would you run the GPU with x86? That's just your personal opinion.
The point is to use the GPU as well to run x86 instructions.

Quote:
Originally posted by qcmadness on 2012-4-15 22:08

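That's already the case today (Llano / Sandy Bridge), so why evolve any further?
You have to understand that the GPU needs to be used effectively too. You can already see that OpenCL won't become very popular, and if the two aren't merged and used together, most of that silicon is wasted.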
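Using the GPU effectively doesn't mean they have to be merged together. Whether OpenCL becomes popular is a separate matter; HSA exists precisely to address what you're describing.
But that has little to do with replacing the FPU with a GPU. What I understand least is swapping a speedy FPU, tightly coupled to the CPU pipeline, for a slow GPU loosely coupled to the CPU pipeline. The two are fundamentally different things, with completely different execution models.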

The purpose of HSA is to keep the main path on the CPU, with tasks running on either the CPU or the GPU.
You can still run branchy vector code on the CPU's FP unit, and run embarrassingly parallel code on the GPU.
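A minimal Python sketch of that split, for illustration only: the main control path stays on the CPU and just decides where each task runs. `Task`, its `branchy` and `parallel_items` fields, `pick_device` and the `gpu_threshold` value are all hypothetical; none of them come from the HSA specification or any real runtime, and a real scheduler would weigh far more than two attributes.

```python
from dataclasses import dataclass
from enum import Enum


class Device(Enum):
    CPU = "cpu"
    GPU = "gpu"


@dataclass
class Task:
    # Hypothetical task descriptor, not part of any HSA API.
    name: str
    branchy: bool         # heavy data-dependent control flow
    parallel_items: int   # number of independent work items


def pick_device(task: Task, gpu_threshold: int = 10_000) -> Device:
    """Branchy or small work stays on the CPU (and its FP unit);
    large, branch-free, embarrassingly parallel work goes to the GPU."""
    if task.branchy or task.parallel_items < gpu_threshold:
        return Device.CPU
    return Device.GPU


def run_main_path(tasks: list[Task]) -> dict[str, Device]:
    # The main path itself runs on the CPU; it only decides placement.
    return {t.name: pick_device(t) for t in tasks}


if __name__ == "__main__":
    plan = run_main_path([
        Task("parse_scene", branchy=True, parallel_items=64),
        Task("shade_pixels", branchy=False, parallel_items=2_000_000),
        Task("small_dot", branchy=False, parallel_items=128),
    ])
    for name, device in plan.items():
        print(name, device.value)
```

Note that nothing here moves the FPU's work to the GPU: the CPU keeps its FP unit for the branchy code, which is the distinction being argued.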

Quote:
Originally posted by Puff on 2012-4-15 22:12

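Using the GPU effectively doesn't mean they have to be merged together. Whether OpenCL becomes popular is a separate matter; HSA exists precisely to address what you're describing.
But that has little to do with replacing the FPU with a GPU. Swapping a speedy FPU, tightly coupled to the CPU pipeline, for something slow and loosely coupled to the CPU pipeline ...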
AMD's FPU has always been somewhat decoupled from the CPU, whereas Intel's really is tightly coupled.

FP instructions are high-latency to begin with, so the problem isn't that big.

Quote:
Originally posted by qcmadness on 2012-4-15 22:15

AMD's FPU has always been somewhat decoupled from the CPU, whereas Intel's really is tightly coupled.
*pic*
It's decoupled from the integer pipeline, but it's still part of the CPU pipeline, and it still handshakes with the integer core.

Quote:
Originally posted by Puff on 2012-4-15 22:16

It's decoupled from the integer pipeline, but it's still part of the CPU pipeline, and it still handshakes with the integer core.
Actually it almost never does, except for memory loads.

If you remember, K8's pipeline is 12 stages (INT) / 17 stages (FP).

Quote:
Originally posted by qcmadness on 2012-4-15 22:17

Actually it almost never does, except for memory loads.
No. Instruction retirement.
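A toy model of why retirement ties the FPU back to the integer core: with in-order retirement, an integer op that finishes early still cannot retire until every older op, including an FP op, has completed. This is a simplified sketch assuming unlimited retire width and made-up latencies (5 cycles for an on-core FP op, a hypothetical 40 cycles for an op handed to a loosely coupled unit). It does not model the actual K8 or Family 15h retire logic.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    issue_cycle: int
    latency: int  # cycles from issue to result (illustrative values)

    @property
    def done_cycle(self) -> int:
        return self.issue_cycle + self.latency


def retire_cycles(ops: list[Op]) -> dict[str, int]:
    """In-order retirement, unlimited width: an op retires no earlier than
    its own completion and no earlier than the op ahead of it in program order."""
    retired: dict[str, int] = {}
    prev = 0
    for op in ops:
        cycle = max(op.done_cycle, prev)
        retired[op.name] = cycle
        prev = cycle
    return retired


def make_program(fp_latency: int) -> list[Op]:
    # Program order: integer add, FP op, then an independent integer add.
    return [
        Op("add1", issue_cycle=0, latency=1),
        Op("fp_op", issue_cycle=1, latency=fp_latency),
        Op("add2", issue_cycle=2, latency=1),
    ]


if __name__ == "__main__":
    # add2 completes at cycle 3 in both cases, but cannot retire before fp_op.
    print("on-core FP:", retire_cycles(make_program(5)))
    print("offloaded: ", retire_cycles(make_program(40)))
```

The point of the model is only that retirement is a synchronization point: the longer the FP op's round trip, the longer unrelated integer work stays stuck in the reorder buffer.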

Quote:
FP instructions are high-latency to begin with, so the problem isn't that big.
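That's not a valid reason. FP instruction latency is fairly limited: back-to-back it's at most 6 cycles and as little as 2. What about offloading to the GPU? Even ignoring all the overhead in between, the GPU already runs at a lower frequency than the CPU, and on top of that there's the 4-cycle back-to-back issue...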
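To put these cycle counts on a common time base, convert them to nanoseconds. The 2-6 cycle FP figures and the 4-cycle GPU back-to-back issue come from the post; the 4.0 GHz CPU clock and 0.8 GHz GPU clock are round-number assumptions chosen for illustration, not specs of any particular chip, and the transfer overhead the post mentions is deliberately left out.

```python
CPU_GHZ = 4.0  # assumed CPU core clock (illustrative only)
GPU_GHZ = 0.8  # assumed GPU shader clock (illustrative only)


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """At f GHz one cycle lasts 1/f ns."""
    return cycles / clock_ghz


def compare() -> dict[str, float]:
    cpu_fp_best = cycles_to_ns(2, CPU_GHZ)   # back-to-back FP, best case
    cpu_fp_worst = cycles_to_ns(6, CPU_GHZ)  # back-to-back FP, worst case
    gpu_issue = cycles_to_ns(4, GPU_GHZ)     # 4-cycle back-to-back GPU issue
    return {
        "cpu_fp_best_ns": cpu_fp_best,
        "cpu_fp_worst_ns": cpu_fp_worst,
        "gpu_issue_ns": gpu_issue,
        # How many times longer a dependent GPU op takes, before any transfer cost.
        "ratio_vs_worst": gpu_issue / cpu_fp_worst,
        "ratio_vs_best": gpu_issue / cpu_fp_best,
    }


if __name__ == "__main__":
    for key, value in compare().items():
        print(f"{key}: {value:.2f}")
```

Under these assumed clocks a dependent op on the GPU costs roughly 3.3x to 10x a back-to-back FP op on the CPU, before any data has moved between the two.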

Quote:
Originally posted by Puff on 2012-4-15 22:17

No. Instruction retirement.
That's why full CPU+GPU fusion will take nearly 10 years to arrive.
If it were that easy, it wouldn't take 10 years, Intel included.

Quote:
Originally posted by Puff on 2012-4-15 22:18

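That's not a valid reason. FP instruction latency is fairly limited: back-to-back it's at most 6 cycles and as little as 2. What about offloading to the GPU? Even ignoring all the overhead in between, the GPU already runs at a lower frequency than the CPU, and on top of that there's the 4-cycle back-to-back issue... ...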
...

A single sqrt alone already has 29-38 cycles of latency (Family 15h).
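Turning the 29-38 cycle sqrt figure around: how many GPU cycles elapse while the CPU spends that long on one sqrt? That window is the whole budget an offloaded sqrt would have for the GPU's own issue latency plus moving operands and results. The 29-38 cycles and the 4-cycle GPU issue are the figures quoted in the thread; the 4.0 GHz and 0.8 GHz clocks are illustrative assumptions, and treating the GPU sqrt as a single 4-cycle issue is a simplification.

```python
CPU_GHZ = 4.0                   # assumed CPU clock (illustrative only)
GPU_GHZ = 0.8                   # assumed GPU clock (illustrative only)
GPU_ISSUE_CYCLES = 4            # back-to-back GPU issue, as quoted in the thread
SQRT_LATENCY_CYCLES = (29, 38)  # Family 15h sqrt latency range, as quoted


def gpu_cycle_budget(cpu_cycles: int) -> float:
    """GPU cycles that elapse while the CPU spends `cpu_cycles` on one sqrt."""
    ns = cpu_cycles / CPU_GHZ
    return ns * GPU_GHZ


def leftover_after_issue(cpu_cycles: int) -> float:
    """GPU cycles left for transfer and synchronization once the GPU's own
    issue latency is paid; a negative value would mean offload can never win."""
    return gpu_cycle_budget(cpu_cycles) - GPU_ISSUE_CYCLES


if __name__ == "__main__":
    for c in SQRT_LATENCY_CYCLES:
        print(f"sqrt {c} CPU cycles: budget {gpu_cycle_budget(c):.1f} GPU cycles, "
              f"{leftover_after_issue(c):.1f} left after issue")
```

With these assumed clocks the budget works out to about 5.8-7.6 GPU cycles, leaving roughly 1.8-3.6 cycles after the 4-cycle issue for moving operands and results.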
