
[Hardware] What comes after Piledriver?

Quote:
Originally posted by Puff on 2012-4-15 19:06

It's only paging, syscalls, and virtual functions. The target ISA is already different anyway.
Virtual functions alone... are already enough trouble for you.


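Quote:
Originally posted by qcmadness on 2012-4-15 19:06

Virtual functions alone... are already enough trouble for you.
But I don't think that has anything to do with power efficiency; it only affects execution time. Put another way, x86 and ARM both support standard C/C++ just the same.

[ Last edited by Puff on 2012-4-15 19:12 ]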


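Quote:
Originally posted by Puff on 2012-4-15 19:10

But I don't think that has anything to do with power efficiency; it only affects execution time. Put another way, x86 and ARM both support standard C/C++ just the same.
But the performance you get out of the same number of transistors differs by a lot.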


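Quote:
Originally posted by qcmadness on 2012-4-15 19:12

But the performance you get out of the same number of transistors differs by a lot.
But I really don't think that's relevant. Supporting C/C++0x doesn't mean the GPU has to support every legacy instruction, or the low-level instructions that were originally run by the CPU. It's just like malloc: you finish allocating first, and only then start running the kernel.

Take virtual functions as an example: the only troublesome part is implementing OOP in assembly. GCN already has a Scalar Unit to offload that to. Unless every work-item branches down a different path, in which case it really is extremely inefficient. But a case like that... you'd run on the CPU no matter what.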
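
A small sketch of the divergence point above, written in plain C++ rather than GPU code. It assumes a simplified model in which a wavefront needs one pass per distinct virtual target among its work-items. The names `Shape`, `Square`, `Circle`, and `serialized_passes` are made up for illustration and are not taken from GCN or any AMD toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical class hierarchy; any type with a virtual function would do.
struct Shape {
    virtual ~Shape() = default;
    virtual float area() const = 0;
};

struct Square : Shape {
    float side = 2.0f;
    float area() const override { return side * side; }
};

struct Circle : Shape {
    float radius = 1.0f;
    float area() const override { return 3.14159265f * radius * radius; }
};

// Assumed model: work-items that call the same virtual target run together,
// and each distinct target costs the whole wavefront one pass.
// Every pointer in the wavefront must be non-null.
std::size_t serialized_passes(const std::vector<const Shape*>& wavefront) {
    std::set<std::type_index> targets;
    for (const Shape* item : wavefront) {
        assert(item != nullptr);
        targets.insert(std::type_index(typeid(*item)));  // dynamic type of the object
    }
    return targets.size();
}
```

Under this model, a wavefront made up only of `Square` objects needs one pass, and mixing in `Circle` objects makes it two. The worst case described in the post is when every work-item has its own target, so the pass count equals the wavefront size.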


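Quote:
Originally posted by Puff on 2012-4-15 19:18
But I really don't think that's relevant. Supporting C/C++0x doesn't mean the GPU has to support every legacy instruction, or the low-level instructions that were originally run by the CPU. It's just like malloc: you finish allocating first, and only then start running the kernel.
Take virtual ...
The problem is that two particular vendors want to run everything on the GPU.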


Quote:
Originally posted by qcmadness on 2012-4-15 19:20

The problem is that two particular vendors want to run everything on the GPU.
Only one, surely.


Quote:
Originally posted by Puff on 2012-4-15 19:20

Only one, surely.
Two.

NVIDIA: the GPU also runs CPU code
Intel: a CPU architecture used as a GPU, running both CPU and GPU code


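Quote:
Originally posted by qcmadness on 2012-4-15 19:22

Two.

NVIDIA: the GPU also runs CPU code
Intel: a CPU architecture used as a GPU, running both CPU and GPU code
Intel... Larrabee. Hmm... that one does fit the description. But it isn't a GPU with a CPU ISA; it's many streamlined CPU cores running a graphics pipeline, which is fundamentally different from AMD/Nvidia. GenX Graphics, on the other hand, takes the same route as Nvidia/AMD.

As for Nvidia, yes, Nvidia does want to run everything on the GPU, but what they mean is offloading the parallelizable parts of the vast majority of normal applications to the GPU. Otherwise, what would be the point of developing Denver, and of Echelon still having latency-optimized cores?

[ Last edited by Puff on 2012-4-15 19:28 ]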


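Quote:
Originally posted by Puff on 2012-4-15 19:25
Intel... Larrabee. Hmm... that one does fit the description. But it isn't a GPU with a CPU ISA; it's many streamlined CPU cores running a graphics pipeline, which is fundamentally different from AMD/Nvidia. GenX Graphics, on the other hand, takes the same route as Nvidia/AMD.
As ...
So Intel's GPU problems are far more serious than NVIDIA's...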


Quote:
Originally posted by qcmadness on 2012-4-15 19:28

So Intel's GPU problems are far more serious than NVIDIA's...
But I'm talking about AMD.


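Quote:
Originally posted by Puff on 2012-4-15 19:30

But I'm talking about AMD.
AMD really shouldn't be doing so much of this kind of thing; just leave INT to the CPU.

The more things the GPU has to support, the less efficient it becomes.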


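Quote:
Originally posted by qcmadness on 2012-4-15 19:32

AMD really shouldn't be doing so much of this kind of thing; just leave INT to the CPU.

The more things the GPU has to support, the less efficient it becomes.
What AMD is working on is better programmability on the GPU: virtual functions, exception handling, syscalls, x86 paging support, and so on.

The "integer" workloads you mention, or rather the low-level system features such as memory management and I/O transactions, are probably not things the GPU will do. Syscalls can simply be handled by the CPU. Page faults are likewise handled by the OS, and the OS still runs on the CPU.

Integer on the GPU isn't useless either; video transcoding such as x264 can make use of integer work on the GPU.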
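
To make the integer point concrete, the sketch below computes a sum of absolute differences (SAD) between two pixel blocks, an integer-only metric commonly used for block matching in video encoders. It is ordinary C++ standing in for what would be one independent task per candidate block on a GPU. The name `sad` is illustrative, and nothing here is taken from x264's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sum of absolute differences between two equally sized blocks of 8-bit pixels.
// Only integer arithmetic is involved, and each block pair is independent of
// the others, so many pairs could be evaluated in parallel.
std::uint32_t sad(const std::vector<std::uint8_t>& current,
                  const std::vector<std::uint8_t>& reference) {
    assert(current.size() == reference.size());
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < current.size(); ++i) {
        // Widen to int before subtracting so the difference can go negative.
        const int diff = static_cast<int>(current[i]) - static_cast<int>(reference[i]);
        total += static_cast<std::uint32_t>(std::abs(diff));
    }
    return total;
}
```

A motion search would call something like this once per candidate position and keep the position with the smallest result.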



[ Last edited by Puff on 2012-4-15 19:39 ]


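Quote:
Originally posted by Puff on 2012-4-15 19:36

What AMD is working on is better programmability on the GPU: virtual functions, exception handling, syscalls, x86 paging support, and so on.

The "integer" workloads you mention, or rather the low-level system features such as memory managem ...
None of this needs to be done on the GPU. Bulldozer is designed precisely so that the CPU handles one part and the GPU handles another.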


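Quote:
Originally posted by qcmadness on 2012-4-15 19:39

None of this needs to be done on the GPU. Bulldozer is designed precisely so that the CPU handles one part and the GPU handles another.
I object. These are things the GPU has no choice but to do, if you want to run standard code the way you run it on a CPU.
And integrating the GPU with the CPU is not something that is possible either.

[ Last edited by Puff on 2012-4-15 19:43 ]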


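Quote:
Originally posted by Puff on 2012-4-15 19:42

I object. These are things the GPU has no choice but to do, if you want to run standard code the way you run it on a CPU.
And integrating the GPU with the CPU is not something that is possible either.
Decoupling INT and FP is the best proof.
Doing so increases latency. If it weren't for handing FP work over to the GPU, I can't think of any reason why the FP unit would have to be shared.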

http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333
Quote:
Like all previous designs from AMD (and in contrast to Intel), Bulldozer separates the integer and floating point schedulers, register files and execution units. In proof that Sutherland’s Wheel of Reincarnation applies to more than just graphics, Bulldozer employs a co-processor model for floating point and SIMD execution that is shared by both cores in a module – reminiscent of the days when x87 floating point co-processors would reside on a separate chip altogether. One advantage of this more formalized separation is that the floating point cluster might eventually be replaced or supplemented by a GPU shader array, an evolution of Bulldozer to fit the ‘Fusion’ mold. This co-processor model is an example of a substantial change that is also familiar from previous AMD CPUs, the resemblance is clear from Figure 4 below.
