[Hardware] AMD AFDS-D Webcast + On-demand Playback

http://www.inxpo.com/events/amd/afds-d/
Looking forward to 12 Jun, when the HSA APU makes its debut. CPU & GPU cache coherency!

TOP

SB (Sandy Bridge) already does this.

TOP

Quote:
Originally posted by qcmadness on 2012-6-8 22:00
SB (Sandy Bridge) already does this.
I don't care about Intel.
And Sandy Bridge does not ensure cache coherency between the CPU and GPU, even though both share the same LLC. Only the CPU caches are kept fully coherent.

[ Last edited by Puff on 2012-6-8 22:42 ]

TOP

http://www.realworldtech.com/pag ... 80811195102&p=9
Quote:
Intel's Sandy Bridge Graphics Architecture
By: David Kanter | 08-08-2011
Heterogeneous Integration

The graphics integration in Sandy Bridge is particularly novel as Intel is sharing the LLC with the GPU. The driver allocates regions of the cache at way granularity (128KB) – and can actually request the whole cache. Each thread can spill 32KB of data back to the LLC, for a total of nearly 2MB in the larger 12 shader core variants. Almost any GPU data can be held in the LLC, including vertices, textures and many other types of state.

The Sandy Bridge LLC and ring interconnect can rapidly pass data from the GPU back to the CPU – AMD’s Fusion is a far higher performance GPU, but that particular style of communication is discouraged. Since the GPU has a weaker ordering model, a flush command is needed to force data to be written back to the LLC prior to the CPU reading it. The driver can also allocate a portion of the LLC as a non-coherent cache for display data and other uses. For example, the results of transcoding might be written out to the non-coherent region.

While this excellent system integration promises many benefits, at present it is restricted mainly to multimedia workloads. For graphics, it is largely an academic advantage to any but Intel’s driver team. The GPU is exposed through graphics APIs; yet neither OpenGL nor DirectX programs can interact with coherent memory and bypass I/O copies (let alone use the LLC). AMD has introduced an OpenCL extension for a zero copy mechanism on Windows systems already, and presumably Intel will follow once they have OpenCL and DirectCompute capable hardware. Intel’s graphics driver can take advantage of fast CPU/GPU communication, but that is only because it has raw access to the GPU hardware. These advances pave the way for Ivy Bridge and certainly promise good things in the future, but also serve to point out some of the deficiencies in the current generation.

TOP

Quote:
Originally posted by qcmadness on 2012-6-8 23:00
http://www.realworldtech.com/pag ... 80811195102&p=9

There is not a word about coherency there.

It's just about how the CPU and GPU share the LLC, and how the GPU handles shared data. The way SNB handles such data is similar to Llano and Trinity: you want sharing with correctness? Do a flush, please. Ivy Bridge even extends the GPU cache hierarchy to 4 levels. So the difference lies in how the cache hierarchy is organized, but none of them ensures hardware coherency.

And please, no further discussion of whether hardware coherency between the CPU and GPU is necessary. I can foresee that we hold very different views on this topic.
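
To make the "do a flush" point concrete, here is a toy Python model of a producer with a private write-back cache sitting in front of a shared LLC. It is only a sketch: the class and method names (SharedLLC, ToyGPU, store, flush) are made up for illustration and do not correspond to any real driver or hardware interface. The assumption is simply that, without hardware coherency, the CPU side only sees GPU writes once they have been explicitly flushed.

```python
class SharedLLC:
    """Toy stand-in for the last-level cache shared by CPU and GPU."""

    def __init__(self):
        self.lines = {}

    def read(self, addr):
        # Addresses that were never written read back as 0.
        return self.lines.get(addr, 0)

    def write(self, addr, value):
        self.lines[addr] = value


class ToyGPU:
    """Hypothetical GPU with a private write-back cache and no hardware coherency."""

    def __init__(self, llc):
        self.llc = llc
        self.dirty = {}  # writes held privately until flushed

    def store(self, addr, value):
        # The write stays in the private cache; the LLC is not updated.
        self.dirty[addr] = value

    def flush(self):
        # Software-managed coherency: push every dirty line to the LLC.
        for addr, value in self.dirty.items():
            self.llc.write(addr, value)
        self.dirty.clear()


def demo():
    llc = SharedLLC()
    gpu = ToyGPU(llc)
    gpu.store(0x40, 123)
    before = llc.read(0x40)  # CPU-side read before the flush: stale
    gpu.flush()
    after = llc.read(0x40)   # CPU-side read after the flush: up to date
    return before, after


if __name__ == "__main__":
    print(demo())
```

With hardware coherency, the first read would already return the new value and the explicit flush() call would not be needed.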



[ Last edited by Puff on 2012-6-9 22:32 ]

TOP