
[Hardware] VR-Zone SB-E Price

Quote:
Originally posted by Henry on 2011-8-16 15:43

No OC multi =/= no OC bus.
But it is quite certain that without OC multi, not much gain can be achieved.

TOP

Quote:
Originally posted by Henry on 2011-8-16 15:45

Video editing.
One of the reasons CUDA can be much faster than a CPU is RAM bandwidth.
Of course not...

CUDA / Quicksync are faster because they are not flexible.

TOP

Quote:
Originally posted by qcmadness on 2011-8-16 16:14

Of course not...

CUDA / Quicksync are faster because they are not flexible.
Like a pair of human hands vs. a bunch of robot arms.

TOP

Quote:
Originally posted by qcmadness on 2011-8-16 16:14

Of course not...

CUDA / Quicksync are faster because they are not flexible.
The maths is simple, but there are many operations whose data access is not very predictable, and little of it can be cached in L2, so bandwidth to RAM is important (a big, fast pool is needed).
I won't comment on QS, as its mechanism is different.
CUDA, if low bandwidth, should also eat C.
ロストックで風を攫うや思い出す

TOP

Quote:
Originally posted by Henry on 2011-8-16 17:13

The maths is simple, but there are many operations whose data access is not very predictable, and little of it can be cached in L2, so bandwidth to RAM is important (a big, fast pool is needed).
I won't comment on QS, as its mechanism is different.
CUDA, if low ...
What CUDA / QS do well is all predictable code... your concept is a bit off.

TOP

Quote:
Originally posted by qcmadness on 2011-8-16 19:59

What CUDA / QS do well is all predictable code... your concept is a bit off.
I'm talking about data, not code.

TOP

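Quote:
Originally posted by Henry on 2011-8-16 23:20

I'm talking about data, not code.
SIMD data doesn't need that much memory bandwidth.

Anything that can be predicted and prefetched doesn't need high bandwidth at all.
Graphics sometimes needs so much bandwidth because its accesses can't be predicted.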

TOP

Quote:
Originally posted by qcmadness on 2011-8-17 00:29

SIMD data doesn't need that much memory bandwidth.

Anything that can be predicted and prefetched doesn't need high bandwidth at all.
Graphics sometimes needs so much bandwidth because its accesses can't be predicted.
The matrices processed in film ripping are similar to those in 3D games and are often not very predictable. SIMD and AVX do help somewhat, because they batch data and perform vector or matrix operations on several values at a time instead of only one.
But that doesn't mean bandwidth can be saved.

TOP

Quote:
Originally posted by ccw on 2011-8-16 16:21

Like a pair of human hands vs. a bunch of robot arms.
The robot arms are of no use if the conveyor belt is so slow.

TOP

Quote:
Originally posted by Henry on 2011-8-17 01:22

The matrices to be processed in film ripping is similar to those in 3D games, often not so predictable. SIMD and AVX do help some due to data batching and perform vector or matrix operations each tim ...
If you know the latency of a SIMD / AVX instruction, you will know why I say that bandwidth is not important.

And do you remember why, going from P4 to C2D, SSEx performance doubled while the bandwidth remained the same?
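
For anyone following the argument, here is a rough sketch of the reasoning in roofline style: a kernel is capped either by compute throughput or by memory traffic, whichever is lower. All the numbers are hypothetical and only for illustration; they are not measurements of a P4, a C2D or any other chip, and `attainable_gflops` is just a name made up for this sketch.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: the lower of the compute limit and the memory limit."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical numbers, chosen only for illustration.
PEAK_OLD = 20.0   # GFLOP/s before SIMD throughput is doubled
PEAK_NEW = 40.0   # GFLOP/s after SIMD throughput is doubled
BANDWIDTH = 10.0  # GB/s, identical for both chips

for intensity in (0.5, 8.0):  # FLOPs performed per byte fetched from RAM
    old = attainable_gflops(PEAK_OLD, BANDWIDTH, intensity)
    new = attainable_gflops(PEAK_NEW, BANDWIDTH, intensity)
    print(f"{intensity} FLOP/byte: {old} -> {new} GFLOP/s")
```

Under these assumptions, the high-intensity case doubles with the wider SIMD units while the low-intensity case does not move at all, so whether bandwidth matters depends on how many operations are done per byte fetched.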

TOP

Quote:
Originally posted by Henry on 2011-8-17 01:26

The robot arms are of no use if the conveyor belt is so slow.
And benchmarks already show that the bandwidth is more than enough; doubling the bandwidth DOES NOT HELP.

TOP

And show your data to support your claim that the extra bandwidth will help a lot in some tasks.

TOP

http://www.anandtech.com/bench/Product/100?vs=108

i7 950: 3.06GHz / 3.33GHz Turbo / 32.0 GB/s
i7 860: 2.80GHz / 3.46GHz Turbo / 21.3 GB/s

The max gain of ~10% is nowhere near significant.
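
As a sanity check on the two bandwidth figures, here is a small sketch of where they could come from. It assumes both platforms run DDR3-1333 on 64-bit channels (triple channel on the i7 950, dual channel on the i7 860); the memory speed is an assumption, not something stated in the linked benchmark, and `peak_bandwidth_gbs` is a helper name made up for this sketch.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak in GB/s: channels x MT/s x bytes per transfer."""
    return channels * mega_transfers_per_s * bus_bytes / 1000


DDR3_1333 = 4000 / 3  # MT/s (1333.33...), assumed for both platforms

i7_950 = peak_bandwidth_gbs(channels=3, mega_transfers_per_s=DDR3_1333)  # triple channel
i7_860 = peak_bandwidth_gbs(channels=2, mega_transfers_per_s=DDR3_1333)  # dual channel

print(f"i7 950: {i7_950:.1f} GB/s")
print(f"i7 860: {i7_860:.1f} GB/s")
print(f"bandwidth ratio: {i7_950 / i7_860:.2f}x")
```

With those assumptions the theoretical bandwidth gap is about 1.5x, which is the figure to hold against the ~10% best-case gain quoted above.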

TOP

Quote:
Originally posted by qcmadness on 2011-8-17 01:46

If you know the latency of a SIMD / AVX instruction, you will know why I say that bandwidth is not important.

and do you remember why p4 > c2d has doubled the SSEx performance while the bandwidth ...
P4 => loses data in the pipeline and has to redo the work; that's why its per-cycle performance is even worse than the PIII's.
C2D => basically a PIII-S with P4 cache fetch (with some innovations inside, of course).
i7 => a balance between C2D and P4, taking the advantages of both plus an HT tweak.

latency => CL in RAM/cache
bandwidth => MHz*bits in RAM/cache
2 different aspects.
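
To put units on the two definitions above, a minimal sketch: CAS latency in nanoseconds depends on CL and the memory clock, while per-channel bandwidth depends on the transfer rate and the bus width. The two modules are hypothetical examples picked for illustration, and the helper names are made up for this sketch.

```python
def cas_latency_ns(cl_cycles, mega_transfers_per_s):
    """CAS latency in ns. DDR moves data twice per clock, so clock MHz = MT/s / 2."""
    clock_mhz = mega_transfers_per_s / 2
    return cl_cycles / clock_mhz * 1000


def channel_bandwidth_gbs(mega_transfers_per_s, bus_bits=64):
    """Theoretical peak bandwidth of one memory channel in GB/s."""
    return mega_transfers_per_s * bus_bits / 8 / 1000


# Hypothetical modules: (name, MT/s, CL)
MODULES = (("DDR3-1333 CL9", 1333, 9), ("DDR3-1600 CL11", 1600, 11))

for name, rate, cl in MODULES:
    latency = cas_latency_ns(cl, rate)
    bandwidth = channel_bandwidth_gbs(rate)
    print(f"{name}: {latency:.2f} ns, {bandwidth:.1f} GB/s per channel")
```

In this example the two modules have nearly the same CAS latency (about 13.5 ns and 13.75 ns) while per-channel bandwidth differs (about 10.7 vs 12.8 GB/s), so one figure can change without the other.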

TOP

Quote:
Originally posted by qcmadness on 2011-8-17 01:55
http://www.anandtech.com/bench/Product/100?vs=108

i7 950: 3.06GHz / 3.33GHz Turbo / 32.0 GB/s
i7 860: 2.80GHz / 3.46GHz Turbo / 21.3 GB/s

The max gain of ~10% is nowhere near significant.
An increase in bandwidth needs help from the architecture before it brings improvements.
The 860 and 950 are basically the same chip.

If, as you said, bandwidth is not so important, I think CUDA cards in datacenters or render farms would have 128-bit/192-bit memory instead, in order to cut cost.
A display card with a large number of OPs is somewhat different from a CPU for now.
But if CPUs get more and more cores and want to gain performance, a large RAM bandwidth has to be provided.

[ Last edited by Henry on 2011-8-17 03:01 ]

TOP