
[Hardware] VR-Zone SB-E Price

ccw


#16 Posted on 2011-8-16 16:07

Quote:

Originally posted by Henry on 2011-8-16 15:43

No OC multi =/= no OC bus.

But it is quite certain that without OC multi, not much gain can be achieved.


qcmadness


#17 Posted on 2011-8-16 16:14

Quote:

Originally posted by Henry on 2011-8-16 15:45

Video editing.
One of the reasons CUDA can be much faster than a CPU is RAM bandwidth.

Of course not...

CUDA / Quick Sync are faster because they are not flexible.

http://bbs.hk-spot.com


ccw


#18 Posted on 2011-8-16 16:21

Quote:

Originally posted by qcmadness on 2011-8-16 16:14

Of course not...

CUDA / Quick Sync are faster because they are not flexible.

Like a pair of human hands vs a bunch of robot arms


Henry


#19 Posted on 2011-8-16 17:13

Quote:

Originally posted by qcmadness on 2011-8-16 16:14

Of course not...

CUDA / Quick Sync are faster because they are not flexible.

Simple maths, but many of the ops are not very predictable and few can be cached in L2, so bandwidth to RAM is important (it needs a big, fast pool).
No comment on QS, as its mechanism is different.
CUDA, if low on bandwidth, should also eat C.

ロストックで風を攫うや思い出す


qcmadness


#20 Posted on 2011-8-16 19:59

Quote:

Originally posted by Henry on 2011-8-16 17:13

Simple maths, but many of the ops are not very predictable and few can be cached in L2, so bandwidth to RAM is important (it needs a big, fast pool).
No comment on QS, as its mechanism is different.
CUDA, if low ...

The things CUDA / QS do well are all predictable code... your concept is a little off.


Henry


#21 Posted on 2011-8-16 23:20

Quote:

Originally posted by qcmadness on 2011-8-16 19:59

The things CUDA / QS do well are all predictable code... your concept is a little off.

I'm talking about data, not code.


qcmadness


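#22 Posted on 2011-8-17 00:29

Quote:

Originally posted by Henry on 2011-8-16 23:20

I'm talking about data, not code.

SIMD data doesn't need much memory bandwidth.

Anything that can be predicted and prefetched doesn't need high bandwidth at all.
Graphics sometimes needs that much bandwidth because the accesses can't be predicted.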


Henry


#23 Posted on 2011-8-17 01:22

Quote:

Originally posted by qcmadness on 2011-8-17 00:29

SIMD data doesn't need much memory bandwidth.

Anything that can be predicted and prefetched doesn't need high bandwidth at all.
Graphics sometimes needs that much bandwidth because the accesses can't be predicted.

The matrices processed in film ripping are similar to those in 3D games and are often not so predictable. SIMD and AVX do help somewhat because they batch data and perform a vector or matrix operation each time instead of working on only one value.
But that doesn't mean bandwidth can be saved.
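
A rough way to picture both sides of this argument is a roofline-style bound: throughput is capped either by compute or by memory bandwidth times the work done per byte fetched. Nobody in the thread names this model, and the 100 GFLOP/s peak and the FLOP-per-byte values below are made-up illustration numbers; only the 21.3 and 32.0 GB/s figures are taken from elsewhere in the thread. A minimal sketch in Python:

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: the lower of the compute limit and the memory limit."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical chip with 100 GFLOP/s peak compute (assumed, not a real spec).
PEAK = 100.0

for intensity in (0.5, 2.0, 8.0):  # FLOPs performed per byte fetched (assumed)
    low_bw = attainable_gflops(PEAK, 21.3, intensity)
    high_bw = attainable_gflops(PEAK, 32.0, intensity)
    print(f"{intensity:4.1f} FLOP/byte: {low_bw:6.2f} -> {high_bw:6.2f} GFLOP/s")
```

Under these assumed numbers, extra bandwidth helps only while the work per byte is low; once the kernel is compute-bound the result stops changing. Which regime video encoding falls in is exactly what the two posters disagree on.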


Henry


#24 Posted on 2011-8-17 01:26

Quote:

Originally posted by ccw on 2011-8-16 16:21

Like a pair of human hands vs a bunch of robot arms

The robot arms are no use if the conveyor belt is that slow.


qcmadness


#25 Posted on 2011-8-17 01:46

Quote:

Originally posted by Henry on 2011-8-17 01:22

The matrices processed in film ripping are similar to those in 3D games and are often not so predictable. SIMD and AVX do help somewhat because they batch data and perform a vector or matrix operation each tim ...

If you know the latency of a SIMD / AVX instruction, you will know why I say that bandwidth is not important.

And do you remember why going from P4 to C2D doubled SSEx performance while the bandwidth remained the same?


qcmadness


#26 Posted on 2011-8-17 01:47

Quote:

Originally posted by Henry on 2011-8-17 01:26

The robot arms are no use if the conveyor belt is that slow.

And benchmarks already show that the bandwidth is more than enough and that doubling the bandwidth DOES NOT HELP.


qcmadness


#27 Posted on 2011-8-17 01:49

And show your data to support your claim that the extra bandwidth will help a lot in some tasks.


qcmadness


#28 Posted on 2011-8-17 01:55

http://www.anandtech.com/bench/Product/100?vs=108

i7 950: 3.06GHz / 3.33GHz Turbo / 32.0 GB/s
i7 860: 2.80GHz / 3.46GHz Turbo / 21.3 GB/s

The maximum gain of ~10% is nowhere near significant.
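
To put the two bandwidth figures next to the ~10% number, the arithmetic can be sketched as below. The variable names are only for illustration, and the sketch ignores the clock speed difference between the two chips listed above, so it is a rough comparison rather than a measurement.

```python
# Peak memory bandwidth figures as quoted in the post (GB/s).
bw_950 = 32.0
bw_860 = 21.3

# Fractional bandwidth advantage of the i7 950 over the i7 860.
extra_bandwidth = bw_950 / bw_860 - 1.0

# The "~10%" maximum benchmark gain quoted in the post.
observed_gain = 0.10

print(f"extra bandwidth: {extra_bandwidth:.1%}")
print(f"gain per unit of extra bandwidth: {observed_gain / extra_bandwidth:.2f}")
```

So roughly 50% more peak bandwidth lines up with at most about a 10% gain, which is the point being made here.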


Henry


#29 Posted on 2011-8-17 02:53

Quote:

Originally posted by qcmadness on 2011-8-17 01:46

If you know the latency of a SIMD / AVX instruction, you will know why I say that bandwidth is not important.

And do you remember why going from P4 to C2D doubled SSEx performance while the bandwidth ...

P4 => loses data in the pipeline and has to redo the work, which is why its per-cycle performance is even worse than PIII.
C2D => basically a PIII-S with P4 cache fetch (with some innovations inside, of course).
i7 => a balance between C2D and P4, taking the advantages of both plus an HT tweak.

latency => CL in RAM/cache
bandwidth => MHz*bits in RAM/cache
2 different aspects.
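
The two definitions above can be written out as a small sketch. The DDR3-1333 data rate, 64-bit channel width and CL9 timing are assumptions chosen for illustration, not figures given in this post; with those assumptions the dual- and triple-channel results happen to match the 21.3 and 32.0 GB/s quoted elsewhere in the thread.

```python
def peak_bandwidth_gbs(data_rate_mts, bus_width_bits, channels=1):
    """Peak bandwidth in GB/s: transfers per second x bytes per transfer x channels."""
    return data_rate_mts * (bus_width_bits / 8) * channels / 1000.0


def cas_latency_ns(cl_cycles, data_rate_mts):
    """CAS latency in ns. DDR moves data twice per clock, so clock MHz = MT/s / 2."""
    clock_mhz = data_rate_mts / 2.0
    return cl_cycles / clock_mhz * 1000.0


# Assumed memory: DDR3-1333, 64-bit channels, CL9 (hypothetical configuration).
dual = peak_bandwidth_gbs(1333, 64, channels=2)
triple = peak_bandwidth_gbs(1333, 64, channels=3)
latency = cas_latency_ns(9, 1333)

print(f"dual channel:   {dual:.1f} GB/s")
print(f"triple channel: {triple:.1f} GB/s")
print(f"CAS latency:    {latency:.1f} ns (same for both)")
```

Adding a channel raises the bandwidth figure but leaves the latency figure untouched, which is the sense in which they are two different aspects.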


Henry


#30 Posted on 2011-8-17 02:59

Quote:

Originally posted by qcmadness on 2011-8-17 01:55
http://www.anandtech.com/bench/Product/100?vs=108

i7 950: 3.06GHz / 3.33GHz Turbo / 32.0 GB/s
i7 860: 2.80GHz / 3.46GHz Turbo / 21.3 GB/s

The maximum gain of ~10% is nowhere near significant.

An increase in bandwidth needs help from the architecture before it yields any improvement.
The 860 and 950 are basically the same chip.

If, as you said, bandwidth is not so important, then I think CUDA cards in datacenters or render farms should have 128-bit/192-bit memory instead, to cut cost.

A display card with a large number of ops is somewhat different from a CPU for now.
But if CPUs get more and more cores and are to gain performance from them, they will need high RAM bandwidth.

[ Last edited by Henry on 2011-8-17 03:01 ]
