[Hardware] VR-Zone SB-E Price

Quote:
Originally posted by qcmadness on 2011-8-16 10:01

no

2600k can oc while 3820 cannot
No multiplier OC =/= no bus OC.
ロストックで風を攫うや思い出す


Quote:
Originally posted by qcmadness on 2011-8-16 11:54

so little tasks benefit from extra memory channels...
Video editing.
One of the reasons CUDA can be much faster than a CPU is RAM bandwidth.

Quote:
Originally posted by qcmadness on 2011-8-16 16:14

of course not...

CUDA / Quicksync do faster because they are not flexible
The maths is simple, but there are many ops, the access pattern is not very predictable, and little of the data can be cached in L2, so bandwidth to RAM is important (a big, fast pool is needed).
I won't comment on QS, as its mechanism is different.
CUDA with low bandwidth should also suffer.

Quote:
Originally posted by qcmadness on 2011-8-16 19:59

What cuda / qs do well is all predictable code... your concept is a bit off
I am talking about data, not code.

Quote:
Originally posted by qcmadness on 2011-8-17 00:29

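SIMD data does not need much memory bandwidth

Anything that can be predicted and prefetched does not need high bandwidth at all
graphics sometimes needs that much bandwidth because it cannot be predicted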
The matrices processed in video ripping are similar to those in 3D games and are often not very predictable. SIMD and AVX do help somewhat, because they batch data and perform a vector or matrix operation each time instead of working on only one value.
But that doesn't mean bandwidth can be saved.
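A rough sketch of that point, assuming a hypothetical stream of 32-bit floats that are each processed once: a wider vector unit cuts the instruction count, but the bytes that have to come from memory stay the same. The function name and the lane counts (1 for scalar, 4 for SSE, 8 for AVX) are illustrative assumptions, not measurements.

```python
def simd_cost(num_values, lanes, bytes_per_value=4):
    """Count instructions and bytes for one pass over num_values elements.

    lanes is the number of values handled per instruction
    (1 = scalar, 4 = 128-bit SSE floats, 8 = 256-bit AVX floats).
    """
    # Ceiling division: a partly filled vector still costs one instruction.
    instructions = -(-num_values // lanes)
    # Every value is still loaded once, whatever the vector width.
    bytes_loaded = num_values * bytes_per_value
    return instructions, bytes_loaded


for name, lanes in [("scalar", 1), ("SSE", 4), ("AVX", 8)]:
    instructions, bytes_loaded = simd_cost(1_000_000, lanes)
    print(f"{name:6s} instructions={instructions:>9,d} bytes={bytes_loaded:,d}")
```

If the data cannot be kept in cache, the same number of bytes has to arrive in less time as the vectors get wider.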

Quote:
Originally posted by ccw on 2011-8-16 16:21

Like a pair of human hands vs a bunch of robot arms
The robot arms are no use if the conveyor belt is that slow.

Quote:
Originally posted by qcmadness on 2011-8-17 01:46

if you know the latency of a SIMD / AVX instruction, you will know why i say that the bandwidth is not important

and do you remember why p4 > c2d has doubled the SSEx performance while the bandwidth ...
P4 => work lost in the pipeline has to be redone, which is why its per-cycle performance is even worse than the PIII's.
C2D => basically a PIII-S with P4 cache fetch (with some innovations inside, of course).
i7 => a balance between C2D and P4, taking the advantages of both plus an HT tweak.

latency => CL of the RAM/cache
bandwidth => MHz * bits (bus width) of the RAM/cache
These are 2 different aspects.
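A minimal sketch of the two quantities, assuming DDR3-1333 with 64-bit channels and CL9 as the example (the module and its timings are my assumption, as are the helper names). It only computes theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s, bus_width_bits=64, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


def cas_latency_ns(cl_cycles, transfer_rate_mt_s):
    """CAS latency in nanoseconds for DDR memory.

    DDR transfers twice per clock, so the I/O clock in MHz is half the
    transfer rate in MT/s.
    """
    clock_mhz = transfer_rate_mt_s / 2
    return cl_cycles / clock_mhz * 1000


# DDR3-1333 (nominally 1333.33 MT/s), 64-bit channels, CL9 assumed.
rate = 4000 / 3
print(round(peak_bandwidth_gb_s(rate, channels=2), 1))  # dual channel
print(round(peak_bandwidth_gb_s(rate, channels=3), 1))  # triple channel
print(round(cas_latency_ns(9, rate), 1))                # same CL either way
```

Adding a channel raises the bandwidth figure but leaves the latency figure unchanged, which is the sense in which the two are separate.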

Quote:
Originally posted by qcmadness on 2011-8-17 01:55
http://www.anandtech.com/bench/Product/100?vs=108

i7 950: 3.06GHz / 3.33GHz Turbo / 32.0 GB/s
i7 860: 2.80GHz / 3.46GHz Turbo / 21.3 GB/s

The max of ~10% gain is nowhere significant
An increase in bandwidth needs help from the architecture before it brings improvements.
The 860 and 950 are basically the same chip.

If, as you said, bandwidth is not so important, I think CUDA cards in datacenters or render farms should use 128-bit/192-bit memory instead in order to cut cost.
A graphics card with its large number of OPs is somewhat different from a CPU for now.
But if CPUs get more and more cores and are to gain performance from them, a large RAM bandwidth has to be provided.

[ Last edited by Henry on 2011-8-17 03:01 ]

Quote:
Originally posted by qcmadness on 2011-8-17 03:52
http://www.anandtech.com/show/45 ... ing-the-best-ddr3/8



SB nowhere requires another 20GB/s memory bandwidth when ~20GB/s is enough.
I read that review a long time ago.
As I said, it depends on the arch, OPs performance, core count, and also the tasks.
If the CPU cannot use up the RAM bandwidth, the CPU is the bottleneck.
If the CPU has to wait for RAM, RAM is the bottleneck.
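One common way to put numbers on this is a roofline-style estimate. It is not from the thread but a standard technique swapped in for illustration: attainable throughput is the smaller of the compute peak and the bandwidth multiplied by the arithmetic intensity (FLOPs per byte moved). All figures and names below are hypothetical.

```python
def bottleneck(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Return (attainable GFLOP/s, limiting resource) for one workload."""
    # What memory can sustain if every byte moved feeds flops_per_byte FLOPs.
    memory_bound_gflops = bandwidth_gb_s * flops_per_byte
    if memory_bound_gflops < peak_gflops:
        return memory_bound_gflops, "RAM"
    return peak_gflops, "CPU"


# Hypothetical chip: 100 GFLOP/s peak, 20 GB/s memory bandwidth.
for intensity in (0.5, 5.0, 50.0):
    gflops, limit = bottleneck(100.0, 20.0, intensity)
    print(f"{intensity:5.1f} FLOP/byte -> {gflops:6.1f} GFLOP/s, limited by {limit}")
```

Which side is the bottleneck depends on the task's FLOPs per byte, not on the chip alone.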

SB-E now has 6-8 cores, and some parts may have even 10-12 cores (in DP/MP). Can they be satisfied with the same amount of bandwidth (2 channels) as parts that have 4 cores?
I don't think so, though 4 channels may be overkill in a single-CPU system.
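The per-core arithmetic behind that question, as a sketch. It assumes DDR3-1333 at about 10.67 GB/s per 64-bit channel and an even split across cores; the helper name is hypothetical, and real per-core demand depends on the workload.

```python
GB_S_PER_CHANNEL = 4000 / 3 * 8 / 1000  # assumed DDR3-1333, 64-bit channel


def bandwidth_per_core(cores, channels, per_channel=GB_S_PER_CHANNEL):
    """Peak memory bandwidth divided evenly across cores, in GB/s."""
    return channels * per_channel / cores


for cores, channels in [(4, 2), (6, 2), (8, 2), (6, 4), (8, 4)]:
    share = bandwidth_per_core(cores, channels)
    print(f"{cores} cores, {channels} channels: {share:.2f} GB/s per core")
```

Under these assumptions, 8 cores on 4 channels get the same per-core share as 4 cores on 2 channels.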

Could those GF110 or GF104 cores be satisfied with only, e.g., 1/5 of their enormous bandwidth (20% of the RAM frequency, same bus width) on Tesla platforms? I am really eager to know.
After all, those platforms claim that far more FLOPs can be squeezed out of them than out of normal CPUs.

Quote:
Originally posted by qcmadness on 2011-8-17 05:01


1. you still have not shown any tasks which scales well with memory bandwidth
2. SB-E for desktop only 4 and 6 cores. 8 cores is your IMAGINARY product
3. http://www.anandtech.com/bench/Product/313? ...
1. I am sorry I cannot show one at this moment, but "does not scale well" =/= "does not scale". (Something similar should be the first test in your link 3.)
2. 8 cores are still available in LGA2011 form, and I cannot say nobody will use one in a single-CPU system.
3. I would like to know why the results in the first test differ if it is not bandwidth dependent, though I am convinced by the results.

Quote:
Originally posted by qcmadness on 2011-8-17 11:58


1. of course it scales, but is the scaling meaningful is another question
2. i don't think so, server socket for 4 qpi connections is different
3. decompression / compression is particularly bandwid ...
Video compression is also compression (just with a different algorithm), which is why I say it should be bandwidth sensitive.
The MP side should also use LGA2011, if I am not mistaken, to take over from LGA1567.

[ Last edited by Henry on 2011-8-17 16:15 ]