Quote:
Originally posted by qcmadness on 2011-8-17 01:46
If you know the latency of a SIMD / AVX instruction, you will know why I say that the bandwidth is not important
and do you remember why P4 > C2D has doubled the SSEx performance while the bandwidth ...
P4 => loses data in the pipeline and has to redo the work; that's why its per-cycle performance is even worse than the PIII's.
C2D => basically a PIII-S with P4-style cache fetch. (Of course with some innovations inside.)
i7 => a balance between C2D and P4; it takes the advantages of both, plus an HT tweak.
latency => CL (the delay, in clock cycles) of the RAM/cache
bandwidth => MHz * bits (transfer rate times bus width) of the RAM/cache
They are 2 different aspects.
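
To put rough numbers on that distinction, here is a minimal Python sketch. The two parts, their clocks, and their CL values are made up for illustration and are not figures for any real module or CPU; the function names are mine too. It assumes latency can be approximated as CL cycles divided by the clock, and peak bandwidth as transfer rate times bus width.

```python
# Illustrative numbers only; not measurements of any real module or CPU.

def access_latency_ns(cl_cycles: int, clock_mhz: float) -> float:
    """Latency: CL cycles divided by the clock, expressed in nanoseconds."""
    return cl_cycles / clock_mhz * 1000.0


def peak_bandwidth_mb_s(transfers_mhz: float, bus_bits: int) -> float:
    """Bandwidth: millions of transfers per second times bus width, in MB/s."""
    return transfers_mhz * bus_bits / 8.0


# Two hypothetical 64-bit parts: (label, clock MHz, transfers MHz, CL)
parts = [
    ("fast clock, loose CL", 800.0, 1600.0, 11),
    ("slow clock, tight CL", 400.0, 800.0, 4),
]

for label, clock, transfers, cl in parts:
    latency = access_latency_ns(cl, clock)
    bandwidth = peak_bandwidth_mb_s(transfers, 64)
    print(f"{label}: {latency:.2f} ns, {bandwidth:.0f} MB/s")
```

With these made-up figures the first part has twice the bandwidth of the second but the longer latency, so one number tells you nothing about the other.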