Quote:
No matter how rough the current implementation of Bulldozer is, if you look a bit deeper, this is not an architecture made for high-IPC, branch-intensive, lightly threaded applications.
If... if Bulldozer and its successors are indeed modular, perhaps AMD will deliver two variants of the core? One for APUs with smaller, more responsive caches (perhaps an 8-way 256KB cache..., and please get all the second integer cores parked), and one for servers with bulk caches. As for the IPC issue, Steamroller(?) seems to have extended the capabilities of the AGLUs to execute GPR instructions. One example is register-to-register moves. Let's see if this helps.
Anyway, a Bulldozer module with a 1MB L2 cache reduces the L2 load-use latency by only 1 cycle. This information is from... errrmm... the IEEE 2011 Bulldozer paper.
[ Last edited by Puff on 2012-5-30 21:17 ]