Title: [Industry News] AMD Announces Its First ARM Based Server SoC

Author: qcmadness    Time: 2014-1-30 16:22     Title: AMD Announces Its First ARM Based Server SoC

http://www.anandtech.com/show/77 ... 8core-opteron-a1100

[attach]37558[/attach]

Quote:
The real question is what architecture(s) AMD plans to use to get to a leadership position among ARM CPUs and a substantial share of the x86 CPU market. We get the first hint with the third bullet above: "smaller more efficient x86 CPUs will be dominant in the x86 segment".
Will Jaguar come back stronger?
Author: Puff    Time: 2014-1-30 17:23

Ambiguous statement as usual. Not necessarily the Cats, I would say, but the Cat design methodology (automation-driven) plus higher per-core performance, still in a small size and with good perf/watt. If you look at what x86 is positioned for once ARM is added to the mix, it would be APUs with low core counts and scalable MP CPUs.

[ Last edited by Puff at 2014-1-30 17:26 ]
Author: qcmadness    Time: 2014-1-30 17:25

Quote:
Originally posted by Puff at 2014-1-30 17:23
Ambiguous statement as usual. Not necessarily Cats I would say, but Cat's design methodology (automation driven) + higher per-core performance yet in a smaller size and nice perf/watt. If you look at  ...
Intel is doing the same
Author: Puff    Time: 2014-1-30 17:31

Quote:
Originally posted by qcmadness at 2014-1-30 17:25

Intel is doing the same
Thus the major differences are the choice of heterogeneous solution and what is used to address scaling at the bottom of the spectrum. AMD may have a mid-term competitive advantage with ARM, as ARM is currently dominant in the low-power handheld space, but in the long run Intel has a lot of capital to burn.


Author: qcmadness    Time: 2014-1-30 17:32

Quote:
Originally posted by Puff at 2014-1-30 17:31

thus the major diffs are the choice of heterogeneous solution and what are being used to address the bottom scaling over the spectrum. AMD may have a mid-term competitive advantage with ARM, as ARM i ...
But Intel's manufacturing edge is smaller now
Author: Puff    Time: 2014-1-30 17:36

Quote:
Originally posted by qcmadness at 2014-1-30 17:32

But Intel's manufacturing edge is less now
Either way, it depends on whether Intel can break into that market with x86, or drastically turn the big ship towards ARM.



But the main idea is that AMD would likely converge their x86 lines, as they no longer need two cores to scale x86 across the whole spectrum*. The converged core, if any, should be positioned like K10+++ or Haswell.


*: In the way they projected in 2006/07. The Cats were intended to scale down below 1 watt, IIRC, but that is now the job of ARM solutions. (late edit)

[ Last edited by Puff at 2014-1-30 17:49 ]
Author: qcmadness    Time: 2014-1-30 17:50

Quote:
Originally posted by Puff at 2014-1-30 17:36

Whatever as it depends on whether Intel can break into that market with x86, or drastically turn the big ship towards ARM.



But the main idea is that AMD would likely converge their x86 lines ...
No...

It seems AMD will concentrate on ARM (<0.5-5W) and x86 (5W+) per core
Author: Puff    Time: 2014-1-30 17:56

Quote:
Originally posted by qcmadness at 2014-1-30 17:50

No...

It seems AMD will concentrate with ARM (<0.5-5W) and x86 (5W+) for each core
I don't see contradiction. Basically I meant what you mean here.


Perhaps also with a lower frequency ceiling with regard to custom designs like Haswell or BD.

[ Last edited by Puff at 2014-1-30 17:58 ]
Author: qcmadness    Time: 2014-1-30 17:58

Quote:
Originally posted by Puff at 2014-1-30 17:56

I don't see contradiction. Basically I meant what you mean here.


Perhaps also with a lower frequency ceiling with regard to custom designs like Haswell or BD.
5W/core in x86 space is very low indeed.
Author: Puff    Time: 2014-1-30 18:01

Quote:
Originally posted by qcmadness at 2014-1-30 17:58

5W/core in x86 space is very low indeed.
If you mean SOC TDP divided by core count, then yes. 5 watts of absolute power per core in a notebook-class SOC is significant, anyway.
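
To make the two ways of counting concrete, here is a minimal sketch. The 25 W TDP, the 4 cores and the 40% share taken by GPU/uncore/IO are made-up illustration values, not figures for any real chip, and the function names are hypothetical.

```python
def tdp_per_core(soc_tdp_w: float, cores: int) -> float:
    """Naive figure: whole-SoC TDP divided evenly by the core count."""
    return soc_tdp_w / cores


def core_only_power(soc_tdp_w: float, cores: int, non_core_share: float) -> float:
    """Per-core power after GPU/uncore/IO take an ASSUMED share of the TDP."""
    return soc_tdp_w * (1.0 - non_core_share) / cores


# Hypothetical notebook-class SoC: 25 W TDP, 4 cores, 40% spent outside the cores.
naive = tdp_per_core(25.0, 4)
absolute = core_only_power(25.0, 4, 0.4)
print(f"TDP / cores: {naive:.2f} W, cores only: {absolute:.2f} W")
```

The naive figure always overstates what a single core actually draws, which is why 5 W of real per-core power is a lot more than "5 W/core" read off a TDP label.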
Author: Puff    Time: 2014-1-30 18:13

The argument is that it is pointless to build two cores that aim for similar power, performance and scaling but use different ISAs. Based on the current bits of information, it is clear that AMD targets just dense server, embedded and handheld with ARMv8... so it completely overlaps with the Cat initiatives (think back to 2007-10). As AMD is currently just an ARM licensee, and probably treats the IP acquisition cost as the cost of early market entry (so that it can release its first wave of ARMv8 products well before the current two core roadmaps, aka BD/Cat, end and the custom core, if any, is ready), it is not a huge problem at all.

However, when it comes to the stage of building a custom core, it becomes a problem. AMD has limited resources, doesn't it? Going with three microarchitectures, two of which overlap with each other, is unlikely to happen. You may argue that they wouldn't build a custom ARM core, though.

[ Last edited by Puff at 2014-1-30 18:33 ]
Author: qcmadness    Time: 2014-1-30 20:06

Quote:
Originally posted by Puff at 2014-1-30 18:13
The argument is that it is useless to build two cores, aiming for a similar power performance and scaling, but with different ISAs. It is clear that AMD targets just dense server, embedded and handhel ...
AMD may not go with custom ARM designs
Author: Puff    Time: 2014-1-30 20:59

Quote:
Originally posted by qcmadness at 2014-1-30 20:06

AMD may not go with custom ARM designs
I got one clear message from an AMDer and three signs from LinkedIn that may point to a custom ARM microarchitecture in the pipeline. Another clear message is Bulldozer's irreversible EOL.



[ Last edited by Puff at 2014-1-30 21:02 ]
Author: qcmadness    Time: 2014-1-30 21:05

Quote:
Originally posted by Puff at 2014-1-30 20:59

I got one clear message from an AMDer and three signs from LinkedIn that may point to a custom ARM microarchitecture in the pipeline. Another clear message is Bulldozer's irreversible EOL.

What kind of "custom" work is taking place? It could be the CPU-CPU interconnect.

Bulldozer's EOL is known already.
Author: Puff    Time: 2014-1-30 21:10

Quote:
Originally posted by qcmadness at 2014-1-30 21:05

What kind of "custom" is taking place? CPU-CPU interconnect can be.
"High-level definition of core microarchitecture". There is also an "ambidextrous" interconnect project supporting both x86 and ARM SoCs/chips. Probably ring-based. Probably.
Quote:
Bulldozer's EOL is known already.
Which means one of three things: AMD gives up the top half of the PC and server space completely, AMD has yet another high-performance core to succeed it and leaves the Cats in place, or AMD converges to a single core once BD and the Cats reach the end of their five-year lifespan.

[ Last edited by Puff at 2014-1-30 21:16 ]
Author: qcmadness    Time: 2014-1-30 21:15

Quote:
Originally posted by Puff at 2014-1-30 21:10

"High-level definition of core microarchitecture". There is another "ambidextrous" interconnect project supporting both x86/ARM SOC/chips. Probably ring based. probably.



Which means either ...
Beefing up Jaguar is an option.

It is already on par with K8 / Piledriver IPC-wise.
Author: Puff    Time: 2014-1-30 21:18

Quote:
Originally posted by qcmadness at 2014-1-30 21:15

Beefing up Jaguar is an option. It is already on par with K8 / Piledriver IPC-wise.
Literally means the same as convergence of two cores, I guess.

Anyhow, AMD already demonstrated at HC24 its commitment to move its high-performance cores towards the Cat automated design methodology.

Author: qcmadness    Time: 2014-1-30 21:19

Quote:
Originally posted by Puff at 2014-1-30 21:18

Literally means the same as convergence of two cores.

Anyhow, AMD already demonstrated their commitment to drive high-performance core towards Cat's automated design methodology in HC24.
Not convergence

Improve on Jaguar using the experience from K10.5 and Bulldozer
Author: Puff    Time: 2014-1-30 21:23

Quote:
Originally posted by qcmadness at 2014-1-30 21:19

Not convergence

Improve from Jaguar using K10.5 and Bulldozer experiences
You know what I mean. A far stronger Jaguar capable of 3-3.5GHz clocks would be nice. Give it more execution resources and larger windows, pair it with a ring interconnect and an overhauled cache hierarchy, and you will get an OHHH-FINALLY-COMPETITIVE Opteron MP chip. Private L2, please!


Author: qcmadness    Time: 2014-1-30 21:30

Quote:
Originally posted by Puff at 2014-1-30 21:23

You know what I mean. A far-stronger Jaguar capable of 3-3.5GHz clock would be nice. Give it more execution resources and larger windows, stick it with a ring interconnect and overhauled cache hierar ...
For example:

1. 3 AGUs with L/S capability
2. One more ALU
3. 6-8 3GHz+ cores


Jaguar:


AMD Hammer (K8):

Author: Puff    Time: 2014-1-30 21:40

A strong L/S system is preferable to more ALUs.
Say, a load queue with 64+ entries and a store queue with 32+ entries.
A super fast L2 would be great, particularly a <15 clk 1MB L2.

My dream core. Imagine a 4-way decode & 32B front-end. Perhaps 2-way SMT?
[attach]37570[/attach]




P.S. AMD may adopt coarse-grained directory coherence (falling back to snooping via a null directory)

[ Last edited by Puff at 2014-1-30 21:54 ]
Author: qcmadness    Time: 2014-1-30 21:42

Quote:
Originally posted by Puff at 2014-1-30 21:40
Strong L/S system is preferred over more ALUs.
Say Load Queue with 64+ entries and Store Queue with 32+ entries.
Super fast L2 would be great, particularly <15 clk 1MB L2
Too big for each core
Author: Puff    Time: 2014-1-30 21:43

Quote:
Originally posted by qcmadness at 2014-1-30 21:42

Too big for each core
I guess it would be fine for automated designs... The ALUs won't occupy too much space, but the L/S unit will. Server workloads rely on the performance of the latter, though.


P.S. Broadcom Vulcan Core

[ Last edited by Puff at 2014-1-30 21:44 ]
Author: qcmadness    Time: 2014-1-30 21:45

Quote:
Originally posted by Puff at 2014-1-30 21:43

I guess it would be fine for automated designs... ALU won't occupy too much space, but the L/S unit will. Server workloads rely on the perf of the latter though.


P.S. Broadcom Vulcan Core
x86 is far more transistor-hungry than ARM
Author: Puff    Time: 2014-1-30 21:50

Quote:
Originally posted by qcmadness at 2014-1-30 21:45

x86 is far more transistor-hungry than ARM
Well, I doubt the difference is really that large when you look at Intel's implementation. x86 just burns more transistors on decoding/microcode and on a sophisticated load-store unit, due to x86's strict memory ordering model.
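
For anyone wondering what "strict memory ordering" buys in practice, here is a toy sketch of the classic message-passing litmus test. It is not a model of any real load-store unit. "Strict" is modelled by simply keeping each thread's program order, which for this particular pattern matches the x86 guarantee that stores are not reordered with other stores and loads are not reordered with other loads. "Relaxed" lets each thread's two independent accesses swap, as a weaker model such as ARM's permits when no barriers are used. All names are made up for illustration.

```python
from itertools import permutations

# Message-passing litmus test: the writer publishes data, then sets a flag.
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag", "r_flag"), ("load", "data", "r_data")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule against a single shared memory."""
    memory = {"data": 0, "flag": 0}
    regs = {}
    for op, addr, arg in schedule:
        if op == "store":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return regs["r_flag"], regs["r_data"]


def outcomes(keep_program_order):
    """Collect all (r_flag, r_data) results reachable under the chosen toy model."""
    def orders(thread):
        if keep_program_order:
            return [list(thread)]
        return [list(p) for p in permutations(thread)]

    seen = set()
    for w in orders(WRITER):
        for r in orders(READER):
            for schedule in interleavings(w, r):
                seen.add(run(schedule))
    return seen


strict = outcomes(keep_program_order=True)
relaxed = outcomes(keep_program_order=False)
STALE = (1, 0)  # reader sees the flag set but still reads the old data
print("stale read possible (strict): ", STALE in strict)
print("stale read possible (relaxed):", STALE in relaxed)
```

Ruling out the stale read in hardware while still executing loads out of order is where the extra load-store machinery goes; a relaxed ISA pushes that cost onto software barriers instead.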



[ Last edited by Puff at 2014-1-30 21:52 ]
Author: qcmadness    Time: 2014-1-30 22:50

Quote:
Originally posted by Puff at 2014-1-30 21:50

well I doubt it would be really a lot when you look at Intel's implementation. It just burns more transistors on decoding/microcode and a sophisticated load-store unit due to x86's strict memory orde ...
Remember Intel has the highest-density cache in the industry.
And Intel has control over the fabrication / manufacturing.
Author: Puff    Time: 2014-1-30 23:11

Quote:
Originally posted by qcmadness at 2014-1-30 22:50

Remember Intel has the highest-density cache in the industry.
And Intel has control over the fabrication / manufacturing.
No matter how it goes, overprovisioning is always needed for diminishing IPC improvements. The 3.1mm2 Jaguar has plenty of room to grow IMO, particularly when we are talking about possibly FinFET-based designs beyond Excavator, if one takes it as the base design to work on.

[ Last edited by Puff at 2014-1-30 23:26 ]
Author: qcmadness    Time: 2014-1-30 23:30

Quote:
Originally posted by Puff at 2014-1-30 23:11

no matter how it goes, overprovision is always needed for diminishing IPC improvements. The 3.1mm2 Jaguar has a plenty of room to grow IMO, particularly when we are talking about perhaps FinFET based ...
Even at 10mm^2, it is still small compared with Steamroller and Haswell
Author: Puff    Time: 2014-1-30 23:44

Quote:
Originally posted by qcmadness at 2014-1-30 23:30

Even at 10mm^2, it is still small compared with Steamroller and Haswell
Plenty of options to fill that up:
- less dense for higher frequency (single-core turbo up to 3+ GHz would be nice)
- 3 ALU + 3 AGU as you suggested
- Pipelined multiplier really helps... also a better divider
- 2 LD + 1 ST port for DC
- larger load-store unit... (Jaguar: 12-entry unified queue + 20-entry store queue)
- 4-way decode, dispatch & retire
- post-decode COP queue...? uop cache?
- more scheduler entries (Jaguar: 20/12/18) & larger instruction window (Jaguar: 64/44)
- more register file entries
- 256b VFP datapath...?
- 2-way SMT?
- Private L2 cache



[ Last edited by Puff at 2014-1-30 23:47 ]
Author: qcmadness    Time: 2014-1-30 23:46

Quote:
Originally posted by Puff at 2014-1-30 23:44

plenty of options to fill that up
- less dense for higher frequency (single turbo up to 3+ Ghz would be nice)
- 3 ALU + 3 AGU as you suggested
- Pipelined Multiplier really helps... also better divis ...
SMT is not that useful in client processors.
And I would expect AMD to refocus on client processors rather than server processors.
Author: Puff    Time: 2014-1-30 23:56

Quote:
Originally posted by qcmadness at 2014-1-30 23:46

SMT is not that useful in client processors.
And I would expect AMD will re-focus in client processors rather than server processors.
I would say AMD always prioritizes servers over clients... But unfortunately their revenue goes the opposite way, thanks to the epic Bulldozer. It is fairly easy to count where AMD would still stick to x86:

1. Scalable MP CPU for enterprise database, analytics, HPC and workstations
2. PC processors in desktop, notebook and ultra compact forms
3. Embedded systems requiring heavy visualization
4. Strong APUs for workstations, enterprise and scientific computing in the future (requires HBM)
5. Dense servers for compute and media processing

Seemingly they can all be addressed by a single core by scaling core counts, frequencies, voltages and uncore (L3$?).
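
A rough sketch of how one core design could be stretched across those segments, using the textbook CMOS dynamic power approximation P ≈ C·V²·f per core. Every operating point and the 1 nF effective capacitance below are hypothetical numbers picked for illustration, and leakage, uncore and GPU power are ignored entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    cores: int
    freq_ghz: float
    volts: float


def dynamic_power_w(op: OperatingPoint, c_eff_nf: float = 1.0) -> float:
    """Core dynamic power only: cores * C * V^2 * f (nF * V^2 * GHz gives watts)."""
    return op.cores * c_eff_nf * op.volts ** 2 * op.freq_ghz


# Hypothetical operating points for ONE core design, not real products.
POINTS = [
    OperatingPoint("ultra-compact PC", cores=2, freq_ghz=1.2, volts=0.8),
    OperatingPoint("desktop", cores=4, freq_ghz=3.5, volts=1.2),
    OperatingPoint("scalable MP server", cores=16, freq_ghz=2.5, volts=1.0),
]

for op in POINTS:
    print(f"{op.name:>20}: {dynamic_power_w(op):6.2f} W of core dynamic power")
```

Because voltage enters squared, dropping voltage and frequency together shrinks per-core power far faster than it shrinks clock speed, which is what makes the single-core idea plausible on paper.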


[ Last edited by Puff at 2014-1-31 00:01 ]
Author: qcmadness    Time: 2014-1-31 00:01

Quote:
Originally posted by Puff at 2014-1-30 23:56

I would say AMD always prioritizes servers over clients... But unfortunately their revenue is in a inverse way thanks to the epic Bulldozer. It is fairly easy to count what AMD would still stick to x ...
In the past: true

Except for the success of Bobcat and Jaguar, both of which are low-power client architectures.
Author: Puff    Time: 2014-1-31 00:19

Quote:
Originally posted by qcmadness at 2014-1-31 00:01

In the past: true

Except for the success of Bobcat and Jaguar, both of which are low-power client architectures.
I would say they are better described as cost-optimized than as just "low-power client", and the chip designs were also cost-optimized from the start. They are nice examples of carrying a bunch of server features while still performing well. Putting effort into servers doesn't always make clients perform badly either.

Strategically they would still prioritize servers over generic PC clients, I guess, as anyone would prefer a high-margin business. A single core may target both nicely, if they can pull the cost and TTM down from the highly custom Bulldozer level to an optimal level that may be achieved with a higher degree of automation.

[ Last edited by Puff at 2014-1-31 00:29 ]
Author: Puff    Time: 2014-1-31 16:03

http://www.ptt.cc/bbs/PC_Shopping/M.1391141082.A.4CB.html
http://www.ptt.cc/bbs/PC_Shopping/M.1351436294.A.845.html
http://images.anandtech.com/doci/6418/Screen%20Shot%202012-10-29%20at%204.55.05%20PM.png
http://gigaom2.files.wordpress.com/2013/06/amdroadmap.jpg

If AMD does want to play in high-perf ARM (Vulcan- or Xeon-class)... it means no presence in the x86-dominant workstation market (AMD is nearly non-existent in this market already, btw, so unless it joins Apple... eh) and the <50% Windows server market, and it gives up the dominant commercial toolchain and codebase of x86. Thus, I still hold my own expectation based on all the AMD projections published, which is that it would stick to A57- to Cyclone-class positioning and scale it from handheld to microservers with rich I/O and customization options (i.e. their SCBU), leaving high-perf and traditional markets to x86. It may also attack comms infrastructure (BCOM/LSI?) in the future if it'd like to. All these markets (new to AMD) are open-source friendly or ARM-dominant in some way, so using ARM is not too problematic at all.


note: leopard -> zen
note2: hope that dual-core phones and tabs will make a return


[ Last edited by Puff at 2014-1-31 17:05 ]
Author: Puff    Time: 2014-2-5 04:37

http://venturebeat.com/2014/02/0 ... interview/view-all/



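Obviously this interview will only add fuel to the mainstream "AMD to use ARM to f**k its own x86 in and out" narrative, and I can't fight the crowd single-handedly. My view is simply that, given AMD won't have a good x86 product for at least the next two years and ARM servers are still half-baked, of course it won't undermine its own business. But he isn't wrong either: whether x86 folds or not depends entirely on Intel.

Or maybe AMD really does want to build an ARM version of the E3 for dense servers... or grab a share of the comms processor business from BCOM/LSI/Cavium. But mainframe, GP server and HPC will still be dominated by Intel for the next few years, and the dense server situation is unclear. Honestly, even assuming AMD wants to make money in mid- to high-end servers again, I would bet on them using x86 rather than ARMv8. And after all this talk... the only real advantages of ARMv8 are the choice of multiple vendors and the relatively low cost. But it's not as if x86 can't be customized at all; the two console chips are already examples... If they really go for it, then unless a partner/customer is already on board (e.g. Apple moving the whole Macintosh line), the chance is infinitely close to 0.

Nothing more to add.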



Late Edit: It looks like AMD genuinely wants to use ARM to enter the comms processor market...

[ Last edited by Puff at 2014-2-5 16:46 ]




Welcome to HKSpot (https://bbs.hk-spot.com/) Powered by Discuz! 6.0 Lite