It could possibly turn the CPU market on its head!
"CPUs require OoO primarily because they are limited in the number of simultaneous computational threads they can deploy and the memory resources available."

I am sorry, but if I said this is just a generalist sort of statement, made without grasping the underlying facts, I wouldn't be too far from reality...
Just consider these facts:
- They will surely need to deliver multicore chips with a good out-of-order execution engine, and I have absolutely no idea about, for example, the branch-prediction efficiency of ARM's in-order architecture (which limits performance in itself); without these they simply have no chance of competing with AMD's and Intel's desktop processors.
Now bear in mind that the Project Denver outline is to marry (using GF110 as an example) 512 stream processors (each essentially a very basic CPU core with linear functionality), CUDA cores in Nvidia-speak, and their inherently faster (wider bus, greater bandwidth, lower latency) memory resources, to what will in all likelihood be a 64-bit ARM-based architecture, presumably (as is presently the case) to allocate resources where program code branches become more divergent than would suit the CUDA cores' SIMD nature.
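To make the divergence point concrete, here is a minimal, hypothetical CUDA sketch (the kernel and names are mine, not anything from Project Denver): when threads in the same 32-wide warp take different sides of a data-dependent branch, the SIMD hardware has to run both paths one after the other with lanes masked off, which is exactly the kind of code you would rather hand to a conventional CPU core.

[code]
// Hypothetical example of warp divergence. Threads whose data is even take
// one branch, the rest take the other; any warp containing a mix of the two
// executes BOTH paths serially, with the inactive lanes masked off.
__global__ void divergent_kernel(const int *input, int *output, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (input[i] % 2 == 0) {
        output[i] = input[i] * 3;   // "even" lanes
    } else {
        output[i] = input[i] - 7;   // "odd" lanes, run in a second pass
    }
}
// Launched as, e.g.: divergent_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
[/code]

A scalar CPU core with a decent branch predictor just picks one path and carries on, which is why routing the branch-heavy portions of a program to the ARM side makes sense.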
"You want to come and party in our kitchen and rattle the pots and pans? I've got Sandy Bridge. Bring it on," Intel spokesman Dave Salvator said at the Consumer Electronics Show in Las Vegas.

Not overly unusual to hear a PR guy bigging up his company, I would have thought. Dirk Meyer and Jen-Hsun Huang are both adept at equally bombastic rhetoric when someone shoves a recording device in front of their faces. CES is, after all, a PR-driven event, and Intel have product to push.
ET3D said:
Personally I've been waiting for the AMD Fusion CPUs (or APUs, as AMD likes to call them) for a while, and I'm still waiting for their real arrival and reviews. Hopefully soon.
"CPUs require OoO primarily because they are limited in the number of simultaneous computational threads they can deploy and the memory resources available."
Yes, the limitations are there, but OoO/OoOE is there to help overcome them:
"out-of-order execution (OoOE or OOE) is a paradigm used..."
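To illustrate roughly what out-of-order execution buys a CPU (a toy host-side sketch of my own, nothing from either post): when a load misses in cache, an OoO core can keep issuing the independent work that follows it, whereas a strict in-order core sits stalled at the load.

[code]
// Toy example: lookup[idx] may miss in cache and take hundreds of cycles.
// The arithmetic on a, b and c does not depend on that load, so an
// out-of-order core can execute it while the miss is still outstanding;
// an in-order core has to wait before it ever reaches those instructions.
int toy_workload(const int *lookup, int idx, int a, int b, int c)
{
    int loaded = lookup[idx];   // potentially long-latency memory access

    int x = a * b;              // independent of the load
    int y = b + c;              // independent of the load
    int z = x ^ y;              // still independent of the load

    return loaded + z;          // the load result is only needed here
}
[/code]

A GPU attacks the same latency from the other direction: rather than reordering one thread's instructions, it simply switches to another warp while the load is in flight.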
"Herein lies a major bone of contention: you will need to re-program/recompile/optimize everything for the new architecture, and I don't see such a huge x86 base moving everything to this platform any time soon."

True... to a degree. Firstly, I think the same argument was used with the 64-bit extension for x86. It's been a long haul, but I think people are starting to see the light.
"What is the incentive outside the graphics-intensive applications to do this? Absolutely none, I'd say."

Given that Nvidia is in the graphics business, I'd say that you're posing a loaded question... but I'll give it a shot anyway (bearing in mind that software USUALLY follows hardware).
"The business world doesn't move everything to new architectures or operating systems on the whims of people like us who jump on everything new and exciting. Just to add a little support to this argument: we have dozens of computers still running Windows XP on P4s/Pentium Ds at work, and they are still doing what is needed pretty efficiently; hell, our dispatch clerk is still using the good old 4L printer from ancient times."

Just as well the tech world isn't totally reliant on business intransigence. Although I would hazard a guess that small/medium business is not the main aim of either the software or the hardware; hardly surprising, since businesses generally upgrade only when they need/have to, and they are typically the last to migrate to a new OS. If business were the prime motivator we wouldn't have 64-bit computing, Core 2 Duo/Quad, discrete graphics, SATA interfaces, DisplayPort, DVI, USB 3.0, Sandy Bridge, Phenom II, Eyefinity/Nvidia Surround, 1080p+ monitors, tablet/notebook PCs, smartphones, motherboard tweaking options, etc., etc. ...and we would all be using single-core CPUs hooked up to 865PE chipset boards, viewing the efforts of our labour on 1024x768 CRT monitors. Thrilling stuff.
"Also consider this fact: Intel was able to improve graphics performance upward of 2x with Sandy Bridge. Have nVidia, or for that matter AMD, ever been able to do that with any of their newer-generation graphics cards?"

This would be the same Sandy Bridge that uses Nvidia graphics IP, then? (A major contributing factor in Intel settling with Nvidia for $1.5bn; Sandy Bridge, Ivy Bridge and Haswell will all use the Nvidia IP.)
"Considering the implications of Moore's Law, and their significant resources and technological insight, Intel and AMD are in a far better position to outrun nVidia here, as long as they are able to keep doubling the CPU/graphics performance of their on-die solutions."

That's a big "if". At present Intel is at 32nm; GPUs are at 40nm. Maybe we could revisit this thread when GPUs are on 28nm later in the year.
"Additionally, I haven't had time to read about all the architectural improvements in Sandy Bridge, but from the early bits I did read, I remember roughly calculating that L3 cache latency was down by 38% (whether due to an improved pre-fetch unit or improved L3 latency). I'd love to see improvements in GPUs at a similar pace, though I doubt it will happen on such a consistent basis."

Apples and oranges. CPUs are dependent upon cache latency; GPUs aren't, for the simple reason that the GPU has a MUCH higher degree of simultaneous multi-threaded performance. That's the reason GPUs don't have an L3 cache.
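A back-of-envelope sketch of why that is (illustrative CUDA code and assumed round numbers of my own, not vendor specs): the GPU doesn't try to make memory fast for any one thread, it just keeps enough warps resident that something else can always be issued while a load is outstanding.

[code]
// Each thread does one load, a little math and one store. On a CPU this
// access pattern would live or die by cache latency; on a GPU, while one
// warp waits several hundred cycles for DRAM, the SM's scheduler issues
// instructions from the other resident warps, so the latency is hidden
// rather than avoided.
__global__ void scale_add(const float *in, float *out, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * k + 1.0f;
}

// Rough arithmetic with assumed figures: at ~48 resident warps per SM and a
// ~500-cycle memory latency, each warp only needs to issue roughly once
// every 10 cycles (500 / 48) to keep the multiprocessor busy.
[/code]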
"And I haven't yet touched the issue of power utilization of such higher-end graphics cores, where nVidia will need to radically redesign their offerings to reduce thermal output to be competitive, at the least."

You realise that increasing the on-die graphics performance on Ivy Bridge, Llano derivatives etc. requires adding more shader pipelines? Adding shaders and/or shader frequency requires more power. Again, quite how this has morphed into a graphics discussion is beyond me. IIRC, GF108, GF106 and GF104 suffer no real drawbacks in either thermal or power-usage characteristics if you take relative performance per watt into account. Or are we comparing Fermi with HD 2000/3000? That seems like comparing a Lambo LP640 against a Toyota Prius using fuel economy as the only metric.
"IGP performance: I agree that unless IGPs are able to reach the performance level of upward-of-$70 graphics cards, or ideally around $100 (at the given time), they won't be much good for gaming, but for most other things they are more than enough."

If you remove gaming from the equation then all you need (and are left with) is hardware decode, multi-monitor support and HD playback, which is what the HD 3000 already provides. In this respect it doesn't really matter whether the GPU is on-die or soldered to the board. On-die integration with the CPU is always going to lag behind a standalone solution due to the size constraints of adding larger shader blocks (and a dedicated tessellator if DX11 is required).
"I somewhere read (a rumour, I'd say) that Intel is planning to put 1GB of on-die dedicated memory ......... have you read or heard any such thing?"

Stacked memory. Intel have been toying with it for around four years, I think. Recently Charlie D repackaged the story with the emphasis on Ivy Bridge. The principal drawbacks are increased die size, a big power-draw/thermal-envelope penalty, and low speed, suitable only for a high-end mobile or desktop part. As CD mentions, bandwidth won't be a problem on a 512-bit bus, even if the GPU architecture/graphics drivers themselves are less than ideal.
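The bandwidth half of that is easy to sanity-check with some arithmetic (the bus width comes from the post above; the per-pin transfer rates are purely illustrative assumptions):

[code]
#include <cstdio>

// Peak bandwidth in GB/s: bytes per transfer (bus_bits / 8) times the
// per-pin transfer rate in GT/s.
static double peak_bandwidth_gbs(int bus_bits, double gtransfers_per_sec)
{
    return (bus_bits / 8.0) * gtransfers_per_sec;
}

int main()
{
    // A 512-bit interface needs only a modest 1 GT/s per pin to reach
    // 64 GB/s, and 2 GT/s already gives 128 GB/s; today's IGPs sharing
    // dual-channel DDR3 make do with roughly 20 GB/s.
    printf("512-bit @ 1 GT/s: %.0f GB/s\n", peak_bandwidth_gbs(512, 1.0));
    printf("512-bit @ 2 GT/s: %.0f GB/s\n", peak_bandwidth_gbs(512, 2.0));
    return 0;
}
[/code]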
"Considering the 40nm issues (I also remember nVidia struggled with the GT200 GPU's 65/55nm path) and their reliance on outside foundries, I wouldn't rule out such problems hampering their progress in the future (read: reliability as a supplier)."

GT200, GF100 and, to an extent, GF110 are more a product of Jen-Hsun's adherence to a large monolithic-die strategy and Nvidia's vision battling the laws of physics. AMD's GPUs are made by the same foundry (TSMC) on the same process (40nm) as Nvidia's big chips. The difference lies in the feature set and the markets the resultant GPUs sell in. AMD are almost exclusively gaming/HTPC orientated; Nvidia adds compute (GPGPU) into the mix: 72-bit ECC memory, wider memory bus widths, a high double-precision requirement (largely disabled in gaming cards), etc.
"Intel's full-node + custom-design concept has worked perfectly so far, and their manufacturing capabilities give them some advantage here as well; hence, when they move to 22nm later this year they will be ahead in this area."

I don't remember arguing against Intel's ability to execute. What I do see is the processor moving towards a greater degree of parallelization as an alternative to increased core speed. Out-of-order execution is parallelization by another name, and the more threads/cores the processor has at its disposal, the greater the data throughput; hence Intel's use of AVX now and their longer-range plans for multi-core/multi-pipeline processors in the future.
"I suspect that is a reason why Intel never really killed Larrabee; they want to come up with something which can successfully compete, and save themselves all these licensing issues/costs with nVidia."

The reason I think Larrabee is still alive is that the idea behind the hardware is sound. What Intel lacks is GPGPU experience in hardware and drivers. There's a reason that there are only two main players in the discrete GPU market, and only one player in the workstation/compute driver arena.
"However, the catch is, if (again I'd say a HUGE if) Intel (or AMD) can come up with an x86-based high-performance GPGPU in the given time window, this whole ARM+nVidia debate may turn out to be a storm in a teacup..."

See above re Intel and Nvidia. Nvidia are by all accounts looking at a 2013 timeframe using the Maxwell GPU (20nm process, ~14-16 GFLOPS of double precision per watt target performance) as the GPU architecture in question, which looks both significantly faster than the Intel solution and light years ahead in drivers and compilers; remember that Nvidia supports both CUDA and OpenCL.
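To put that per-watt target into absolute numbers (my own arithmetic; the 225 W board power is an assumption based on current Tesla-class cards, not an Nvidia figure):

[code]
#include <cstdio>

int main()
{
    // Maxwell target quoted above: ~14-16 GFLOPS of double precision per watt.
    const double gflops_per_watt_low  = 14.0;
    const double gflops_per_watt_high = 16.0;
    const double assumed_board_watts  = 225.0;   // assumption, not a spec

    // 14-16 GFLOPS/W at an assumed 225 W works out to somewhere in the
    // 3-3.6 TFLOPS double-precision range for a single card.
    printf("%.1f - %.1f TFLOPS DP\n",
           gflops_per_watt_low  * assumed_board_watts / 1000.0,
           gflops_per_watt_high * assumed_board_watts / 1000.0);
    return 0;
}
[/code]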