Intel demos Knights Ferry CPU, 1 teraflop performance from a single chip

By on November 16, 2011, 1:30 PM

Intel has revealed a new processor with more than 50 cores at the SC11 supercomputing conference in Seattle. The groundbreaking chip was running in a test machine and is capable of crossing the 1 teraflop, or a trillion floating point operations per second, performance barrier.

To put that into perspective, Intel unveiled the first supercomputer to cross the 1 teraflop mark in 1997. That system comprised 9,298 Pentium II processors that filled 72 full-sized cabinets. In comparison, the new processor, called Knights Ferry, is about the size of a matchbook and can achieve the same level of performance.

Knights Ferry was designed in the Portland area and has just started manufacturing. The chip is based on Intel’s MIC, or Many Integrated Core, architecture. Intel had a demo system running, but reporters weren’t allowed to take photographs of it. In fact, the system wasn’t even in the same room as the presentation. An exact core count and power requirements weren’t disclosed either – all we know is that the 22nm chip features more than 50 cores and uses a PCI Express slot.

Production models of Knights Ferry are still a good ways out and it’s unclear if a version of the demo unit will ever find its way to consumer systems. It’s much more likely that Intel will use Knights Ferry technology to build the next generation of supercomputers. Intel and its partners hope to deliver said systems by 2018.

Either way, Knights Ferry is extremely impressive from a performance standpoint and shows just how far processor technology has come in less than 15 years.




User Comments: 20

TomSEA, TechSpot Chancellor, said:

Now if they can just get software to keep up with hardware innovations and improvements.

captainawesome said:

how much faster is that than the Intel Core i7 2600 (standard clock)?

Guest said:

About minus 50 cabinets faster :))

Guest said:

2600K = 83.3 GFLOPS

source: http://www.tomshardware.com/charts/desktop-cpu-charts-2010/R
w-Performance-SiSoftware-Sandra-2010-Pro-GFLOPS,2409.html

1 TF = 1000 GF

1000/83.3 = 12.0048... or 12X

so 12x faster than a 2600K @ stock speed
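The arithmetic above can be checked with a short Python sketch. The 83.3 GFLOPS figure is simply the benchmark number quoted in the comment (not re-measured here), and the `speedup` helper and constant names are illustrative assumptions, not anything from the article.

```python
def speedup(target_gflops: float, baseline_gflops: float) -> float:
    """Return how many times faster the target is than the baseline."""
    if baseline_gflops <= 0:
        raise ValueError("baseline must be positive")
    return target_gflops / baseline_gflops


# 1 TF = 1000 GF, as stated in the comment above.
TERAFLOP_IN_GFLOPS = 1000.0
# Figure quoted by the commenter for a stock Core i7 2600K (assumption: taken as given).
CORE_I7_2600K_GFLOPS = 83.3

ratio = speedup(TERAFLOP_IN_GFLOPS, CORE_I7_2600K_GFLOPS)
print(f"{ratio:.2f}x")  # prints "12.00x"
```

This only compares quoted peak numbers; it says nothing about how real workloads would scale on either chip.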

Wagan8r said:

I remember about 3-4 years ago, Intel made an 80 core processor that reached the 1 teraflop barrier, and they said that it would come to market in 5 years. They've got about a year left, but I don't see it happening.

spydercanopus said:

How big a chip is this?

dividebyzero, trainee n00b, said:

How big a chip is this?

The die size hasn't been confirmed. The closest you'll get at the moment [link]

[Source: Tom's Hardware]

BlindObject said:

So, Crysis? At least 60fps right?

Archean, TechSpot Paladin, said:

From the sources I've looked at (earlier on), it is called 'Knights Corner', not 'Knights Ferry'.

Ref: BBC: http://www.bbc.co.uk/news/technology-15758057

MarkHughes said:

I read about it as Knights Corner too, and it does say that in the screenshot....

Quite looking forward to seeing this in action. The industry I work in could well benefit from something like this.

dividebyzero, trainee n00b, said:

@Archean

Knights Ferry = Prototype add-in board derived from Larrabee, designed as a co-processor. It allowed developers to use the technology without needing to worry about the daughterboard arrangement

Knights Corner = Production CPU

red1776, Omnipotent Ruler of the Universe, said:

I came across this about 2 months ago, working with 8 GPUs

[link]

[link]

DanUK said:

BlindObject said:

So, Crysis? At least 60fps right?

Always one :'D

Guest said:

Chances are that many people already have a 1 TFLOPs processor in their machines called the GPU. The newest AMD HD 6990 can do 5.40 TFLOPS in single precision, and 1.37 TFLOPS in double precision. There are boards out there that will support four or MORE of these cards, meaning that a single DESKTOP can do 21.6 TFLOPS single precision and 5.48 TFLOPS in double precision. Given that these cards cost about $700 each, which means that for about $4000, you can have one mean HPC machine.

Yes, I do agree that having this type of processing on a single chip is definitely impressive, but the graphics card processing power can't be overlooked.
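The multi-card totals quoted above are straight multiplication of per-card peak figures, which a minimal Python sketch can reproduce. The per-card numbers and the roughly $700 price are the commenter's; the `Card` class and `aggregate` function are hypothetical names used only for illustration, and the result is a theoretical peak that ignores any scaling losses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    sp_tflops: float   # single-precision peak, as quoted in the comment
    dp_tflops: float   # double-precision peak, as quoted in the comment
    price_usd: float   # approximate price, as quoted in the comment


def aggregate(card: Card, count: int) -> dict:
    """Sum theoretical peak throughput and card cost for `count` identical cards."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return {
        "sp_tflops": card.sp_tflops * count,
        "dp_tflops": card.dp_tflops * count,
        "cost_usd": card.price_usd * count,
    }


hd6990 = Card("AMD HD 6990", sp_tflops=5.40, dp_tflops=1.37, price_usd=700.0)
totals = aggregate(hd6990, 4)

print(f"{totals['sp_tflops']:.1f} TFLOPS single precision")
print(f"{totals['dp_tflops']:.2f} TFLOPS double precision")
print(f"${totals['cost_usd']:.0f} for the cards alone")
```

The cost line covers only the four cards; the rest of the machine is not modeled here.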

Ubwarcher07 said:

Anyone else notice that the article is written by a Knight <.<

red1776, Omnipotent Ruler of the Universe, said:

Chances are that many people already have a 1 TFLOPs processor in their machines called the GPU. The newest AMD HD 6990 can do 5.40 TFLOPS in single precision, and 1.37 TFLOPS in double precision. There are boards out there that will support four or MORE of these cards, meaning that a single DESKTOP can do 21.6 TFLOPS single precision and 5.48 TFLOPS in double precision. Given that these cards cost about $700 each, which means that for about $4000, you can have one mean HPC machine.

Yes, I do agree that having this type of processing on a single chip is definitely impressive, but the graphics card processing power can't be overlooked.

Not quite, Guest. The 6990 is a dual-GPU card. CrossFire and SLI both support up to 4 GPUs, so only two 6990s (CrossFire) or two GTX 590s (SLI) are supported.

The maximum (before OC'ing) for CrossFire at this time is 10.8 TF. I know, this is what I am running (12.6 TF) with OC'ing

dividebyzero, trainee n00b, said:

Not quite, Guest. The 6990 is a dual-GPU card. CrossFire and SLI both support up to 4 GPUs, so only two 6990s (CrossFire) or two GTX 590s (SLI) are supported.

The maximum (before OC'ing) for CrossFire at this time is 10.8 TF. I know, this is what I am running (12.6 TF) with OC'ing

True enough to a degree...except that no one would use SLI or CFX for HPC, and the oft-quoted TF numbers bandied around are theoretical maximums for such reeeeeeaallly useful apps such as Linpack....so your flops are only as useful as the programming running on them

Fastra II with 6 x GTX295 + 1 x GTX 275 yields 11.7 TF single precision...but for anything other than pure number crunching, something like this 8.2 TF Tesla powered Colfax CXT8000 would leave it for dead.

red1776, Omnipotent Ruler of the Universe, said:

dividebyzero said:

True enough to a degree...except that no one would use SLI or CFX for HPC, and the oft-quoted TF numbers bandied around are theoretical maximums for such reeeeeeaallly useful apps such as Linpack....so your flops are only as useful as the programming running on them

Fastra II with 6 x GTX295 + 1 x GTX 275 yields 11.7 TF single precision...but for anything other than pure number crunching, something like this 8.2 TF Tesla powered Colfax CXT8000 would leave it for dead.

I was speaking strictly as a matter of logistics. He said "you could have one mean desktop", and 4 is the limit. Radeons are rated higher in FLOPS than Nvidia, so by the numbers, 10.6 is the max FLOP rating at this time for a desktop. As far as the usefulness of FLOPS goes, it is certainly not the only metric to consider. Radeons out-FLOP Nvidia's offerings, in the case of the 6990 vs the GTX 590 by a 2:1 margin (5100 GFLOPS to 2700 GFLOPS), and performance doesn't quite reflect that discrepancy.

dividebyzero, trainee n00b, said:

True enough on the numbers- although "doesn't quite reflect that discrepancy" might be a bit of an understatement unless the HD 6990 suddenly became twice as productive as a GTX 590.

I read the Guest comment as:

There are boards out there that will support four or MORE of these cards...Given that these cards cost about $700 each, which means that for about $4000

Assuming that the Guest isn't mathematically challenged, I interpreted the comment to mean 4000 divided by 700 - which is closer to 6 than 4 - and also (maybe) coincidentally six is the maximum number of cards used for compute in a standard ATX (7 PCI-E slot) motherboard. The seventh is often employed primarily for video out.

I would tend to regard the relative FLOP count of a CPU or GPU as more of a marketing bullet point for the most part. If it weren't, then AMD would surely be talking up the numbers.

As far as I'm aware, Bitcoin and distributed computing (BOINC, Seti@Home etc.) are about the only environments that fully utilise the FLOPS of AMD's GPUs. That isn't likely to change unless/until AMD put some serious resources into OpenCL. The HD 6990's supposed FLOP superiority doesn't do a hell of a lot of good with regard to F@H, for example

So, to my way of thinking, the HD 6990 could be a Bugatti Veyron - great stats, but if the software environment equates to an eight-year-old kid that can't reach to depress the gas pedal more than a quarter of the way to the floor....

I think that there are reasons that desktop HPC doesn't generally offer a great choice in AMD GPUs, and those reasons likely stem from CUDA being well supported in the HPC/WS environment: the fact that CUDA ports easily to both OpenCL and Linux, Nvidia's adoption and use of ECC, and their QA program.

I'd also wonder: if AMD thought that raw floating-point calculation is the answer, why are they moving to a compute-based GCN arch, especially as VLIW4 has been in the marketplace for less than a year?

If I had to guess I'd probably point to Nvidia's HPC/WS marketshare. There's probably some degree of status appearing in desktop HPC systems like the Cray CX1, Colfax, Dell ( [link] ), Amax, SuperMicro, HP etc. (Lenovo don't even offer an AMD card with their WS's - ironic, no?).
