Nvidia unveils next-gen GPU architecture, Fermi

September 30, 2009, 5:01 PM
Nvidia has taken the wraps off Fermi, its next-generation CUDA GPU architecture. The company says Fermi was engineered from the ground up and delivers breakthroughs in both graphics and GPU computing. Dave Patterson, director of the Parallel Computing Research Laboratory, believes Nvidia has taken a leap toward making GPUs attractive for a broader class of programs and thinks the company's latest technology will stand as a significant milestone.

Fermi was showcased today at Nvidia's GPU conference in California. Oak Ridge National Laboratory already plans to use it in a new supercomputer, and the architecture has garnered the support of various organizations, including Bloomberg, Cray, Dell, HP, IBM and Microsoft. Nvidia's next-gen chip features a slew of new "must-have" technologies such as ECC support, 512 CUDA cores that implement the new IEEE 754-2008 floating-point standard, and 8x the peak double-precision arithmetic performance of Nvidia's last-generation GPU.
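
For readers less familiar with the distinction between single and double precision, the following is a minimal Python sketch of how the two IEEE 754 formats differ in what they can represent. It is not Nvidia code and says nothing about Fermi's hardware; the helper name `to_float32` is hypothetical, and the sketch simply round-trips values through the standard library's `struct` module to emulate 32-bit storage.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Single precision carries a 24-bit significand, so 2**24 + 1 cannot be stored exactly.
big = float(2**24 + 1)          # exactly representable as a double
big_single = to_float32(big)    # rounds to 16777216.0 in single precision

# 0.1 is inexact in both formats, but the single-precision error is far larger.
tenth_single = to_float32(0.1)
error = abs(tenth_single - 0.1)

print(big, big_single)
print(f"{0.1:.20f}")
print(f"{tenth_single:.20f}")
print(error)
```

Workloads that accumulate many such rounding errors are the ones that tend to benefit from faster double-precision hardware.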


Other features include Nvidia Parallel DataCache, Nvidia GigaThread Engine, and Nexus. Technical whitepapers and more information can be found on Fermi's page. The Tech Report and Real World Technologies each have detailed write-ups on Fermi, which include some speculative performance comparisons with AMD's next-generation GPU.




User Comments: 7

Burty117, TechSpot Chancellor, said:

I commented on the next-gen ATI card reviewed last week or so and said I bet Nvidia will come up with something better... I guess I may be right. True, I can't prove it yet, but after researching this article's points, it just seems that Nvidia is going down a route where the performance of the new cards really will be better than ATI's offerings.

Zeromus said:

Full FPU precision? Ah, I see: it took double the work to do scaled-precision computing such as 128-bit HDR processing, and now less work is loaded on the Fermi platform thanks to its full support for double-precision floats. The document says past GPUs only operated on 24-bit floats at a time and didn't conform to the dword single-precision and qword double-precision formats that programming languages expose on the CPU. Now its ALUs fully support computation on whole 32-bit and 64-bit floats, and obviously conform to the precision widths of IEEE 754. This was done because applications that required only integer arithmetic relied on what was available and used emulation built from multiple 24-bit-precision float instructions. Now that such conformity exists, programmers can easily handle this kind of scenario without worrying about emulation, operating on floats much as they would on the x87.

Adhmuz, TechSpot Paladin, said:

All I have to say is: 8 Times the peak double precision arithmetic performance!?! No WAY!!

ATI cannot compete with that peak double precision arithmetic.

Nvidia FTW!!

ROFL

BrownPaper said:

@Adhmuz: Who cares? This card will probably cost $1000 per any way. Fanboy = fail.

Guest said:

You're an *****. GTX 280 blew chunks at double precision, it was utterly terrible; only one unit capable of double precision per SM.

GTX 280 could crank out 78 GFLOPS, HD 4870 could do 240 GFLOPS. 8 times 78 GFLOPS would put Fermi at 624 peak GFLOPS compared to HD 5870's 544 GFLOPS.
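
As a quick check of the arithmetic in the comment above, here is a minimal Python sketch. The GFLOPS figures are the commenter's and are taken as given, not independently verified; the 8x factor is Nvidia's claimed peak improvement, and the variable names are illustrative only.

```python
# Peak double-precision figures as quoted in the comment (GFLOPS); not verified here.
GTX_280_DP_GFLOPS = 78
HD_5870_DP_GFLOPS = 544

# Nvidia's claimed peak double-precision improvement over its previous generation.
FERMI_DP_FACTOR = 8

# Implied Fermi peak, assuming the 8x claim applies directly to the GTX 280 figure.
fermi_dp_gflops = FERMI_DP_FACTOR * GTX_280_DP_GFLOPS

# How the implied peak compares with the quoted HD 5870 figure.
ratio_vs_hd_5870 = fermi_dp_gflops / HD_5870_DP_GFLOPS

print(fermi_dp_gflops)             # 624
print(round(ratio_vs_hd_5870, 2))  # 1.15
```

On these numbers the implied lead over the HD 5870 is about 15 percent in peak double-precision throughput, far smaller than the 8x headline suggests.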

Adhmuz, TechSpot Paladin, said:

@Adhmuz: Who cares? This card will probably cost $1000 per any way. Fanboy = fail.

Sarcasm is sometimes hard for some to grasp, I guess. I'm on a put-up-or-shut-up basis at the moment: if what Nvidia comes out with sucks, I'm going ATI, simple as that. Also, I'm low on disposable income, so I have no reason to be spending the kind of money required to upgrade my graphics solution. :p

Guest said:

Are you guys talking about double precision performance?

As you mentioned there is only one 64-bit FPU unit in the 200 series. If you are talking about games then you shouldn't care so much about double precision performance. In games single precision is often enough, in fact, it is often enough for a lot of HPC applications.

The peak performance when using single precision should be at least 3 TFLOPS for the new generation.

//j
