More details emerge about Intel's Cougar Point chipset flaw

Jos

Staff

More specifics of the problem affecting Intel's 6 Series chipset (codenamed Cougar Point) have emerged courtesy of AnandTech. Speaking with Steve Smith, VP and Director of Intel Client PC Operations and Enabling, the site was able to confirm that the fault affects both P67 and H67 chipsets, pinpointing the problem to a transistor that's connected to the 3Gbps SATA II ports.

Cougar Point has two sets of SATA ports: four that support 3Gbps operation and two that support 6Gbps. For the first set, Intel states that performance of SATA-linked devices "may degrade over time". However, since the 6Gbps SATA III circuits have their own independent clocking trees, these ports will be unaffected by the problem.

Apparently an OEM that was testing the platform in extreme heat and voltage environments originally discovered the problem and reported it back to Intel. The chip giant believes that no end users have been affected so far -- and even if they are, at least data loss isn't a problem. Intel estimates that the failure rate over a 3-year system life would be about 5%, which is high enough for the company to stop shipments and work on a silicon fix.
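
To put the 5% figure in perspective, the sketch below converts a cumulative failure probability into an equivalent per-year probability and an expected number of affected boards. It assumes the same independent failure probability every year, which is a simplification Intel has not stated (degradation of this kind may well accelerate with age), and the batch of 1,000 boards is purely illustrative.

```python
def annual_rate(cumulative: float, years: float) -> float:
    """Per-year failure probability that compounds to `cumulative` over `years`.

    Assumes an identical, independent failure probability each year.
    This is an illustrative assumption, not a figure from Intel.
    """
    return 1.0 - (1.0 - cumulative) ** (1.0 / years)


def expected_failures(boards: int, cumulative: float) -> float:
    """Expected number of affected boards out of `boards` over the full period."""
    return boards * cumulative


if __name__ == "__main__":
    # 5% over 3 years is the estimate quoted in the article.
    rate = annual_rate(0.05, 3)
    print(f"Equivalent yearly failure probability: {rate:.2%}")
    # 1,000 boards is a hypothetical batch size chosen for illustration.
    print(f"Expected affected boards per 1,000: {expected_failures(1000, 0.05):.0f}")
```

Under that assumption the yearly figure comes out a little under 2%, which is why most individual owners are unlikely to see a failure even though the total across all shipped boards is large.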


Current 6-series motherboard owners worried about performance issues are advised to use only ports 0 and 1 (the 6Gbps ports) until they can get a replacement or fix. Of course, for those with more than a couple of drives this will be impossible, but thankfully many motherboard makers support additional SATA ports through third-party chipsets, or on a PCI-E SATA or RAID controller, and these will remain unaffected.
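
The port advice above can be sketched as a small lookup. The numbering (0 and 1 for 6Gbps, 2 through 5 for 3Gbps) follows the description of the chipset's own ports; the drive names and the `classify_ports` helper are hypothetical, and a given board's labels may differ, so check the motherboard manual. Ports driven by a third-party controller are not covered by this mapping.

```python
# Cougar Point SATA ports as described in the article:
# ports 0-1 are 6Gbps (unaffected), ports 2-5 are 3Gbps (potentially affected).
SAFE_PORTS = {0, 1}
AFFECTED_PORTS = {2, 3, 4, 5}


def classify_ports(drives: dict[str, int]) -> dict[str, str]:
    """Map each drive name to 'unaffected', 'at risk', or 'unknown port'."""
    result = {}
    for name, port in drives.items():
        if port in SAFE_PORTS:
            result[name] = "unaffected"
        elif port in AFFECTED_PORTS:
            result[name] = "at risk"
        else:
            # Not one of the chipset's six ports, e.g. a third-party controller.
            result[name] = "unknown port"
    return result


if __name__ == "__main__":
    # Hypothetical build: two drives on the 6Gbps ports, an optical drive on port 5.
    build = {"ssd": 0, "hdd": 1, "dvd": 5}
    for drive, status in classify_ports(build).items():
        print(f"{drive}: {status}")
```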

The scenario is a little less clear for Sandy Bridge notebooks since they don't usually have a lot of storage bays and thus don't always use all of the ports a chipset offers. If the design only uses SATA ports 0 and 1 then the end user would never encounter an issue and models currently in the pipeline shouldn't be affected by any delays, at least in theory.


 
The real question - what will they do for customers?

I bought an ASRock P67 Extreme4 motherboard and Intel 2600K chip. This thing overclocks to 4.847GHz stable and runs SuperPi 1M in 7.691s. They will only get this rig out of my cold, dead hands. Overclocking depends heavily on the particular motherboard and chip, as well as the positioning and make of the heatsink. I got lucky and everything came together perfectly! First time ever to max out everything!

I have an SSD on SATA port 0 and a fast 1 TB drive on port 1. The seldom-used DVD player is on port 5. E-SATA is on both the OEM SATA ports. Only the DVD may possibly be affected if anything ever goes wrong with the SATA II connector, and nothing may.

So, what are they going to do to replace this board or fix it? I'd be happy if they just sent me a free PCIe SATA controller in case I ever need it and maybe a coupon good towards a future product.
 
I only have two drives and they are both connected to 3Gbps SATA ports. If I move them to ports 0 and 1 they will still work, right?
 
leondobr said:
I bought an ASRock P67 Extreme4 motherboard and Intel 2600K chip. This thing overclocks to 4.847GHz stable and runs SuperPi 1M in 7.691s. They will only get this rig out of my cold, dead hands. Overclocking depends heavily on the particular motherboard and chip, as well as the positioning and make of the heatsink. I got lucky and everything came together perfectly! First time ever to max out everything!

I have an SSD on SATA port 0 and a fast 1 TB drive on port 1. The seldom-used DVD player is on port 5. E-SATA is on both the OEM SATA ports. Only the DVD may possibly be affected if anything ever goes wrong with the SATA II connector, and nothing may.

So, what are they going to do to replace this board or fix it? I'd be happy if they just sent me a free PCIe SATA controller in case I ever need it and maybe a coupon good towards a future product.

I'd be happy with a tube of thermal compound and sending me a replacement board BEFORE I send my one back.... but yes, I'm enjoying my machine, had the last one for almost 4 years. Not looking forward to sitting with everything but the motherboard for weeks :S
 
mcmurphy12 said:
I only have two drives and they are both connected to 3Gbps SATA ports. If I move them to ports 0 and 1 they will still work, right?

Yes, they will work perfectly. Just note that we're still talking about a 5% decrease over a 3-year lifespan, and even then you'd still have to be a super user in terms of HDD workload, so I seriously doubt many people will be affected, since you'll probably replace your newly purchased mobo within that time.
 
Yes, they will work perfectly. Just note that we're still talking about a 5% decrease over a 3-year lifespan, and even then you'd still have to be a super user in terms of HDD workload, so I seriously doubt many people will be affected, since you'll probably replace your newly purchased mobo within that time.

It's not a 5% decrease in performance, but rather an estimated 5% failure rate. In other words, besides the performance issues which you may or may not experience, there is a chance that during normal use some of the 3Gbps ports will stop working altogether over the course of 3 years.
 
Well, I don't plan on having more than two drives in this computer anyway, so if I do run into issues I will probably just move them to the other ports. If, for whatever reason, I decide to expand, I will probably just purchase a PCIe SATA controller.
 
With issues like this, it makes sense to wait for the next-generation CPU, Ivy Bridge, possibly later this year.

I don't think it's good to rush into buying something with a flaw... Yes, I will wait for Ivy Bridge on 22 nm.
 
I am so glad that I moved over to AMD, even if their performance is not as great.
 
Guest said:
I am so glad that I moved over to AMD, even if their performance is not as great.

The issue will barely affect any intel users. If they hadn't owned up it probably wouldn't have become a widespread issue in the future.
 
Still an Intel user. AMD has failed too many times for me in the past. I think it was great that Intel did own up and didn't let it slide.
 
mcmurphy12 said:
Still an Intel user. AMD has failed too many times for me in the past. I think it was great that Intel did own up and didn't let it slide.

Has anyone noticed a continued trend of flawed technology products? The iPhone 4 had too many to list, the Galaxy S has a flawed GPS chip and the Bell versions have dying internal SDs. We have the Android 2.2 issue where it sends texts to the wrong contact. We have the Notion Ink Adam being bricked. Now we have this issue.

Have the world's standards for quality gone down by so much in the past couple of years?
 
Gigabyte motherboards have a 3 year warranty.

Hopefully the failure will time itself to the day before the 3 years expire, not the day after.
 
Ports 0 and 1 are the SATA III ports. Ports 2 through 5 are the SATA II ports. This should be mentioned in the article; it was on AnandTech and is also in the ASRock manual.
 
It also looks like Gigabyte is being proactive in addressing this problem.

Gigabyte stops shipping all 6-series motherboards
Gigabyte has halted all Sandy Bridge shipments and it's setting up an online program for all consumers who happened to buy a 6-series motherboard.

Gigabyte noted that the SATA issue would eventually affect about five percent of all users over the course of three years, so there doesn't seem to be much to worry about in the short term.

The company has set up call centres and users are advised to visit http://ggts.gigabyte.com for online support.
 
Has anyone noticed a continued trend of flawed technology products? The iPhone 4 had too many to list, the Galaxy S has a flawed GPS chip and the Bell versions have dying internal SDs. We have the Android 2.2 issue where it sends texts to the wrong contact. We have the Notion Ink Adam being bricked. Now we have this issue.

Have the world's standards for quality gone down by so much in the past couple of years?

I have nothing concrete to base this on, but I will take a shot.

1) I would wager that in the interest of getting new tech out the door, testing time has shortened, and 'artificial aging' has not been perfected yet.
2) It has been speculated that 11-16 nm is the theoretical limit for transistors as we know them (and we are getting down there). I have read that with every shrink it becomes exponentially more difficult to control gate length.

....Just a theory
 
I have nothing concrete to base this on, but I will take a shot.

1) I would wager that in the interest of getting new tech out the door, testing time has shortened, and 'artificial aging' has not been perfected yet.
2) It has been speculated that 11-16 nm is the theoretical limit for transistors as we know them (and we are getting down there). I have read that with every shrink it becomes exponentially more difficult to control gate length.
....Just a theory

1) I agree
2) Will probably become more of a problem with gating in CPUs. The Intel SATA controller transistor fault probably wouldn't factor in here as the controller is built on the 65nm process used for the previous (P55/X58) platforms.
 
The Intel SATA controller transistor fault probably wouldn't factor in here as the controller is built on the 65nm process used for the previous (P55/X58) platforms.

So, if we were to draw a conclusion, should the older P55-X58 platforms succumb to this fault also, or is it specifically inherent to the new chipsets?

Have the world's standards for quality gone down by so much in the past couple of years?

Probably not as much as the consumer's greed for more tech products sooner has gone up.

When you mix the two, it starts to become a "chicken or the egg" paradigm.

So, which came first, more buying or less testing? I think that's how that would go.

1) I would wager that in the interest of getting new tech out the door, testing time has shortened, and 'artificial aging' has not been perfected yet.

"Artificial aging" is normally accomplished through accelerated UV exposure, and heating and cooling cycles.

So, it works far better with a product intended for outdoor use than with an indoor product such as a computer. You can only overheat a semiconductor so much, or else it fails. So, thermal cycling loses a great deal of its efficacy also. Beyond that, anything else that might be attempted causes excursions far beyond normal operating parameters, thus perhaps nullifying many of the results.
 
Princeton said:
Has anyone noticed a continued trend of flawed technology products?... Have the world's standards for quality gone down by so much in the past couple of years?

You're probably right in some ways because I feel like there's a culture of "We'll fix it with a software update later".

However, problems like these have always been there, but now they are produced on an even more massive scale, integrally affect our lives more than ever and are publicized like never before. The level of effort gone into quality control probably hasn't gone down, but as things become more complex, QC probably hasn't increased accordingly.

We also have a problem with sensationalized media and the speed at which information moves these days (twitter, blogs and their ilk). Bugs like the iPhone's alarm clock not working after New Year's would have NEVER made the news 10 years ago. It's silly and non-consequential to the vast majority of the world.... but boy do we ever hear about it... and from multiple sources.. multiple times a day.

I remember a long string of much more serious issues from long ago ranging from the MTH bug on i820 boards (Pentium III... affected millions of computers around 2000) to massive recalls from manufacturers like Dell, HP and others due to bad caps, faulty batteries, defective GPUs and god knows what else (pick any year for the past 20 years). The IBM "DeathStar" incident (affected multiple brands, actually around 2003), RRODing Xboxes (circa 2006), data loss on Intel-based VIA chipsets (2002ish) during heavy PCI load and.... You get the picture.

You can go further back into the 90s and remember gems like the (infamous) Pentium FDIV bug (1995), Pentium III 600MHz recall due to stability issues (1999), laughable amounts of battery/AC adapter recalls from all brands (all the time), various CRT recalls for catching on fire, tons of vehicle recalls (every year for decades) and much more, I'm sure.
 
Rick said:
Princeton said:
Has anyone noticed a continued trend of flawed technology products?... Have the world's standards for quality gone down by so much in the past couple of years?

You're probably right in some ways because I feel like there's a culture of "We'll fix it with a software update later".

However, problems like these have always been there, but now they are produced on an even more massive scale, integrally affect our lives more than ever and are publicized like never before. The level of effort gone into quality control probably hasn't gone down, but as things become more complex, QC probably hasn't increased accordingly.

We also have a problem with sensationalized media and the speed at which information moves these days (twitter, blogs and their ilk). Bugs like the iPhone's alarm clock not working after New Year's would have NEVER made the news 10 years ago. It's silly and non-consequential to the vast majority of the world.... but boy do we ever hear about it... and from multiple sources.. multiple times a day.

I remember a long string of much more serious issues from long ago ranging from the MTH bug on i820 boards (Pentium III... affected millions of computers around 2000) to massive recalls from manufacturers like Dell, HP and others due to bad caps, faulty batteries, defective GPUs and god knows what else (pick any year for the past 20 years). The IBM "DeathStar" incident (affected multiple brands, actually around 2003), RRODing Xboxes (circa 2006), data loss on Intel-based VIA chipsets (2002ish) during heavy PCI load and.... You get the picture.

You can go further back into the 90s and remember gems like the (infamous) Pentium FDIV bug (1995), Pentium III 600MHz recall due to stability issues (1999), laughable amounts of battery/AC adapter recalls from all brands (all the time), various CRT recalls for catching on fire, tons of vehicle recalls (every year for decades) and much more, I'm sure.

But look at video games. Nowadays they're filled with bugs because they can just be patched up later. You could argue that games in the past were simpler, but Smash Bros Brawl for the Wii is just as complex as a Call of Duty game. And it suffers from very few bugs, none of which disturb gameplay. In my opinion this is because games on the Wii typically can't be updated.
 
I would think that communication and coordination between individuals, departments, and yes, computers, are issues that affect QC dramatically.

I think I read that it took about 130 employees to pump the yearly issue of "Madden Football" out the door. Look how many it takes Intel to pump out a chipset.

Competition between makers also has to affect QC. "We gotta get it to market before the other guy gets his to market" doesn't seem like fertile ground for perfect QC to spring up.
 
So, if we were to draw a conclusion, should the older P55-X58 platforms succumb to this fault also, or is it specifically inherent to the new chipsets?
As I understand it, the problem is peculiar to the Sandy Bridge controller only, likely the result of tinkering with the controller during the redesign process (from exclusively SATA 3Gb to SATA 3Gb + 6Gb).

I'm still at a loss to understand why Sandy Bridge owners are supposedly stampeding to get their boards returned. I've only put together two SB systems (one Asus, one Gigabyte) and as far as I can tell from contacting the support sites, neither company has yet organised a schedule for RMA since Chinese New Year has effectively shut down most of Asia.
Common sense might indicate that users who like the idea of RAID but are too tightwad to invest in a hardware RAID card are probably the only ones with the potential to be majorly inconvenienced. Judging by the hysteria you'd think that the chipset was in danger of combusting and emitting Plutonium in some kind of alchemical transmutation.
I think you can guarantee that RMAs will be handled in batches as revised chipset deliveries come on stream. Hopefully RMAs take precedence over pushing new revision boards into the channel, but I suspect that most vendors will allocate stock to fulfil both areas.

The main concern going forward would be that the dodgy chipset boards already in circulation don't magically find their way into the resellers market or less scrupulous retail/etail stores.
 
So, basically, early adopters are being victimized by their own impatience. Am I correct in thinking these early glitches will be attended to by the time most board revisions are released?
 