Microsoft servers leave 800 planes in the air

Status
Not open for further replies.

Phantasm66

A "design anomaly" in the way Microsoft Windows servers were integrated into a Californian air traffic control system left 800 planes in the air without contact with air traffic control, it has been revealed.

A combination of good old human error and a design glitch in the Windows servers led to at least five cases where planes came too close to one another. Air traffic controllers were reduced to using personal mobile phones to pass on warnings to controllers at other facilities.

The Windows servers were brought into Southern California's air traffic control system over the last three years to replace the radio system's original Unix servers. The Windows servers were running Microsoft Windows 2000 Advanced Server on Dell hardware. Microsoft have not commented on the issue.
 
This article is such a load of crap. Windows does not have to be rebooted every 49.7 days to keep from being "overloaded with data". There is an API in Windows called GetTickCount() which does reset at that interval because it is limited as to how high it can count; this is a limitation of 32-bit platforms and not with the OS. It has nothing to do with "being overloaded with data" or having to restart Windows. It's there for other programs to use and if some code monkey writes a program that does not use it properly, well...

The problem here lies not with Windows but with the shoddy program they are running and the incompetent people who are in charge of it. Nice way to spread FUD though, Techworld.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/gettickcount.asp
 
Originally posted by Unregistered
There is an API in Windows called GetTickCount() which does reset at that interval because it is limited as to how high it can count; this is a limitation of 32-bit platforms and not with the OS.
If Windows' application programming interface has a timer function which is limited to 49.7 days, I'd say it is a limitation of the operating system.

32-bit platforms can deal with 64-bit numbers, and many other operating systems count time in seconds from 1st January 1970, using a signed 32-bit integer that wraps around to its smallest value on 19th January 2038.
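To put a number on the 2038 date, here is a rough Python sketch. It assumes a signed 32-bit count of seconds since 1st January 1970 UTC; the helper names `wrap_int32` and `to_utc` are made up for illustration and are not part of any OS API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a signed 32-bit two's-complement value (hypothetical helper)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def to_utc(seconds: int) -> datetime:
    """Convert a seconds-since-epoch count to a UTC datetime (hypothetical helper)."""
    return EPOCH + timedelta(seconds=seconds)


# Last second that fits in the counter
print(to_utc(INT32_MAX))                  # 2038-01-19 03:14:07+00:00
# One second later the counter jumps to its smallest value
print(to_utc(wrap_int32(INT32_MAX + 1)))  # 1901-12-13 20:45:52+00:00
```

So a seconds counter of the same width lasts roughly 68 years from the epoch, where a milliseconds counter lasts weeks.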

Moreover, if time-critical applications are made that use GetTickCount, whose fault is it when errors happen because of the limitation?
 
Maybe it depends on how fast you count? How long is a 'tick' on Windows compared to on other OSes? Surely if the OS works in a particular way, then any app designed for that OS should be designed to work correctly with it. The article does mention human error as one of the factors involved.
 
GetTickCount counts milliseconds and it returns a 32-bit value. The Windows API designers decided this sometime in the stone age and this is how it is. I suppose no one thought that anyone would ever want to run Windows longer than 49 days :p

The fact that this timer's overflow was not taken into consideration is indeed the fault of the flight software designers.
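For anyone wondering where the 49.7 days comes from, it is just arithmetic on a 32-bit millisecond counter. A quick sketch in Python (the function name `days_until_wrap` is only for illustration):

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in a day


def days_until_wrap(bits: int = 32) -> float:
    """Days a millisecond counter of the given bit width runs before wrapping to zero."""
    return (2 ** bits) / MS_PER_DAY


print(round(days_until_wrap(), 2))  # 49.71
```

A 32-bit counter holds 4,294,967,296 distinct values, and at one tick per millisecond that is used up in just under 50 days.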
 
GetTickCount is not the only Timer and does not stop after 49 days either. The 32-bit number simply wraps around, and keeps going.

Should you wish to use GetTickCount, it is a simple software process to make use of this millisecond timer, by easily identifying when it wraps around and carrying over the result to another register. This is basic programming. Everything works like this on a computer.
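A minimal sketch of that idea in Python, simulating the 32-bit tick value with a bit mask rather than calling the real API. The names `elapsed_ms` and `ExtendedTickCounter` are hypothetical, and the counter class assumes it is updated at least once per wrap period (about 49.7 days), otherwise a wrap would be missed.

```python
MASK32 = 0xFFFFFFFF  # keeps values within 32 bits, like an unsigned 32-bit register


def elapsed_ms(start: int, now: int) -> int:
    """Milliseconds between two 32-bit tick readings, correct across a single wrap."""
    return (now - start) & MASK32


class ExtendedTickCounter:
    """Carries wraps of a 32-bit tick value over into a wider running total."""

    def __init__(self, first_reading: int) -> None:
        self._last = first_reading & MASK32
        self._wraps = 0

    def update(self, reading: int) -> int:
        """Feed in the latest 32-bit reading; returns the extended tick count."""
        reading &= MASK32
        if reading < self._last:  # value went backwards, so the counter wrapped
            self._wraps += 1
        self._last = reading
        return (self._wraps << 32) | reading


# A reading taken just before the wrap and one just after it
print(elapsed_ms(0xFFFFFF00, 0x00000100))  # 512

counter = ExtendedTickCounter(0xFFFFFFF0)
print(counter.update(0x00000005) > 0xFFFFFFFF)  # True
```

The subtraction with a mask is enough for measuring intervals shorter than one wrap period; the carry-over class is only needed if you want an absolute count that keeps growing.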

From David U.K.
 
Originally posted by Unregistered
This article is such a load of crap. Windows does not have to be rebooted every 49.7 days to keep from being "overloaded with data".


I don't think it's so much that Windows itself has the problem; the defect was more in the company's implementation. Also, the article has probably reported the technical angles in very broad terms. I work for a large organisation doing application support and we get connectivity issues all the time when things do or don't reboot when they should / shouldn't.

Some of these problems are due to people being stupid and lazy, yes, and the problems in the article were probably down to that too. Most of the time, I think the problem is the human element, not the machines.
 
The system was designed to be serviced every 30 days by law. It was supposed to switch to a backup redundancy system after the 40-some-odd days for safety reasons if the scheduled maintenance wasn't done. The problem was that the company contracted to do maintenance didn't show up (they received an award for outstanding service at the same center in July), and the backup wasn't programmed to kick in when the main system shut down. Nothing reported about the incident, or on the local news here in L.A., was grounds for more MS bashing, IMO.
 