IBM is in the process of building the largest data repository ever constructed, with a combined storage capacity of 120 petabytes. The facility is being developed at the company’s Almaden, California research center.

Technology Review reports that IBM is building the record-breaking storage system for an unnamed client that needs a supercomputer capable of running simulations of real-world phenomena, such as those used to model weather and climate.

120 petabytes is an enormous amount of storage. To break things down, 1,024 gigabytes equal one terabyte, and 1,024 terabytes equal one petabyte. If I’ve done the math correctly, 120 petabytes is equivalent to about 126 million gigabytes (120 × 1,024 × 1,024 = 125,829,120).
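For readers who want to check the conversion themselves, here is a minimal Python sketch of the arithmetic, assuming the binary convention used above (1,024 gigabytes per terabyte, 1,024 terabytes per petabyte). The function and constant names are illustrative only and do not come from IBM or Technology Review.

```python
# Binary convention from the paragraph above (illustrative names).
GB_PER_TB = 1024
TB_PER_PB = 1024


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes using 1,024-based units."""
    return petabytes * TB_PER_PB * GB_PER_TB


if __name__ == "__main__":
    total_gb = petabytes_to_gigabytes(120)
    print(f"120 PB = {total_gb:,} GB")  # 120 PB = 125,829,120 GB
```

With decimal units (1,000 per step) the same calculation gives exactly 120 million gigabytes, so the figure depends on which convention is used.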

To put that into perspective, the system could store roughly 24 billion five-megabyte MP3 files. It could also hold about 60 backup copies of the Internet Archive’s Wayback Machine, with each copy containing 150 billion web pages. In total, the system is expected to hold roughly 1 trillion files.
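The MP3 figure can be sanity-checked the same way. The sketch below is only a back-of-the-envelope calculation: it assumes every file is exactly five megabytes and ignores file system overhead and redundant copies. The helper name is hypothetical. The count comes out at 24 billion with decimal units and slightly higher with binary units.

```python
def files_that_fit(capacity_pb, file_mb, base):
    """Count whole files of file_mb megabytes that fit in capacity_pb petabytes.

    base is 1000 for decimal units or 1024 for binary units.
    A petabyte is base**5 bytes and a megabyte is base**2 bytes.
    """
    capacity_bytes = capacity_pb * base**5
    file_bytes = file_mb * base**2
    return capacity_bytes // file_bytes


if __name__ == "__main__":
    print(f"{files_that_fit(120, 5, 1000):,}")  # 24,000,000,000
    print(f"{files_that_fit(120, 5, 1024):,}")  # 25,769,803,776
```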

IBM will use 200,000 conventional hard drives to create the data container, which will be about 10 times larger than any previous effort. With so many disks in the array, it’s inevitable that drives will fail, perhaps on a semi-regular basis. IBM is preparing for such a scenario by storing multiple copies of data on different disks as well as employing new methods to keep the supercomputer running at almost full speed should multiple drives expire. According to director of storage research and project leader Bruce Hillsberg, the system should not lose any data for a million years.