In brief: Losing data to a backup error can mean fretting over the loss of years' worth of personal photos or, as in the case of Japan's Kyoto University, losing 77TB of critical research data. The incident involved the university's supercomputer, which received a faulty software update for its backup system that accidentally wiped 34 million files over a two-day period.
The culprit was a faulty script, originally meant to delete old, unnecessary log files from Kyoto University's Cray/HPE supercomputer as part of a software update. Instead, it deleted a massive 77TB of research data from the computer's high-capacity /LARGE0 backup disk between December 14 and December 16, 2021.
The university initially estimated that up to 100TB of data had been lost after the buggy update wiped nearly all files older than 10 days. The 77TB of research data that was actually deleted comprised 34 million files belonging to 14 research groups. Although Kyoto University didn't reveal the nature of the wiped research data, it noted (in Japanese) that the files of four of those groups could not be recovered.
The university's supercomputer supplier, Hewlett Packard Japan (HPE), accepted 100 percent responsibility for the incident and issued a letter of apology, which the university later published. As The Stack reports, HPE said its update included a modified version of the script, intended to “improve visibility and readability.”
However, HPE said it wasn't aware of a side effect of this change: the modified shell script was reloaded in the middle of execution, resulting in “undefined variables” and the deletion of files on the supercomputer's /LARGE0 backup disk.
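HPE's actual script isn't published in this report, so what follows is only a hypothetical illustration of the kind of guard that stops an undefined variable from turning a log cleanup into a mass deletion. It is a Python sketch, not the shell script HPE shipped. The function name `clean_old_logs`, the `*.log` pattern, and the specific checks are assumptions; the 10-day threshold mirrors the figure in the article. The idea is to refuse to run when the directory setting is empty or resolves to the filesystem root, rather than silently expanding to something broader. In a shell script, `set -u` and the `${VAR:?}` expansion serve a similar purpose.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def clean_old_logs(log_dir: Optional[str], max_age_days: float = 10.0,
                   pattern: str = "*.log", dry_run: bool = False) -> List[Path]:
    """Hypothetical helper: delete files matching `pattern` directly under
    `log_dir` whose modification time is older than `max_age_days`."""
    # Guard 1: refuse to run if the directory setting is missing or blank.
    # In a shell, an unset variable expands to "", so "$LOG_DIR/" becomes "/".
    if not log_dir or not log_dir.strip():
        raise ValueError("log directory is undefined or empty; refusing to delete")
    root = Path(log_dir).resolve()
    # Guard 2: never operate on the filesystem root itself.
    if root == Path(root.anchor):
        raise ValueError(f"refusing to clean filesystem root: {root}")
    if not root.is_dir():
        raise ValueError(f"not a directory: {root}")

    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in root.glob(pattern):
        # Only regular files (not symlinks) older than the cutoff are removed.
        if not path.is_symlink() and path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        old = Path(d, "old.log")
        old.write_text("stale")
        Path(d, "new.log").write_text("fresh")
        eleven_days_ago = time.time() - 11 * 86400
        os.utime(old, (eleven_days_ago, eleven_days_ago))
        print([p.name for p in clean_old_logs(d)])  # ['old.log']
    try:
        # Simulates an undefined variable: the lookup yields "".
        clean_old_logs(os.environ.get("HYPOTHETICAL_UNSET_VAR", ""))
    except ValueError as err:
        print("refused:", err)
```

Failing loudly on an empty path is the design choice here: a cleanup job that aborts costs a few extra log files, while one that guesses can cost terabytes.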
Kyoto University has since suspended the backup process while it makes improvements and adds measures to prevent similar incidents. When it resumes backups later this month, the university plans to maintain incremental backups in addition to mirror backups.
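Why add incremental backups on top of mirrors? A mirror reproduces the source's current state, deletions included, whereas an incremental chain records changes over time so earlier versions remain recoverable. The in-memory Python sketch below illustrates that difference under simplifying assumptions: files are modeled as a dict of path to contents, and `mirror_backup` and `IncrementalBackup` are hypothetical names, not Kyoto University's or HPE's tooling.

```python
from typing import Dict, List, Optional, Set, Tuple

Files = Dict[str, bytes]  # path -> file contents (simplified model)


def mirror_backup(source: Files) -> Files:
    """A mirror makes the backup identical to the source, so files
    deleted at the source disappear from the backup as well."""
    return dict(source)


class IncrementalBackup:
    """Hypothetical sketch: a full base copy plus a chain of increments.
    Each increment stores only added/changed files and the names of
    deleted files, so any earlier state can be rebuilt."""

    def __init__(self, source: Files) -> None:
        self.base: Files = dict(source)
        self.increments: List[Tuple[Files, Set[str]]] = []

    def _state(self, upto: Optional[int] = None) -> Files:
        # Replay the base copy plus the first `upto` increments (all if None).
        state = dict(self.base)
        for changed, deleted in self.increments[:upto]:
            for name in deleted:
                state.pop(name, None)
            state.update(changed)
        return state

    def backup(self, source: Files) -> None:
        previous = self._state()
        changed = {k: v for k, v in source.items() if previous.get(k) != v}
        deleted = set(previous) - set(source)
        self.increments.append((changed, deleted))

    def restore(self, version: int) -> Files:
        """Version 0 is the base copy; version n is the state after n increments."""
        return self._state(version)


if __name__ == "__main__":
    day1 = {"results.dat": b"v1", "job.log": b"log"}
    day2: Files = {}  # everything wiped at the source by a runaway deletion
    inc = IncrementalBackup(day1)
    inc.backup(day2)
    print(mirror_backup(day2))  # {}
    print(inc.restore(0))       # {'results.dat': b'v1', 'job.log': b'log'}
```

Real incremental schemes add retention limits and store increments on separate media, but the principle is the same: a deletion becomes one more recorded change rather than an overwrite of the only copy.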