GitHub is using archival film stored in the Arctic to back up its public repositories

mongeese

In a nutshell: On February 2, GitHub grabbed a snapshot of all the public repositories live on the site at the time. It moved the data onto archival film and shipped it off to the deep Arctic, where it will be stored for the next millennium. The hope is that one day, the open-source data can be used by historians or future civilizations to understand the dawn of computing: the present.

The snapshot included any public repository that had at least 250 stars, that had at least one star and had been updated in the past year, or that had no stars but had been updated in the previous eighty days. If you’ve ever uploaded to GitHub, you’ve probably got your name and a creation stored in the Arctic. Clicking on the Arctic Code Vault Contributor badge in the highlights section of a profile reveals which of a user’s projects were saved in this snapshot.
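For what it's worth, the inclusion rule boils down to a simple check. Here is a minimal sketch in Python of the criteria as described above; the function name, parameters, and the 2020 snapshot year are assumptions for illustration, not GitHub's actual selection code:

```python
from datetime import datetime, timedelta

# Snapshot date referenced in the article (February 2, 2020).
SNAPSHOT_DATE = datetime(2020, 2, 2)

def included_in_snapshot(stars: int, last_updated: datetime) -> bool:
    """Return True if a public repo meets the criteria as stated in the article.
    Function name and parameters are hypothetical, for illustration only."""
    if stars >= 250:
        return True
    if stars >= 1 and last_updated >= SNAPSHOT_DATE - timedelta(days=365):
        return True
    if last_updated >= SNAPSHOT_DATE - timedelta(days=80):
        return True
    return False

# Example: a repo with no stars but a commit 30 days before the snapshot qualifies.
print(included_in_snapshot(0, datetime(2020, 1, 3)))  # True
```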

GitHub had Piql transfer the data to digital photosensitive archival film at its facility in Drammen, Norway. From there, it was flown to the Svalbard archipelago, situated six hundred miles north of the European mainland. On July 8, it was taken into a disused coal mine near Longyearbyen and placed in a deep chamber beneath hundreds of meters of permafrost. Sealed away, the data is about as secure and safe as anything can be.

In addition to the repositories, GitHub also saved a few classic works of humanity and an introductory letter, in case the vault is discovered after an apocalypse, by aliens, or by something that doesn’t know much about present-day humanity. “This archive, the GitHub Code Vault, was established by the GitHub Archive Program, whose mission is to preserve open source software for future generations,” the letter reads. “You may be reading this one year from now, or one thousand, but either way, we hope its contents, and perhaps the very concept of open source, are useful to you.”

GitHub has other strategies for preserving its data as well. It has partnered with the Internet Archive to create ongoing backups of GitHub via the Wayback Machine. The Software Heritage Foundation is saving and cataloging individual projects, and has so far archived about a hundred million from GitHub. Most amusingly, GitHub has also saved six thousand of the most popular repositories in quartz glass via Microsoft’s Project Silica. Quartz storage remains readable for tens of thousands of years and is resistant to radiation and harsh environmental conditions.


 
There is no doubt of the importance of such a project, but it MUST have a very flexible indexing system that can be migrated to newer and newer generations of systems, and it must have multiple parallel systems to ensure its integrity. Otherwise you end up with a vault full of billions, if not trillions, of useful items and have to wade through them all to find what you want .....
 
And then a meltwater flood comes along and destroys it all. The end. It wouldn't be the first time.
 
1,000 years from now, these records will not be found.

They will all have been mysteriously replaced by thousands of terabytes of ancient porn and every version of cracked software known to man.
 
There is no doubt of the importance of such a project, but it MUST have a very flexible indexing system ...
If there is no doubt, then how much of such data from the past are you/we actually using? The good knowledge just survives in different ways.
This is absolutely useless and serves only as satisfaction for the people who had the idea and realized it.
 
Only 21TB seems so pointless... and in such a large container, too. Surely there must be better formats for such long-term preservation.

I was thinking about that. 21TB really isn't that much; it's double what I have on my NAS, and that's a simple file server. For a corporation with millions of daily users, I would've thought it would be hundreds or thousands of TB worth of data.

It also says they used 186 reels of film to store that amount of information, which seems like an insane number. They could've used four of the newer WD drives (two to hold the data and two for integrity verification) to do the same job.
 
Only 21TB seems so pointless... and in such a large container, too. Surely there must be better formats for such long-term preservation.
It sounds like the film was selected to maximize the chances of data recovery: they put the code onto the film itself as plain text, like a picture. And by using archival-quality film, the pH should already be balanced so that it won't degrade over time.

As long as the climate inside the mine remains the same, it shouldn't be too much to expect someone to one day hold the film up to the light and read through all the code, line by line.
 