
Under Forty Gigabytes of Grief: The Desperate Race to Save GeoCities Before Yahoo Pulled the Plug

On April 23, 2009, Yahoo announced that it was going to shut down GeoCities. Not restructure it. Not migrate it. Delete it. A few months later it set the date: October 26, 2009. Everything gone. Thirty-eight million user-created pages, scheduled for the bit bucket: the fan shrines, the MIDI-soundtracked poetry collections, the animated under-construction GIFs, the personal homepages that represented the first time an entire generation of Americans had ever published anything to a public audience.

The reaction from the tech community ranged from resigned shrugging to genuine, almost embarrassed grief. GeoCities was deeply uncool by 2009. It had been uncool for years. MySpace had already peaked and was fading. Facebook was ascendant. The tiled background images and blinking text of the GeoCities era felt like something from a previous geological epoch. And yet.

What GeoCities Actually Was

GeoCities launched in 1994 as Beverly Hills Internet, was renamed GeoCities in 1995, and by 1999 was the third most visited site on the entire web. The concept was almost aggressively simple: free webspace, organized into themed "neighborhoods" with names like SiliconValley (tech stuff), Heartland (family and personal), Area51 (sci-fi and paranormal), and SunsetStrip (music). You got 15 megabytes. You got a page builder if you needed it. You got a URL that looked like geocities.com/SiliconValley/Pines/3742 and you got to work.

What people built on those 15 megabytes was, in retrospect, the most authentic snapshot of popular American internet culture ever assembled. Not the tech industry's version of the internet. Not the media's version. The actual version — the Dragon Ball Z fan pages, the Wicca resource guides, the pages dedicated to specific episodes of The X-Files, the local band homepages, the personal journals of people who had never written for a public audience before and were figuring it out in real time, in public, with animated mailbox GIFs.

Yahoo bought GeoCities in 1999 for $3.57 billion in stock. It was the peak of the bubble and Yahoo was spending money like it had invented the concept. They proceeded to do almost nothing useful with it for a decade, watched it slowly decay in relevance, and then announced in 2009 that they were just going to delete the whole thing.

Jason Scott and the Archive Team

If you know anything about the effort to save GeoCities, you know the name Jason Scott. Scott is a documentary filmmaker and digital archivist who had already made a documentary about BBSs, was finishing another about text adventures, and had strong opinions, loudly expressed, about institutions that destroyed digital history through negligence or indifference.

When Yahoo made its announcement, Scott essentially declared war. He turned Archive Team, a loose coalition he had founded earlier that year, toward GeoCities: volunteers, archivists, and technically capable people who were willing to spend their bandwidth and their weekends crawling the corpus before the deadline. The group had no budget. They had no official relationship with Yahoo. They had wget, they had hard drives, and they had the specific kind of furious motivation that comes from watching something irreplaceable get scheduled for destruction.

The technical challenge was enormous. GeoCities had approximately 38 million pages spread across thousands of neighborhood subdirectories. Crawling it systematically required coordinating dozens of volunteers running download scripts simultaneously, avoiding rate limiting, and managing the sheer volume of data being generated. This was not a casual weekend project. This was a distributed systems problem being solved by volunteers with day jobs.
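To make the coordination problem concrete, here is a minimal Python sketch of one common way to split a crawl among volunteers who can't easily talk to each other: hash every candidate URL, let each volunteer take only the URLs that hash to its own index, then fetch them one at a time with a fixed pause between requests. This is a hypothetical illustration, not Archive Team's actual tooling; the function names, the one-second delay, and the candidate URLs (built from the neighborhood/suburb/number pattern of geocities.com/SiliconValley/Pines/3742) are all assumptions.

```python
import hashlib
import time
from typing import Callable, Dict, Iterable, List


def assign_volunteer(url: str, num_volunteers: int) -> int:
    """Map a URL to a volunteer index using a stable hash.

    Every volunteer runs the same function, so they all agree on who owns
    which URL without talking to each other.
    """
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_volunteers


def my_share(urls: Iterable[str], me: int, num_volunteers: int) -> List[str]:
    """Return only the URLs that hash to this volunteer's index."""
    return [u for u in urls if assign_volunteer(u, num_volunteers) == me]


def polite_crawl(
    urls: List[str],
    fetch: Callable[[str], object],
    delay_seconds: float = 1.0,  # assumed pause; the real rate limits are unknown
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, object]:
    """Fetch URLs one at a time, sleeping between requests to avoid rate limits."""
    results: Dict[str, object] = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results


if __name__ == "__main__":
    # Hypothetical candidate addresses following the neighborhood/suburb/number pattern.
    candidates = [f"http://www.geocities.com/SiliconValley/Pines/{n}" for n in range(3740, 3760)]
    for volunteer in range(3):
        share = my_share(candidates, volunteer, 3)
        print(f"volunteer {volunteer}: {len(share)} URLs")
```

Hashing means no volunteer needs a central list of who is doing what. The trade-off is that if one volunteer drops out, their share silently goes uncrawled unless someone notices and reassigns it.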

The 650GB Torrent

When October 26 arrived, Archive Team had managed to save approximately 650 gigabytes of GeoCities content — somewhere in the neighborhood of a million individual pages, representing a fraction of the total corpus but a substantial preservation effort given the constraints.

The result was later packaged as a torrent and made available through the Internet Archive, where it remains today. The torrent itself became a minor internet artifact: something people seeded out of a sense of digital civic duty, the way you might donate to a historical society. It is one of the best-known preservation torrents ever released.

But 650 gigabytes out of what would have been several terabytes of complete content means the rescue was incomplete. Huge swaths of GeoCities neighborhoods were not captured. Entire communities — the Heartland family pages, significant portions of WestHollywood, the personal journals of thousands of people who had used GeoCities as a proto-blogging platform — exist only in fragments or not at all.

The Internet Archive's Wayback Machine had been crawling GeoCities for years independently, and its captures add significant coverage. But even combining the Archive Team torrent with Wayback Machine snapshots, researchers estimate that a substantial portion of the original GeoCities corpus is simply gone.

What the Surviving Pages Tell Us

Digital archaeologists who have spent time with the GeoCities archive describe the experience as disorienting in a specific way. These pages were not designed for posterity. They were designed for the present tense of whenever they were made — 1997, 2001, 2004 — and they reflect that immediacy in ways that carefully curated social media profiles don't.

The fan pages are encyclopedic in a way that predates Wikipedia and shares its obsessive energy. A 1998 GeoCities page about Final Fantasy VII might contain more specific, accurate information than anything that existed in print at the time, written by someone who had played the game for hundreds of hours and had strong opinions about the Aerith situation that they were going to share whether you asked or not.

The personal pages are stranger and more affecting. People wrote about their lives with a candor that the social media era, with its audience awareness and engagement optimization, has largely eliminated. There are pages about grief, about small-town boredom, about being gay in a place where that was dangerous, about chronic illness, about hobbies so niche they had no other community. These pages were often the only place these people existed on the internet. When Yahoo deleted GeoCities, it deleted them.

The Lesson Yahoo Taught Us

The GeoCities deletion is the canonical case study in what archivists call a digital dark age: the phenomenon where digital content is actually less durable than physical content because it depends on active maintenance by institutions that may not prioritize preservation.

The books in your local library from 1997 still exist. The GeoCities pages from 1997 mostly don't. The failure mode is institutional: Yahoo owned the infrastructure, Yahoo had no financial incentive to maintain it, Yahoo deleted it. No library would do this with physical collections. The legal and cultural frameworks that protect physical cultural heritage have no real equivalent for digital content hosted on private infrastructure.

Archive Team and the Internet Archive understood this before most institutions did, and the frantic GeoCities rescue effort was a proof of concept for a kind of digital preservation that has since become more organized and more systematic. The Archive Team Warrior software that came out of subsequent projects lets volunteers contribute to preservation crawls automatically. Lessons were learned.
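In broad strokes, the Warrior model puts a central tracker in charge of handing out small units of work to whichever volunteer machines ask for them. A natural companion feature, assumed here rather than taken from the Warrior's documentation, is taking back units whose volunteer goes quiet so another machine can retry them. The toy Python class below sketches that pattern; it is a hypothetical illustration, not the Warrior's or the tracker's real code or protocol, and the item names, timeout, and method names are invented for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Iterable, Optional, Set, Tuple


@dataclass
class Tracker:
    """Toy work tracker (hypothetical): hands out items, reclaims stale ones."""

    timeout: float  # seconds a claim may sit unfinished before it is reassigned
    pending: Deque[str] = field(default_factory=deque)
    claimed: Dict[str, Tuple[str, float]] = field(default_factory=dict)  # item -> (worker, claim time)
    done: Set[str] = field(default_factory=set)

    def add(self, items: Iterable[str]) -> None:
        """Queue new work items."""
        self.pending.extend(items)

    def requeue_stale(self, now: float) -> None:
        """Put back any item whose claim is older than the timeout."""
        stale = [item for item, (_, t) in self.claimed.items() if now - t > self.timeout]
        for item in stale:
            del self.claimed[item]
            self.pending.append(item)

    def claim(self, worker: str, now: float) -> Optional[str]:
        """Give the next item to a worker, or None if nothing is left."""
        self.requeue_stale(now)
        if not self.pending:
            return None
        item = self.pending.popleft()
        self.claimed[item] = (worker, now)
        return item

    def complete(self, worker: str, item: str) -> bool:
        """Mark an item finished, but only for the worker that currently holds it."""
        holder = self.claimed.get(item)
        if holder is None or holder[0] != worker:
            return False
        del self.claimed[item]
        self.done.add(item)
        return True
```

A quick walk-through: two workers claim `"user:alice"` and `"user:bob"`; the second finishes, the first goes silent, and once the timeout passes the tracker hands `"user:alice"` to a third worker. Because `complete` checks who currently holds an item, a late report from the silent worker is rejected instead of being double-counted.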

But the GeoCities that got away is still gone. Somewhere in the unarchived 90% of those 38 million pages are the first things millions of Americans ever wrote for a public audience — their first attempts at having a voice on the internet, preserved nowhere, readable by no one.

Yahoo deleted the evidence. We're still figuring out what we lost.
