The Real-Time Web creates Digital Waste

Category: Internet

Published: 01/26/2010 03:29 p.m.

What do we do with old newspapers? We throw them out and recycle them. We may save some in libraries for posterity, but the larger part gets tossed out daily. Digital content (like this article) doesn't take up physical space or have physical costs like newspapers. So, what do we do with our old web content? We keep it. I think that's a mistake.

Predictions are so Yesterday

There is currently a ton of buzz about a rumored Apple product. There have been thousands of blog posts, tweets, and articles by professional publications on the subject. They began years ago. And after the rumored product is announced, all the speculation will immediately become useless. There will be no more speculation. Instead we will be inundated by more semi-reviews and predictions and praises/condemnations.

A few weeks ago the National Championship game finally settled a debate that started before the football season began in August. Not only was there speculation similar to that around Apple's device, but there are hours (days) of video of sportscasters and others making predictions and speculations. Now no one is talking about it. But all of that old content is still taking up space in a database and out on the web.

The Speed of Web

The speed of the internet brings breaking news to laptops, browsers, and cell phones at a record-breaking pace. Many old media agencies are just now learning to break news quickly. Meanwhile the new media equivalents are publishing blog posts as fast as they get press releases to their inbox.

"Sometimes I'm amazed how short the lag is between receiving a PR pitch in my inbox and seeing it regurgitated on Mashable." - Anil Dash, Jan 4, 2009

It's quite amazing how quickly news spreads, gets linked to, and tweeted and re-tweeted. This helps us grow a message, yes, but at what cost?

Duplicate news is not something new. The Associated Press has syndicated stories for decades. The reasoning was that newspapers were locally based, so someone in Des Moines may not read the Chronicle or the Times while someone in Houston doesn't subscribe to the Register. And publishers have relationships with the AP to be able to reproduce that content. But as with almost everything else, the internet is changing this too.

Link blogs and re-blogging spread a message to broader audiences, but those audiences are rarely as segmented as local newspaper readers in different cities. I read Waxy's Links, Gruber, and Kottke daily. When something interesting in the tech industry comes out, I may see it from all three of these guys. The long tail of re-bloggers is continually growing. The spam scrapers that just steal content are growing larger than the legitimate content producers. It's only a matter of time before the internet is just one guy making stuff and the rest of the internet re-blogging it 1,000,000 times over. It's not just the noise that I'm talking about. It's the redundancy of legitimate content.

Digital Waste over Time

All of this creates a mound of mostly useless data. Much of this web content could be labeled "Flow", according to a recent article by Robin Sloan. While I agree that a healthy amount of flow is good, I am beginning to feel the immense weight of this digital waste. Not only does this clutter data systems with great amounts of redundant info, but it also starts to cloud the search engines and other tools for finding great things. Much of that data is expired in a way similar to old milk or forgotten fruit. And the data may not stink up the fridge now, but it will be there every time you need to find something specific on Google.

Searching Google for "photos of the apple tablet" (with quotes) brings up almost 200,000 results. Now what are the chances that 200,000 people have posted original content and chosen to use that phrase? Nine of the top ten results were the photos from Dustin Curtis. Eight of the ten on the next page were too. And this is just a single example of a single piece of content for a predicted product. Extrapolate that out in your head to the entire span of Apple tabletry. Do we need all of this? No.

Fixing the Digital Mess

Tumblr does a decent job with its reblogging feature, which allows for commentary while still attributing the original source, but it isn't perfect. Google groups search results by parent URL, so we don't see 100 links from the same site because something is in their sidebar. But this isn't nearly enough. We need to do one of two things with all of this "flow": either 1) make smarter filters that find the original, or 2) start deleting things. I'm for the latter.
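To make option 1 a little more concrete, here is a rough sketch of what a "find the original" filter might look like. Everything in it is my own assumption for illustration: the Post record, the fingerprint and find_originals helpers, and the example.com URLs are all made up, and it treats the earliest published copy as the original, which won't always be true. It also only catches copies that are identical once case and punctuation are stripped, so a real filter would need fuzzier matching.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical record for one piece of web content.
    url: str
    text: str
    published: datetime


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing punctuation and whitespace."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def find_originals(posts):
    """Keep one post per fingerprint: the earliest published copy (an assumption)."""
    originals = {}
    for post in posts:
        key = fingerprint(post.text)
        current = originals.get(key)
        if current is None or post.published < current.published:
            originals[key] = post
    return sorted(originals.values(), key=lambda p: p.published)


# Made-up sample data: one original, one re-blog of it, one unrelated post.
posts = [
    Post("https://reblog.example.com/1", "Photos of the Apple Tablet!", datetime(2010, 1, 20)),
    Post("https://original.example.com/tablet", "photos of the apple tablet", datetime(2010, 1, 18)),
    Post("https://other.example.com/news", "Something else entirely", datetime(2010, 1, 19)),
]

for post in find_originals(posts):
    print(post.url)
```

The re-blog drops out and the two distinct posts remain. Trusting publish dates is the weak point here, since scrapers can claim any date they like.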

Yesterday's flow is often useless today. It has gone past us and is no longer needed. Let's get rid of it. Or, at least be smarter about archiving it. I recently moved my Google Reader Shared Items into Tumblr. I did this mostly so I'd have a cohesive group of things I've collected, but it wasn't until I read Stock and Flow that it hit me like a brick: I was trying to revive my flow. This was a wasteful activity, as I see the costs far outweighing the benefits. Will anyone really go back to what I shared in June 2008? No. No one cares because it's old news.

Separating Stock and Flow becomes even more vital now. I want a tool where I can keep and search the Stock I discover (del.icio.us perhaps?) while having the option to send something else to my flow (Tumblr) to be shared. Google needs to know what is Stock and what is Flow as well so it can relay information in those terms. As Google continues to come out with more features for the real-time web, I hope they will be smart enough to remove those same items from the old-time web.

We could also use a better archiving system. Unfortunately, current search algorithms aren't able to keep up with the pace of things and separate out the old from the new. My Gmail strategy is to archive everything once read, tag things with context, and then use search to find things later. It works pretty well on my 10,000 emails, but even it isn't perfect. Using this strategy for Flow doesn't work at all. Google's options for searching by date help, but they aren't the complete solution either.

Promote a change, smartly

It's going to take years for all of this to be sorted out. The cheapness of Digital turns many of us into pack rats. I still have files from my sophomore year in college. And they are backed up. In the same way that taking care of the planet is everyone's responsibility, I propose that taking care of the web is our responsibility, too.

If you want to create a campaign to promote this idea, that would be wonderful. But please don't create 20 different ones and then cross-link them on 100 different blogs. Let's be smart about this. If you want to share this message then write something original and link to it, or tweet something original, or send an email to people you know who use the internet.

We can all still share things in a way that duplicates less while still spreading a message. If you think of great ways to do this, please share them as well. I want to create a better web for everyone.