It is often stated that what people say on the Internet will last forever. This statement always makes me laugh. It is incorrect. First, nothing lasts forever – well, except God if you believe, but that is not what I’m talking about here. All things are transient in nature.
Librarians have been wrestling with issues around ephemera for generations. In library school, you can take courses focused just on ephemera. How is it defined? How do you catalog it? Do you even acquire such things for your collection? If you do, how do you decide what is worth saving and what is not? These questions have been around since the first librarian started building his or her collection. Not everything can be saved – and not everything is worth saving.
Let’s face it, most of what is written – in newspapers and magazines, let alone blogs – is not going to be kept for future generations to peruse. It just is not going to happen. Let’s look at the paper-based world first.
There are many books written every year. There are too many to count accurately due to the numerous independently published books. Most don’t even break even. The authors never receive any royalties (if they contracted for them) because the books never leave the bookstore shelf. Will such a book be remembered in 20 or 30 years? With the nature of copyright laws, it is highly doubtful, and more likely that it will disappear from this earth forever. If a book is picked up by a few people it has a slightly better chance of surviving, but it still will probably disappear. Books are the most permanent of media in today’s world. (I will get to why electronic media is not very permanent later.)
Next, we have newspapers and magazines. Most of the news-centric ones like to claim that they are highly accurate. Well, I did my undergraduate degree in history (and Experimental Psychology) and let me tell you, there are more retractions and misrepresentations of fact in the average newspaper than you can shake a stick at. It isn’t necessarily because the reporters and editors are bad. Sometimes it is. More often it is because they just got something wrong and there has to be a correction. Depending on how "popular" that first story is, it can be hard to get the public to accept the corrections, because corrections are hidden in the back pages of the paper. But even these tend to be forgotten and lost over time.
Sometimes inaccurate stories were intentional. You do have exceptional rapscallions such as Samuel Clemens and Ambrose Bierce, who would compete with each other over who could get the most ridiculous story accepted as fact by the editors back east – The Notorious Jumping Frog of Calaveras County by Mark Twain (Clemens’s pen name) was originally a news story about rocks that jumped on their own. There are people who to this day believe there are jumping stones in Calaveras County. But, again, the majority of things that were written are completely forgotten except by historians (and English professors) who spend hours in often futile efforts trying to read yellowing and fading newspapers.
Some of you may start thinking that microfiche and microfilm have preserved newspapers and magazines. Well, first, the cost of moving ephemera onto film is not insubstantial, so only a select group of newspapers and magazines was conserved. Researchers in areas such as women’s studies, for instance, have run into this because women’s periodicals, in general, were not considered as important as the news-centric newspaper – and most smaller periodicals are completely lost to time because there was not a large enough market to sell that microfiche or microfilm to.
Now, let’s talk about pamphlets and flyers. Believe it or not, libraries collect those – but conserving them is a full-time job in the archival institutions that want to keep them. The average library or collector is going to cull his or her collection so that only the items perceived as most useful are going to be around for any amount of time. This is why comic books (once classed in this arena) are so hard to find. They are ephemera.
Let’s move on to the recorded word. How many people have a crank-turned record player? And records? I have them stored somewhere. I know the U.S. Library of Congress has a collection. The problem? They degrade over time. The same holds true for anything on a recorded medium – film, reel-to-reel, 8-tracks, cassette tapes, VHS tapes, CDs, DVDs – all of these degrade, and if they are not transferred to a new medium they disappear. Part of what librarians and archivists do is cull collections so that the seemingly most valuable items are available for the future. There are limited funds, so not everything can be transferred to a new medium.
Electronic media is nothing new to the world of libraries. Do you remember punch cards? Scantron? Magnetic tape (still in use, and it often degrades much faster than expected)? Floppy disks (all the different sizes)? Those are just the physical media. How on earth do you actually run and look at these items when you can’t get working hardware? The hardware needed for many of these is unavailable today. (I have run into this in the private sector as well!) Whatever was on those is gone forever as soon as the last piece of hardware that can read them is gone.
OK, now we get to the Internet. Some of you are saying, whoa MLO, the Internet keeps everything. Well, no, it doesn’t. And permanence isn’t just a matter of whether something is still sitting somewhere in the ether of the Internet.
There are plenty of things that are gone forever from the early days of the Internet despite such projects as the Wayback Machine. Let’s start with email. Yes, AOL and CompuServe probably have tapes of everything anyone has ever typed on their services for advertising purposes – but, well, the computing power is such that most of it is on slowly degrading magnetic tape. Not too useful. Not all email lists are archived. Even the ones that are archived sometimes (many times) run out of money, and that information is just gone – it can also be very easily altered. Even if the owner doesn’t actually run out of money, they may decide to purge the data. Trust me, as someone who has worked in IT for a long time, data is purged whenever it is legal to do so, never to be retrieved again without massive amounts of forensic computer work.
There are way too many black hats on the Internet to assume what you are reading is as it was when it was originally written. Of course, most people really don’t understand the actual costs of keeping a large web site – even a "locked" archive is not to be 100% trusted to last over time or be accurate.
Let’s talk about the precursor of the World Wide Web – Gopher. Almost all of the information that was on Gopher is gone. It is simply gone. There is no way to get it back without someone recreating the data. It is not stored anywhere. Some of Usenet News is like this as well. Depending on who was administering the group, there may or may not be archives that can be retrieved to look at how that particular group developed over time. I see no reason to think the current web and social networking sites won’t go the same route. Will there be exceptions? Of course! There are always exceptions, but probably about 90% of what is currently online is simply going to disappear over time due to anything from lack of money to censorship, technology fade, or other factors.
There is another issue that affects ephemera – and electronic media even more so. The noise issue. In information theory, there is an idea that the more information (signals) there is, the harder it is to find your specific information (your specific signal). Without doing very specific things technologically, a signal will be lost. Search engines (and their underlying algorithms) are the current method for finding information and filtering through the noise. As these change over time – and they are constantly being refined – much of what is currently considered "hot" or "important" to retrieve is going to be "unfindable" due to the noise around the item. It already happens in the print world. This is why librarians developed controlled vocabularies – a taxonomy is a very specific type of classification scheme that will soon be irrelevant to Internet information. Taxonomies are always hierarchical and the Internet is more like the Memex machine of Vannevar Bush.
Never heard of Vannevar Bush? Well, his ideas about linked information helped inspire hypertext, and with it the Web. He proposed a machine that would allow a researcher to connect all the different realms of data together. It was a sort of word association. You know the experience where you start searching for one term and then end up searching for another term that really has nothing to do with what you were originally looking up? Well, that is what he hypothesized. Do you know what happens over time? Everything becomes connected to everything else. This makes the associations useless because you can no longer find the information. Because most social networking sites such as blogs and the like tend to be under the radar of the folks trying to preserve information over time, most of that is going to get lost in the noise.
In summary, technology changes, information storage costs money that can dry up, sites can be corrupted, sites and technologies can be abandoned, information is lost to the overall "noise", and it can even be censored (yes, I know about "rerouting" information; it doesn’t always work). So, after this long, somewhat (hopefully) coherent rant, I hope you understand why I laugh when I hear that whatever you write on the Internet is there forever. As the saying goes, "There is nothing new under the sun," and the Internet is just publishing (including ephemera) under a new guise.