Archival journals require open access

March 18, 2012

Many academics don’t trust electronic-only journals; they think that paper documents are more durable, more archival, than electronic documents. Evidence for this view is all around us: link rot is pervasive in blogs and other web sites, it approaches 3% even in digital libraries, and in some cases publishers have lost the full text of articles when journals ceased publication or changed ownership.

The funny thing is, we’ve known since at least the burning of the Library of Alexandria that paper (or papyrus, or any other physical medium) is no guarantee of durability. Brewster Kahle, who started the Internet Archive and knows a thing or two about document preservation, says that

the lesson of the first Library of Alexandria is “don’t have just one copy.”

If paper journals are more durable than electronic journals, then it must be because paper journals are using replication more effectively than electronic journals. This is strange, given how much easier it is to copy bits than paper, and how much computer scientists know about the uses of replication; yet it is so.

Consider how an issue of a typical academic journal gets published. The publisher prints up some number of copies of the issue, stores some in its own archives, and ships the rest to its subscribers. These subscribers include university libraries, who catalog their new acquisitions and store them in their stacks. University libraries can contain millions of volumes, and they maintain their holdings in climate- and moisture-controlled facilities, employing dozens of librarians and custodians to monitor the stacks around the clock.

At this point, if the publishing house were to burn down like the Library of Alexandria, the journal itself would survive. It is still available at dozens of libraries. Furthermore, the authenticity of any journal article can be established. Libraries are trustworthy: patrons know that they are dedicated to scholarship and its preservation, and tampering with the holdings is unlikely. If there were ever a question of tampering, the holdings of one library could be compared with others. (Libraries seem to be providing quite a lot of value to publishers; perhaps publishers should be paying libraries, instead of the other way round?)

In contrast, a typical electronic journal sits behind a paywall. The journal maintains the canonical copy; if it has mirrors or backups, the rest of the world is none the wiser. Libraries do not have copies of the electronic documents. Readers have to go through the publisher to read articles. Sometimes authors are allowed to put a copy of their article on their own web sites, but very often they cannot put up the “official” version.

The replication in this system, if any, is unsatisfactory. Libraries are not partners in the critical task of preservation. There are not multiple, independent copies of publications that can be compared for validation.

The problem here is not with the nature of electronic documents, but rather with publishers who are afraid of electronic documents. Open access publishers generally do better. The arXiv is run by a library and has mirrors at other libraries. PLoS ONE has at least one mirror, and PLoS is experimenting with aggregating content published in other journals.

Open access enables replication, and replication is the essential ingredient of preservation. Therefore open access journals should emphasize this advantage over closed access journals, and they should pursue partnerships with libraries. (They should also start using cryptographic hashes to help authenticate replicated documents, and abandon PDF, a historical relic.)
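
To make the hashing suggestion concrete, here is a minimal sketch in Python of how a reader or a library might check replicated copies against one another. It assumes the journal publishes a SHA-256 digest alongside each article. That choice of hash is mine, and the function names and file paths are hypothetical, not anything an existing journal provides.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_copies_by_digest(paths):
    """Group replica paths by digest (hypothetical helper).

    If every copy is identical, the result has exactly one group.
    More than one group means at least one copy differs from the rest.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of_file(path)].append(Path(path))
    return dict(groups)


def matches_published_digest(path, published_hex):
    """Check one copy against a digest the journal is assumed to publish."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A library holding three mirrors of an article could call `group_copies_by_digest` on the three files: a single group means the copies agree, and an outlier group points to a copy that was corrupted or altered. This is the electronic analogue of comparing one library's holdings with another's. A bare hash only shows that copies match each other or match a published value; establishing who published that value would need something more, such as a digital signature.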

Paper journals are on their way out. Nowadays a publication is more than just text—there is audio, there is video, there is source code, there are data sets. Electronic publications run by closed access publishers are also on their way out—they aren’t archival. Open access electronic journals are the only way forward.