Disclosure and ‘cook booking’

March 25, 2013

Many journals now have open data policies, but they are rarely enforced, so many scientists do not submit data. The question is: what drives them not to submit? Is it laziness? Is it a desire to keep the data to themselves? Or is it something more sinister? After all, the open data rules were introduced, in part, to allow replication so that reported results could be checked for accuracy.

Robert Trivers reports on an interesting study by Wicherts, Bakker, and Molenaar that correlates disclosure of data with the statistical strength of results in psychology journals.

Here is where they got a dramatic result. They limited their research to two of the four journals whose scientists were slightly more likely to share data and most of whose studies were similar in having an experimental design. This gave them 49 papers. Again, the majority failed to share any data, instead behaving as a parody of academics. Of those asked, 27 percent failed to respond to the request (or two follow-up reminders)—first, and best, line of self-defense, complete silence—25 percent promised to share data but had not done so after six years and 6 percent claimed the data were lost or there was no time to write a codebook. In short, 67 percent of (alleged) scientists avoided the first requirement of science—everything explicit and available for inspection by others.

Was there any bias in all this non-compliance? Of course there was. People whose results were closer to the fatal cut-off point of p=0.05 were less likely to share their data. Hand in hand, they were more likely to commit elementary statistical errors in their own favor. For example, for all seven papers where the correctly computed statistics rendered the findings non-significant (10 errors in all) none of the authors shared the data. This is consistent with earlier data showing that it took considerably longer for authors to respond to queries when the inconsistency in their reported results affected the significance of the results (where responses without data sharing!). Of a total of 1148 statistical tests in the 49 papers, 4 percent were incorrect based only on the scientists’ summary statistics and a full 96 percent of these mistakes were in the scientists’ favor. Authors would say that their results deserved a ‘one-tailed test’ (easier to achieve) but they had already set up a one-tailed test, so as they halved it, they created a ‘one-half tailed test’. Or they ran a one-tailed test without mentioning this even though a two-tailed test was the appropriate one. And so on. Separate work shows that only one-third of psychologists claim to have archived their data—the rest make reanalysis impossible almost at the outset! (I have 44 years of ‘archived’ lizard data—be my guest.) It is likely that similar practices are entwined with the widespread reluctance to share data in other “sciences” from sociology to medicine. Of course this statistical malfeasance is presumably only the tip of the iceberg, since in the undisclosed data and analysis one expects even more errors.
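The ‘one-half tailed test’ described above can be made concrete with a small calculation. What follows is only a sketch under stated assumptions: it uses a standard-normal (z) test rather than whatever tests the papers actually ran, and the test statistic of 1.5 is a made-up number chosen for illustration, not a figure from the study.

```python
from statistics import NormalDist


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard-normal test statistic z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def one_tailed_p(z: float) -> float:
    """Upper-tail p-value; defensible only if the direction was predicted in advance."""
    return 1.0 - NormalDist().cdf(z)


ALPHA = 0.05
z = 1.5  # hypothetical test statistic (an assumption, not from the study)

p_two = two_tailed_p(z)
p_one = one_tailed_p(z)
# The error: halving a p-value that is already one-tailed.
p_half = p_one / 2.0

for label, p in [
    ("two-tailed", p_two),
    ("one-tailed", p_one),
    ("'one-half tailed'", p_half),
]:
    print(f"{label}: p = {p:.4f}, below {ALPHA}: {p < ALPHA}")
```

With this hypothetical statistic, neither the two-tailed nor the one-tailed test clears the 0.05 cut-off, but the doubly halved figure does, which is exactly the kind of error that can only ever favor the author.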

It’s only a correlation, but it is troubling. The issue is that authors present results selectively and, sadly, this is not picked up in peer review. Of course, even with open data, it takes effort to replicate a study and then publish alternative results and conclusions.

Looking again at “Big Deal” scholarly journal packages

February 18, 2013

One of the things pointed to in the debate over market power and scholarly journals is the rise of “Big Deal” packages, in which publishers bundle journals together for a single price. As publishers have merged and acquired more titles, these bundles have become more compelling, as individual journal subscription prices to libraries have been rising at a higher rate. This means that libraries with limited budgets are driven to give a greater share of their journal budgets to larger publishers, squeezing out smaller ones. The claim is that this is reducing choice.

While it is reducing choice amongst publishers, Andrew Odlyzko, in a recent paper, points out that “Big Deals” have also increased the number of journal titles available, not just in large libraries but across the board.


The reason is basically the same one that is behind the drive towards open access: in electronic form, the marginal cost of an additional journal is zero, and so it makes sense to provide more journal titles to each library. Moreover, for smaller libraries, the average cost of a journal title has fallen at a faster rate than it has for larger libraries. In other words, behind the spectre of increased publisher profits and market power is an increase in journal availability. Put simply, more researchers have easier access to journals than before. This is one case where, if we just consider University libraries, price discrimination (using Varian’s rule) looks to be in the welfare-improving range.
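A toy calculation shows why the welfare claim is plausible. This is a sketch under assumptions of my own: I am reading Varian’s rule as the condition that price discrimination can raise total welfare only if it raises total output, and the two libraries and their valuations (100 and 30) are invented numbers, not Odlyzko’s data.

```python
def best_uniform_price(valuations, marginal_cost=0.0):
    """Profit-maximizing single price when each buyer takes the package iff value >= price."""
    best = None
    for price in sorted(set(valuations)):
        buyers = [v for v in valuations if v >= price]
        profit = (price - marginal_cost) * len(buyers)
        if best is None or profit > best["profit"]:
            best = {
                "price": price,
                "buyers": len(buyers),
                "profit": profit,
                # Total surplus: value created minus cost, regardless of who keeps it.
                "welfare": sum(v - marginal_cost for v in buyers),
            }
    return best


def discriminating_prices(valuations, marginal_cost=0.0):
    """Each buyer is charged its own valuation (the extreme case of price discrimination)."""
    buyers = [v for v in valuations if v >= marginal_cost]
    surplus = sum(v - marginal_cost for v in buyers)
    return {"buyers": len(buyers), "profit": surplus, "welfare": surplus}


# Hypothetical valuations: one large library, one small library.
valuations = [100.0, 30.0]

uniform = best_uniform_price(valuations)      # zero marginal cost, as for electronic access
tailored = discriminating_prices(valuations)

print("uniform price:", uniform)
print("tailored prices:", tailored)
```

In this example the single profit-maximizing price shuts the small library out, while tailored pricing serves both. Output rises from one library to two and total surplus rises with it, although all of the gain is captured by the publisher, which is the distributional wrinkle.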

But there are, of course, wrinkles to all of this. This says nothing of access beyond Universities, which is still an issue economically and, increasingly, morally. It also says nothing of the distribution of rents in the industry. Publisher profits have increased dramatically and that money has to come from somewhere.

Odlyzko raises a new issue in that regard: publisher profits are a symptom that libraries are being squeezed. Of course, we know that the share of library budgets devoted to journal acquisition has risen. At the same time, library budgets have fallen, although not as quickly as Odlyzko expected a decade ago. The reason is that libraries command attention at Universities. Changes to them are a signal of how quickly change can occur within Universities and, as it turns out, that is not very quickly. Libraries are centrally located, are viewed nostalgically by alumni donors, and a hit to their budgets can often be read as a sign of a move against scholarship.

But what publishers are providing now, in terms of electronic access and search, is as much a transfer of functions as it is of money from libraries to themselves. Put simply, publishers are now doing what librarians used to do. They have provided tools that make it easier for people to find information. It is another way machines are being substituted for labor.

The competition between libraries and publishers has implications for how we view alternative journal business models. Take, for instance, the notion that journals could be funded by author fees and made open access instead of being funded by user fees. If we did this, it would just change the locus of the competitive fight between libraries and publishers to involve academics. Academics can legitimately argue that these new publication fees should come from the institution. And where will the institution find the money? In the library budgets that are relieved as more journals go open access. So either way, the money for journal publishing will end up coming from libraries.

This is not to say that there is no scope for reducing the costs of journal access and storage. Those costs are surely bloated now, as they include the publisher market power premium. The point is that libraries spent as much time resisting changes to journal business models as publishers did, and that seems to have been a political error on their part.

This is all familiar stuff to economists. The flow of money is less important than the structure of activities. When it comes down to it, we know one thing: we can provide a journal system with labor from academics (as writers, referees and editors) and publisher activities when there is enough willingness to pay for all of it. That means we can provide the same overall payment and still, because journals are a non-rival good, have open access. In other words, there is no market impediment to open access; in principle, it is a pure Pareto improvement. The question now is how to do the “Really Big Deal” to get there.