Support Us

You are browsing the archive for Economic Publishing.

Disclosure and ‘cook booking’

Joshua Gans - March 25, 2013 in Contribution Economy, Economic Publishing, Featured, Open Data, Open Economics

This blog post is cross-posted from the Contribution Economy Blog.


Many journals now have open data policies but they are sparingly enforced. So many scientists do not submit data. The question is: what drives them not to submit? Is it laziness? Is it a desire to keep the data to themselves? Or is it something more sinister? After all, the open data rules were, in part, to allow for replication experiments to ensure that the reported results were accurate.

Robert Trivers reports on an interesting study by Wicherts, Bakker, and Mlenar that correlates disclosure of data with the statistical strength of results in psychological journals.

Here is where they got a dramatic result. They limited their research to two of the four journals whose scientists were slightly more likely to share data and most of whose studies were similar in having an experimental design. This gave them 49 papers. Again, the majority failed to share any data, instead behaving as a parody of academics. Of those asked, 27 percent failed to respond to the request (or two follow-up reminders)—first, and best, line of self-defense, complete silence—25 percent promised to share data but had not done so after six years and 6 percent claimed the data were lost or there was no time to write a codebook. In short, 67 percent of (alleged) scientists avoided the first requirement of science—everything explicit and available for inspection by others.

Was there any bias in all this non-compliance? Of course there was. People whose results were closer to the fatal cut-off point of p=0.05 were less likely to share their data. Hand in hand, they were more likely to commit elementary statistical errors in their own favor. For example, for all seven papers where the correctly computed statistics rendered the findings non-significant (10 errors in all) none of the authors shared the data. This is consistent with earlier data showing that it took considerably longer for authors to respond to queries when the inconsistency in their reported results affected the significance of the results (where responses without data sharing!). Of a total of 1148 statistical tests in the 49 papers, 4 percent were incorrect based only on the scientists’ summary statistics and a full 96 percent of these mistakes were in the scientists’ favor. Authors would say that their results deserved a ‘one-tailed test’ (easier to achieve) but they had already set up a one-tailed test, so as they halved it, they created a ‘one-half tailed test’. Or they ran a one-tailed test without mentioning this even though a two-tailed test was the appropriate one. And so on. Separate work shows that only one-third of psychologists claim to have archived their data—the rest make reanalysis impossible almost at the outset! (I have 44 years of ‘archived’ lizard data—be my guest.) It is likely that similar practices are entwined with the widespread reluctance to share data in other “sciences” from sociology to medicine. Of course this statistical malfeasance is presumably only the tip of the iceberg, since in the undisclosed data and analysis one expects even more errors.

It’s correlation but it is troubling. The issue is that authors present results selectively and sadly this is not picked up in peer review processes. Of course, it goes without saying that even with open data, it takes effort to replicate and then publish alternative results and conclusions.

Looking again at “Big Deal” scholarly journal packages

Joshua Gans - February 18, 2013 in Contribution Economy, Economic Publishing, Featured, Open Access, Open Economics

This blog post is cross-posted from the Contribution Economy Blog.

One of the things pointed to in the debate over market power and scholarly journals is the rise of “Big Deal” packages. Basically, this has arisen as publishers bundle journals together for a single price. Indeed, as the publishers have merged and acquired more titles, these bundled packages have become more compelling with individual journal subscription pricing to libraries rising at a higher rate. This means that libraries with limited budgets are driven to give a greater share of their journal budgets to larger publishers; squeezing out smaller ones. The claim is that this is reducing choice.

While it is reducing choice amongst publishers, Andrew Odlyzko, in a recent paper, points out that “Big Deals” have also increased the number of journal titles available; not just in large libraries but across the board.

Serials

The reason is basically the same reason that is behind the drive towards open access — in electronic form, the marginal cost of an additional journal is zero and so it make sense to provide more journal titles to each library. Moreover, for smaller libraries, the average cost of a journal title has fallen at a faster rate than it has done for larger libraries. In other words, behind the spectre of increased publisher profits and market power, is an increase in journal availability. Put simply, more researchers have easier access to journals than before. This is one case where — if we just consider University libraries — price discrimination (using Varian’s rule) looks to be in the welfare improving range.

But there are, of course, wrinkles to all of this. This says nothing of access beyond Universities which is still an issue both economically and increasingly morally. It also says nothing of the distribution of rents in the industry. Publisher profits have increased dramatically and that money has to come from somewhere.

Odlyzko raises a new issue in that regard: publisher profits are a symptom that libraries are being squeezed. Of course, we know that the share of library budgets devoted to journal acquisition has risen. At the same time, library budgets have fallen although not as quickly as Odlyzko expected a decade ago. The reason is that libraries command attention at Universities. Changes to them are a signal of how quickly changes can occur within Universities. As it turns out, there is not very much. Libraries are centrally located, have nostalgic views in the eyes of alumni donors and hitting their budgets can often be read as a sign of a move against scholarship.

But what publishers are providing now, in terms of electronic access and search, is as much a transfer of functions as it is of money from libraries to themselves. Put simply, publishers are now doing what librarians used to do. They have provided tools that make it easier for people to find information. It is another way machines are being substituted for labor.

The competition between libraries and publishers has implications with regard to how we view alternative journal business models. Take, for instance, the notion that we can have journals funded by author fees and be given open access instead of being funded by user fees. If we did this, then this will just change the locus of the competitive fight between libraries and publishers to involve academics. Academics can legitimately argue that these new publication fees should come from the institution and, where will the institution find the money? In the now relieved library budgets as more journals go open access. So either way, the money for journal publishing will end up coming from libraries.

This is not to say that there is no scope for reducing the costs of journal access and storage. It is surely bloated now as it includes the publisher market power premium. The point is that libraries spent time resisting changes to journal business models as much as publishers did but that seems to have been a political error on their part.

This is all familiar stuff to economists. The flow of money is less important than the structure of activities. When it comes down to it, we know one thing: we can provide a journal system with labor from academics (as writers, referees and editors) and publisher activities when there is enough willingness to pay for all of it. That means we can provide the same overall payment and still, because journals are a non-rival good, have open access. In other words, there is no market impediment to open access, it is proven to be a pure Pareto improvement. The question now is how to do the “Really Big Deal” to get it there.

Dutch PhD-workshop on research design, open access and open data

Velichka Dimitrova - February 1, 2013 in Economic Publishing, EDaWaX, External Projects, Featured, Open Access, Open Data, Open Economics

This blog post is written by Esther Hoorn, Copyright Librarian, University of Groningen, the Netherlands.

If Roald Dahl were still alive, he would certainly be tempted to write a book about the Dutch social psychologist Diederik Stapel. For not only did he make up the research data to support his conclusions, but also he ate all the M&M’s, which he bought with public money for interviews with fictitious pupils in fictitious high schools. In the Netherlands the research fraud by Stapel was a catalyst to bring attention to the issue of research integrity and availability of research data. A new generation of researchers needs to be aware of the policy on sharing research data by the Dutch research funder NWO, the EU policy and the services of DANS, the Dutch Data archiving and networked services. In the near future, a data management plan will be required in every research proposal.

Verifiability

For some time now the library at the University of Groningen is organizing workshops for PhDs to raise awareness on the shift towards Open Access. Open Access and copyright are the main themes. The question also to address verifiability of research data came from SOM, the Research Institute of the Faculty of Economics and Business. The workshop is given as part of the course Research Design of the PhD program. The blogpost Research data management in economic journals proved to be very useful to get an overview of the related issues in this field.

Open Access

As we often see, Open Access was a new issue to most of the students. Because the library buys licenses the students don’t perceive a problem with access to research journals. Moreover, they are not aware of the big sums that the universities at present pay to finance access exclusively for their own staff and students. Once they understand the issue there is a strong interest. Some see a parallel with innovative distribution models for music. The PhDs come from all over the world. And more and more Open Access is addressed in every country of the world. One PhD from Indonesia mentioned that the Indonesian government requires his dissertation to be available through the national Open Access repository. Chinese students were surprised by availability of information on Open Access in China.

Assignment

The students prepared an assignment with some questions on Open Access and sharing research data. The first question still is on the impact factor of the journals in which they intend to publish. The questions brought the discussion to article level metrics and alternative ways to organize the peer review of Open Access journals.

Will availability of research data stimulate open access?

Example of the Open Access journal Economics

The blogpost Research data management in economic journals presents the results of the German project EdaWax, European Data Watch Extended. An important result of the survey points at the role of association and university presses. Especially it appears that many journals followed the data availability policy of the American Economic Association.

[quote] We found out that mainly university or association presses have high to very high percentages of journals owning data availability policies while the major scientific publishers stayed below 20%.

Out of the 29 journals with data availability policies, 10 used initially the data availability policy implemented by the American Economic Review (AER). These journals either used exactly the same policy or a slightly modified version.

For students it is assuring to see how associations take up their role to address this issue. An example of an Open Access journal that adopted the AER policy is Economics. And yes, this journal does have an impact factor in the Social Science Citation Index and also the possibility to archive the datasets in the Dataverse Network.

Re-use of research data for peer review

One of the students suggested that the public availability of research data (instead or merely research findings) may lead to innovative forms of review. This may facilitate a further shift towards Open Access. With access to underlying research data and methodologies used, scientists may be in a better position to evaluate the quality of the research conducted by peers. The typical quality label given by top and very good journals may then become less relevant, over time.
It was also discussed that journals may not publish a certain numbers of papers in a volume released e.g. four times a year, but rather as qualifying papers are available for publication throughout the year. Another point raised was that a substantial change in the existing publication mechanics will likely require either top journals or top business schools to lead the way, whereas associations of leading scientists in a certain field may also play an important role in such conversion.

First Open Economics International Workshop Recap

Velichka Dimitrova - January 25, 2013 in Economic Publishing, Events, Featured, Open Access, Open Data, Open Economics, Open Research, Open Tools, Workshop

The first Open Economics International Workshop gathered 40 academic economists, data publishers and funders of economics research, researchers and practitioners to a two-day event at Emmanuel College in Cambridge, UK. The aim of the workshop was to build an understanding around the value of open data and open tools for the Economics profession and the obstacles to opening up information, as well as the role of greater openness of the academy. This event was organised by the Open Knowledge Foundation and the Centre for Intellectual Property and Information Law and was supported by the Alfred P. Sloan Foundation. Audio and slides are available at the event’s webpage.

Open Economics Workshop

Setting the Scene

The Setting the Scene session was about giving a bit of context to “Open Economics” in the knowledge society, seeing also examples from outside of the discipline and discussing reproducible research. Rufus Pollock (Open Knowledge Foundation) emphasised that there is necessary change and substantial potential for economics: 1) open “core” economic data outside the academy, 2) open as default for data in the academy, 3) a real growth in citizen economics and outside participation. Daniel Goroff (Alfred P. Sloan Foundation) drew attention to the work of the Alfred P. Sloan Foundation in emphasising the importance of knowledge and its use for making decisions and data and knowledge as a non-rival, non-excludable public good. Tim Hubbard (Wellcome Trust Sanger Institute) spoke about the potential of large-scale data collection around individuals for improving healthcare and how centralised global repositories work in the field of bioinformatics. Victoria Stodden (Columbia University / RunMyCode) stressed the importance of reproducibility for economic research and as an essential part of scientific methodology and presented the RunMyCode project.

Open Data in Economics

The Open Data in Economics session was chaired by Christian Zimmermann (Federal Reserve Bank of St. Louis / RePEc) and was about several projects and ideas from various institutions. The session examined examples of open data in Economics and sought to discover whether these examples are sustainable and can be implemented in other contexts: whether the right incentives exist. Paul David (Stanford University / SIEPR) characterised the open science system as a system which is better than any other in the rapid accumulation of reliable knowledge, whereas the proprietary systems are very good in extracting the rent from the existing knowledge. A balance between these two systems should be established so that they can work within the same organisational system since separately they are distinctly suboptimal. Johannes Kiess (World Bank) underlined that having the data available is often not enough: “It is really important to teach people how to understand these datasets: data journalists, NGOs, citizens, coders, etc.”. The World Bank has implemented projects to incentivise the use of the data and is helping countries to open up their data. For economists, he mentioned, having a valuable dataset to publish on is an important asset, there are therefore not sufficient incentives for sharing.

Eustáquio J. Reis (Institute of Applied Economic Research – Ipea) related his experience on establishing the Ipea statistical database and other projects for historical data series and data digitalisation in Brazil. He shared that the culture of the economics community is not a culture of collaboration where people willingly share or support and encourage data curation. Sven Vlaeminck (ZBW – Leibniz Information Centre for Economics) spoke about the EDaWaX project which conducted a study of the data-availability of economics journals and will establish publication-related data archive for an economics journal in Germany.

Legal, Cultural and other Barriers to Information Sharing in Economics

The session presented different impediments to the disclosure of data in economics from the perspective of two lawyers and two economists. Lionel Bently (University of Cambridge / CIPIL) drew attention to the fact that there is a whole range of different legal mechanism which operate to restrict the dissemination of information, yet on the other hand there is also a range of mechanism which help to make information available. Lionel questioned whether the open data standard would be always the optimal way to produce high quality economic research or whether there is also a place for modulated/intermediate positions where data is available only on conditions, or only in certain part or for certain forms of use. Mireille van Eechoud (Institute for Information Law) described the EU Public Sector Information Directive – the most generic document related to open government data and progress made for opening up information published by the government. Mireille also pointed out that legal norms have only limited value if you don’t have the internalised, cultural attitudes and structures in place that really make more access to information work.

David Newbery (University of Cambridge) presented an example from the electricity markets and insisted that for a good supply of data, informed demand is needed, coming from regulators who are charged to monitor markets, detect abuse, uphold fair competition and defend consumers. John Rust (Georgetown University) said that the government is an important provider of data which is otherwise too costly to collect, yet a number of issues exist including confidentiality, excessive bureaucratic caution and the public finance crisis. There are a lot of opportunities for research also in the private sector where some part of the data can be made available (redacting confidential information) and the public non-profit sector also can have a tremendous role as force to organise markets for the better, set standards and focus of targeted domains.

Current Data Deposits and Releases – Mandating Open Data?

The session was chaired by Daniel Goroff (Alfred P. Sloan Foundation) and brought together funders and publishers to discuss their role in requiring data from economic research to be publicly available and the importance of dissemination for publishing.

Albert Bravo-Biosca (NESTA) emphasised that mandating open data begins much earlier in the process where funders can encourage the collection of particular data by the government which is the basis for research and can also act as an intermediary for the release of open data by the private sector. Open data is interesting but it is even more interesting when it is appropriately linked and combined with other data and the there is a value in examples and case studies for demonstrating benefits. There should be however caution as opening up some data might result in less data being collected.

Toby Green (OECD Publishing) made a point of the different between posting and publishing, where making content available does not always mean that it would be accessible, discoverable, usable and understandable. In his view, the challenge is to build up an audience by putting content where people would find it, which is very costly as proper dissemination is expensive. Nancy Lutz (National Science Foundation) explained the scope and workings of the NSF and the data management plans required from all economists who are applying for funding. Creating and maintaining data infrastructure and compliance with the data management policy might eventually mean that there would be less funding for other economic research.

Trends of Greater Participation and Growing Horizons in Economics

Chris Taggart (OpenCorporates) chaired the session which introduced different ways of participating and using data, different audiences and contributors. He stressed that data is being collected in new ways and by different communities, that access to data can be an enormous privilege and can generate data gravities with very unequal access and power to make use of and to generate more data and sometimes analysis is being done in new and unexpected ways and by unexpected contributors. Michael McDonald (George Mason University) related how the highly politicised process of drawing up district lines in the U.S. (also called Gerrymandering) could be done in a much more transparent way through an open-source re-districting process with meaningful participation allowing for an open conversation about public policy. Michael also underlined the importance of common data formats and told a cautionary tale about a group of academics misusing open data with a political agenda to encourage a storyline that a candidate would win a particular state.

Hans-Peter Brunner (Asian Development Bank) shared a vision about how open data and open analysis can aid in decision-making about investments in infrastructure, connectivity and policy. Simulated models about investments can demonstrate different scenarios according to investment priorities and crowd-sourced ideas. Hans-Peter asked for feedback and input on how to make data and code available. Perry Walker (new economics foundation) spoke about the conversation and that a good conversation has to be designed as it usually doesn’t happen by accident. Rufus Pollock (Open Knowledge Foundation) concluded with examples about citizen economics and the growth of contributions from the wider public, particularly through volunteering computing and volunteer thinking as a way of getting engaged in research.

During two sessions, the workshop participants also worked on Statement on the Open Economics principles will be revised with further input from the community and will be made public on the second Open Economics workshop taking place on 11-12 June in Cambridge, MA.

Research Data Management in Economic Journals

Sven Vlaeminck - December 7, 2012 in Economic Publishing, EDaWaX, Featured, Open Data, Open Economics, Open Research

This blog post has been written by Sven Vlaeminck | ZBW – German National Library of Economics / Leibniz Information Center for Economics

Research Data Management in Economic Journals

Background

In Economics, as in many other research disciplines, there is a continuous increase in the number of papers where authors have collected their own research data or used external datasets. However, so far there have been few effective means of replicating the results of economic research within the framework of the corresponding article, of verifying them and making them available for repurposing or using in the support of the scholarly debate.

In the light of these findings B.D. McCullough pointed out: “Results published in economic journals are accepted at face value and rarely subjected to the independent verification that is the cornerstone of the scientific method. Most results published in economics journals cannot be subjected to verification, even in principle, because authors typically are not required to make their data and code available for verification.” (McCullough/McGeary/Harrison: “Lessons from the JMCB Archive”, 2006)

Harvard Professor Gary King also asked: “[I]f the empirical basis for an article or book cannot be reproduced, of what use to the discipline are its conclusions? What purpose does an article like this serve?” (King: “Replication, Replication” 1995). Therefore, the management of research data should be considered an important aspect of the economic profession.

The project EDaWaX

Several questions came up when we considered the reasons why economics papers may not be replicable in many cases:

First: what kind of data is needed for replication attempts? Second: it is apparent that scholarly economic journals play an important role in this context: when publishing an empirical paper, do economists have to provide their data to the journal? How many scholarly journals commit their authors to do so? Do these journals require their authors to submit only the datasets, or also the code of computation? Do they pledge their authors to provide programs used for estimations or simulations? And what about descriptions of datasets, variables, values or even a manual on how to replicate the results?

As part of generating the functional requirements for this publication-related data archive, the project analyzed the data (availability) policies of economic journals and developed some recommendations for these policies that could facilitate replication.

Data Policies of Economic Journals

Download Dataset

The Sample

First of all, we wanted to know how many journals in Economics require their authors to provide their empirical analysis data. Of course it was not possible to analyze all of the estimated 8,000 to 10,000 journals in Economics.

We used a sample built by Bräuninger, Haucap and Muck (paper available in German only) for examining the relevance and reputation of economic journals in the eyes of German economists. This sample was very useful for our approach because it allowed the comparison of the international top journals to journals published in the German-speaking area. Using the sample’s rankings for relevance and reputation we could also establish that journals with data policies were also the ones with higher ranking.

In addition to the sample of Bräuninger, Haucap and Muck, we added four additional journals equipped with data availability policy to have more journals in our sample for a detailed evaluation of data policies. We excluded some journals because they are focused only on economic policy or theory and do not publish empirical articles.

The sample we used is not representative for economic journals, because it mainly consists of high-ranked journals. Furthermore, by adding some journals explicitly owning a data policy, the percentage of journals that is equipped with such guidelines also is much higher than we do expect for economic journals in general.

Journals owning a data availability policy

In our sample we have 29 journals equipped with a data availability policy (20.6%) and 11 journals (7.8%) owning a so called “replication policy” (we only examined the websites of the journals, not the printed versions). As mentioned above, this percentage is not representative for economic journals in general. In the contrary we assume that in our sample the majority of economic journals with data (availability) policies is included.

The number of journals with a data availability policy is considerably higher compared to earlier studies where other researchers (e.g. McCullough) examined the data archives of economic journals. An additional online-survey for editors of economic journals showed that most of our respondents implemented the data policies between 2004 and 2011. Therefore we suppose that the number of economic journals with data policies is slightly increasing. The editors of economic scholarly journals seem to realize that the topic of data availability is becoming more important.

The biggest portion of journals equipped with data availability policy were published by Wiley-Blackwell (6) and Elsevier (4). We found out that mainly university or association presses have high to very high percentage of journals owning data availability policies while the major scientific publishers stayed below 20%.

Out of the 29 journals with data availability policies, 10 used initially the data availability policy implemented by the American Economic Review (AER). These journals either used exactly the same policy or a slightly modified version.

The journals with a “replication policy” were excluded from further analysis. The reason is that “replication policies” are pledging authors to provide “sufficient data and other materials” on request only, so there are no files authors have to provide to the journal. This approach sounds good in theory – but it does not work in practice because authors often simply refuse to honor the requirements of these policies. (See “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project” by Dewald, Thursby and Anderson).

Some criteria for data policies to enable replications

For a further evaluation of these data availability policies, we used some criteria for rating the quality of the policies: we extended some of the previously developed criteria by B.D. McCullough by adding standards which are important from an infrastructural point of view. The criteria we used for evaluation are as follows:

Data Policies that aim to ensure the replicability of economic research results have to:

  • be mandatory,
  • pledge authors to provide datasets, the code of computation, programs and descriptions of the data and variables (in form of a data dictionary at best),
  • assure that the data is provided prior to publication of an article,
  • have defined rules for research based on proprietary or confidential data,
  • provide the data, so other researchers can access these data without problems.

Besides journals should:

  • have a special section for the results of replication attempts or should at least publish results of replications in addition to the dataset(s),
  • require their authors to provide the data in open formats or in ASCII-format,
  • require their authors to specify the name and version of both the software and the operation system used for analysis.

Results of our survey

The above mentioned requirements have been used to analyze the data policies of 141 economic journals. These are some of the results we obtained:

Mandatory Data Availability Policies

We found out that more than 82% of the data policies are mandatory. This is a quite good percentage because for obtaining data it is crucial that policies mandate authors to do so. If they do not, there is little hope that authors provide a noteworthy amount of datasets and code – simply because it is time-consuming to prepare datasets and code and authors do not receive rewards for doing this work. Besides, authors often do not want to publish a dataset that is not fully exploited. In the academic struggle for reputation the opposite a researcher wants is to provide a substantial dataset to a competitor.

What data authors have to provide

We found out that 26 of the 29 policies (89.7%) pledged authors to submit datasets used for the computation of their results. The remaining journals do not pledge their authors to do so, because the journal’s focus often is more oriented towards experimental economic research.

Regarding the question what kinds of data authors have to submit, we found out that 65.5% of the journals’ data policies require their authors to provide descriptions of the data submitted and some instructions on how to use the single files submitted. The quality of these descriptions differs from very detailed instructions to a few sentences only that might not really help would-be-replicators.

For the purpose of replication these descriptions of submitted data are very important due to the structure of the data authors are providing: In most cases, data is available as a zip-file only. In these zip-containers there is a broad bunch of different formats and files. Without proper documentation, it is extremely time-consuming to find out what part of the data corresponds to which results in the paper, if this is possible at all. Therefore it is not sufficient that only 65.5% of the data policies in our sample mandate their authors to provide descriptions. This kind of documentation is currently the most important part of metadata for describing the research data.

The submission of (self-written) programs used e.g. for simulation purposes is mandatory for 62% of the policies. This relatively low percentage can also be considered as problematic: If another researcher wants to replicate the results of a simulation he or she won’t have the chance to do so, if the programs used for these simulations are not available.

Of course it depends on the journal’s focus, whether this kind of research is published. But if suchlike papers are published, a journal should take care that the programs used and the source code of the application are submitted. Only if the source code is available it is possible to check for inaccurate programming.

Approximately half of the policies mandate their authors to provide the code of their calculations. Due to the importance of code for replication purposes this percentage may be considered as low. The code of computation is crucial for the possibility to replicate the findings of an empirical article. Without the code would-be replicators have to code everything from scratch. Whether these researchers will be able to compile an identical code of computation is uncertain. Therefore it is crucial that data availability policies enforce strict availability of the code of computation.

The point in time for providing datasets and other materials

Regarding the question at which point in time authors have to submit the data to the journal, we found out that almost 90% of the data availability policies pledge authors to provide their data prior to the publication of an article. This is a good percentage. It is important to obtain the data prior to publication, because the publication is -due to the lack of other rewards- the only incentive to submit data and code. If an article is published, this incentive is no longer given.

Exemptions from the data policy and the case of proprietary data

In economic research it is quite common to use proprietary datasets. Companies as Thomson Reuters Data Stream offer the possibility to acquire datasets and many researchers are choosing such options. Also research based on company data or microdata always is proprietary or even confidential.

Normally, if researchers want to publish an article based on these data, they have to request for an exemption from the the data policy. More than 72% of the journals we analyzed offered this possibility. One journal (Journal of the European Economic Association) discourages authors from publishing articles that rely on completely proprietary data.

But even if proprietary data was used for research, it is important that these research outputs are replicable in principle. Therefore journals should have a procedure in place that ensures the replicability of the results even in these cases. Consequently some journals request their authors to provide the code of computation, the version(s) of the dataset(s) and some additional information on how to obtain the dataset(s).

Of the 28 journals allowing exemptions from the data policy we found out that more than 60% possess rules for these cases. This is a percentage that is not really satisfactory. There is still room for improvements.

Open formats

Open formats are important for two reasons: The first is that the long-term preservation of these data is much easier, because the technical specifications of open formats are known. A second reason is that open formats offer the possibility to use data and code in different platforms and software environments. It is useful to have the possibility to utilize the data interoperably and not only in one statistical package or on one platform.

Regarding these topics only two journals made recommendations for open formats.

Version of software and OS

According to McCullough and Vinod (2003) the results achieved in economic research are often influenced by the statistical package that was used for calculations. Also the operating system has a bearing on the results. Therefore both the version of the software and the OS used for calculations should be specified.

Most of the data policies in our sample do not mandate their authors to provide these specifications. But there are differences: For example almost every journal that has adopted the data availability policy of the American Economic Review (AER) requires its authors to “document[…] the purpose and format of each file provided” for each file they submit to the journal.

In sharp contrast, up to now not a single policy requires the specification of the operating system used for calculations.

Replication rubric

In the course of our study we also examined whether journals have a special section for providing the results of replication attempts. We found out that only a very limited number of journals own a section for results of replications. In an additional online survey of the project EDaWaX 7 journals stated that they publish replication results or attempts in the journals. However the quantity of these replication attempts was low: None of the respondents published more than three replication studies per annum, most even less than one per year.

The need for a replication section mainly consists by controlling the quality of the data submitted. If a journal does not publish the results of replications authors may submit bad quality data.

Conclusion

In summary, it can be stated that the management of publication related research data in economics is still at its early stages. We were able to find 29 journals with data availability policies. That is many more than other researchers found some years ago but compared to the multitude of economic journals in total the percentage of journals equipped with a data availability policy is still quite low. The 20.6% we found in our analyses might be the main proportion of all journals equipped with a data policy.

Nevertheless, editors and journals in economics seem to be in motion – the topic of data availability seems to become more and more important in economics. This is a positive signal and it will be an interesting aspect to monitor whether and how this upward trend continues.

A large portion of the analyzed data availability policies are mandatory, which is a good practice. Moreover, the finding that 90% of the journals are pledging their authors to submit the data prior to the publication of an article shows that many of them have appreciated the importance of providing data at an early stage in the publication process.

When analysing the data authors have to provide, we noticed that almost all guidelines mandate the submission of the (final) dataset(s), which is also quite positive.

But beyond that there is much room for improvements: Only two thirds of all policies require the submission of descriptions and of (self-written) software. As mentioned above, research data often is not usable, when descriptions or software components are missing. In particular the lack of requirements to submit the code of computation is a big problem for potential replication attempts. Only a small majority of all policies pledges their authors to provide it. Therefore it can be expected that almost half of the data availability policies in our sample is not fully enabling replications.

Another important aspect is the possibility to replicate the results of economic research that is based on proprietary or confidential data. While more than 72% of all policies allowing exemptions from their regulations, only 60.7% have a procedure in place that regulates data and descriptions which authors still have to provide in these cases. On balance, many research based on proprietary or confidential data is not replicable even in principle.

Open formats are used by a small minority of journals only. This might result in difficulties for the interoperable use of research data and the long-term preservation of these important sources of science and research.
The reuse of research data is also complicated by the lack of information on which version of a software was used for calculations. Only little more than a third of all policies discusses that authors have to specify the software version / the formats of submitted research data. Besides up to now, no single journal requires the specification of the operating system used.

But there are also good practices: Among the journals with data availability policies we noticed that the data availability policy implemented by the American Economic Review (AER) is a very good example of a data availability policy in economic journals. Journals equipped with this policy are the biggest single group of guidelines in our sample. Therefore we see a developing trend towards a de facto-standard for data policies.

In a second part to this survey (to be published in spring 2013) we will discuss the infrastructure used by economic scholarly journals for providing datasets and other materials.

This post has been added to the resources of the Open Economics Working Group.

Review of Open Access in Economics

Ross Mounce - October 26, 2012 in Economic Publishing, Featured, Open Access

Ever since BioMed Central (BMC) published its first free online article on July 19th 2000, the Open Access movement has made significant progress, so much so that many different stakeholders now see 100% Open Access to research as inevitable in the near future. Some are already extrapolating from recent growth trends that Open Access will take 90% of the overall article share by just 2020 (Lewis, 2012). Another recent analysis shows that during 2011 the number of Open Access articles published was ~340,000 spread over ~6,700 different journals which is about 17% of the overall literature space (1.66 million articles) for that year (Laakso & Bjork, 2012).

Perhaps because of the more obvious lifesaving benefits, biomedical research in particular has seen the largest growth in Open Access – patients & doctors alike can gain truly lifesaving benefit from easy, cost-free, Open Access to research. Those very same doctors and patients may have difficulty accessing the latest toll access-only research; any delay or impediment to accessing up-to-date medical knowledge can have negative, even fatal consequences:

[The following is from ‘The impact of open access upon public health. PLoS Medicine (2006) 3:e252+‘ illustrating how barriers to knowledge access have grave consequences]

Arthur Amman, President of Global Strategies for HIV Prevention, tells this story: “I recently met a physician from southern Africa, engaged in perinatal HIV prevention, whose primary access to information was abstracts posted on the Internet. Based on a single abstract, they had altered their perinatal HIV prevention program from an effective therapy to one with lesser efficacy. Had they read the full text article they would have undoubtedly realized that the study results were based on short-term follow-up, a small pivotal group, incomplete data, and unlikely to be applicable to their country situation. Their decision to alter treatment based solely on the abstract’s conclusions may have resulted in increased perinatal HIV transmission”

But there are also significant benefits to be gained from Open Access to other, non-biomedical research. Open Access to social science & humanities research is also increasing, and has recently been mandated by Research Councils UK (RCUK), the UK agency that dictates policy for all publicly-funded academic research in the UK, on the basis of the Finch report [PDF]. Particularly with respect to economics, I find it extremely worrying that our MPs and policymakers often do NOT have access to the latest academic economic research. David Willetts MP, recently admitted he couldn’t access some research on a BBC Radio 3 interview recently. Likewise at the Open Knowledge Festival in Helsinki recently, a policymaker expressed frustration at his inability to access possible policy-influencing evidence as published in academic journals.

So, for this blogpost, I set about seeing what the Open Access publishing options are for economists. I am well-versed in the OA options for scientists and have produced a visualization of various different paid Gold Open Access options here which has garnered much interest and attention. Even for scientists there are a wealth of completely free-to-publish-in options that are also Open Access (free-to-read, no subscription or payment required).

As far I can see, the Gold Open Access ‘scene’ in Economics is less well-developed relative to the sciences. The Directory of Open Access Journals (DOAJ) lists 192 separate immediate Open Access journals of varying quality (compared to over 500 medical journals listed in DOAJ). These OA economics journals also seem to be newer on average than the similar spread of OA biomedical journals. Nevertheless I found what appear to be some excellent OA economics journals including:

  • Economic Analysis and Policy  – a journal of the Economic Society of Australia, seems to take great pride and interest in Open Access: there’s a whole issue devoted to the subject of Open Access in Economics with papers by names even I recognise e.g. Christian Zimmermann & John Willinsky.
  • Theoretical Economics – published by the Econometrics Society three times a year. Authors retain the copyright to their works, and these are published under a standard Creative Commons licence (CC BY-NC). The PDFs seem very high-quality to me and contain an abundance of clickable hyperlinks & URLs – an added-value service I don’t even see from many good subscription publishers! Publishing here only requires one of the authors to be a member of the society which only costs £50 a year, with fee reductions for students. Given many OA science publications cost >£1000 per publication I find this price extremely reasonable.
  • Monthly Labor Review – published by the US Department of Labor, and in existence since 1915(!) this seems to me to be another high-quality, highly-read Open Access journal.
  • Economics – published in Germany under a Creative Commons Licence (CC BY-NC). It has an excellent, modern and clear website, great (high-standard) data availability policy and even occasionally awards prizes for the best papers published in the journal.
  • Journal of Economic and Social Policy – another Australian journal, established in the year 2000, providing a simple but completely free outlet for publishing on social and economic issues, reviewing conceptual problems, or debating policy initiatives.
  • …and many more. Just like with science OA journals there are numerous journals of local interest e.g. Latin American journals: Revista Brasileira de Economia, the Latin American Journal of Economics, Revista de Economia Contemporânea, Revista de Análisis Económico. European journals like the South-Eastern Europe Journal of Economics (SEEJE) and Ekonomska Istrazivanja (Croatian) and Asian journals e.g. Kasarinlan (Philippine Journal of Third World Studies). These should not be dismissed or discounted, not everything is appropriate for ‘international-scope’ journals. Local journals are important for publishing smaller scale research which can be built-upon by comparative studies and/or meta-analyses.
 It’s International Open Access Week this week: 22 – 28 October 2012

 

Perhaps more interesting with respect to Open Access in Economics is the thriving Green Open Access scene. In the sciences Green Open Access is pretty limited in my opinion. arXiv has popularised Green OA in certain areas of physics & maths but in my particular domain (Biology) Green OA is a deeply unpopular and unused method of providing OA. From what I have seen OA initiatives in Economics such as RePEc (Research Papers in Economics) and EconStor seem to be extremely popular and successful. As I understand it RePEc provides Open Bibliographic Data for an impressive volume of Economics articles, in this respect the field is far ahead of the sciences – there is little free or open bibliographic data from most science publishers. EconStor is an OA repository of the German National Library of Economics – Leibniz Information Centre for Economics. It contains more than 48,000 OA works which is a fiercely impressive volume. The search functions are perhaps a tad basic, but with that much OA literature collected and available for use I’ve no doubt someone will create a better, more powerful search interface for the collection.

In summary, from my casual glance at OA publishing in Economics as a non-economist, mea culpa, things look very positive here. Unless informed otherwise I think the OA scene here too is likely to grow and dominate the academic publishing space as it is in other areas of academia.

References

Laakso, M. and Bjork, B. C. 2012. Anatomy of open access publishing: a study of longitudinal development and internal structure. BMC Medicine 10:124+

Lewis, D. W. 2012. The inevitability of open access. College & Research Libraries 73:493-506.