
Research Data Management in Economic Journals

- December 7, 2012 in Economic Publishing, EDaWaX, Featured, Open Data, Open Economics, Open Research

This blog post has been written by Sven Vlaeminck | ZBW – German National Library of Economics / Leibniz Information Center for Economics

In Economics, as in many other research disciplines, there is a continuous increase in the number of papers whose authors have collected their own research data or used external datasets. However, so far there have been few effective means of replicating the results reported in the corresponding articles, of verifying them, and of making the underlying data available for reuse or in support of scholarly debate.

In the light of these findings B.D. McCullough pointed out: “Results published in economic journals are accepted at face value and rarely subjected to the independent verification that is the cornerstone of the scientific method. Most results published in economics journals cannot be subjected to verification, even in principle, because authors typically are not required to make their data and code available for verification.” (McCullough/McGeary/Harrison: “Lessons from the JMCB Archive”, 2006)

Harvard Professor Gary King also asked: “[I]f the empirical basis for an article or book cannot be reproduced, of what use to the discipline are its conclusions? What purpose does an article like this serve?” (King: “Replication, Replication” 1995). The management of research data should therefore be considered an important aspect of the economics profession.

The project EDaWaX

Several questions came up when we considered the reasons why economics papers may not be replicable in many cases:

First: what kind of data is needed for replication attempts? Second: scholarly economic journals clearly play an important role in this context: when publishing an empirical paper, do economists have to provide their data to the journal? How many scholarly journals require their authors to do so? Do these journals require only the datasets, or also the code of computation? Do they oblige authors to provide the programs used for estimations or simulations? And what about descriptions of datasets, variables and values, or even a manual on how to replicate the results?

As part of gathering the functional requirements for a publication-related data archive, the EDaWaX project analyzed the data (availability) policies of economic journals and developed some recommendations that could facilitate replication.

Data Policies of Economic Journals


The Sample

First of all, we wanted to know how many journals in Economics require their authors to provide their empirical analysis data. Of course it was not possible to analyze all of the estimated 8,000 to 10,000 journals in Economics.

We used a sample built by Bräuninger, Haucap and Muck (paper available in German only) for examining the relevance and reputation of economic journals in the eyes of German economists. This sample was very useful for our approach because it allowed us to compare the international top journals with journals published in the German-speaking area. Using the sample’s rankings for relevance and reputation, we could also establish that the journals with data policies tend to be the higher-ranked ones.

In addition to the sample of Bräuninger, Haucap and Muck, we added four journals with a data availability policy, to have more journals for a detailed evaluation of data policies. We excluded some journals because they focus only on economic policy or theory and do not publish empirical articles.

The sample we used is not representative of economic journals, because it mainly consists of high-ranked journals. Furthermore, because we explicitly added some journals with a data policy, the percentage of journals with such guidelines is much higher than we would expect for economic journals in general.

Journals with a data availability policy

In our sample, 29 journals (20.6%) have a data availability policy and 11 journals (7.8%) have a so-called “replication policy” (we only examined the websites of the journals, not the printed versions). As mentioned above, this percentage is not representative of economic journals in general. On the contrary, we assume that our sample includes the majority of the economic journals with data (availability) policies.
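These shares follow directly from the survey’s full sample of 141 journals; a minimal check (the counts come from the survey itself, the script is purely illustrative):

```python
# Counts taken from the survey: 141 journals in the sample,
# 29 with a data availability policy, 11 with a "replication policy".
SAMPLE_SIZE = 141

def share(count: int, total: int = SAMPLE_SIZE) -> float:
    """Percentage of the sample, rounded to one decimal place."""
    return round(100 * count / total, 1)

print(share(29))  # 20.6 -- journals with a data availability policy
print(share(11))  # 7.8  -- journals with a "replication policy"
```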

The number of journals with a data availability policy is considerably higher than in earlier studies in which other researchers (e.g. McCullough) examined the data archives of economic journals. An additional online survey of editors of economic journals showed that most of our respondents implemented their data policies between 2004 and 2011. We therefore suppose that the number of economic journals with data policies is slightly increasing: the editors of scholarly economic journals seem to realize that the topic of data availability is becoming more important.

Most journals with a data availability policy were published by Wiley-Blackwell (6) and Elsevier (4). We also found that it is mainly university and association presses that have a high to very high percentage of journals with data availability policies, while the major commercial publishers stayed below 20%.

Of the 29 journals with data availability policies, 10 initially adopted the policy implemented by the American Economic Review (AER), either verbatim or in a slightly modified version.

The journals with a “replication policy” were excluded from further analysis. The reason is that “replication policies” oblige authors to provide “sufficient data and other materials” on request only, so there are no files that authors have to deposit with the journal. This approach sounds good in theory, but it does not work in practice, because authors often simply refuse to honor the requirements of these policies (see “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project” by Dewald, Thursby and Anderson).

Some criteria for data policies to enable replications

For a further evaluation of these data availability policies, we rated their quality against a set of criteria: we extended criteria previously developed by B.D. McCullough with standards that are important from an infrastructural point of view. The criteria we used are as follows:

Data Policies that aim to ensure the replicability of economic research results have to:

  • be mandatory,
  • require authors to provide datasets, the code of computation, self-written programs, and descriptions of the data and variables (ideally in the form of a data dictionary),
  • ensure that the data is provided prior to the publication of an article,
  • have defined rules for research based on proprietary or confidential data,
  • make the data available so that other researchers can access it without difficulty.

In addition, journals should:

  • have a special section for the results of replication attempts, or should at least publish the results of replications alongside the dataset(s),
  • require their authors to provide the data in open formats or in ASCII format,
  • require their authors to specify the name and version of both the software and the operating system used for the analysis.

Results of our survey

We used the requirements above to analyze the data policies of 141 economic journals. These are some of the results we obtained:

Mandatory Data Availability Policies

We found that more than 82% of the data policies are mandatory. This is a reasonably good share, because mandating submission is crucial for actually obtaining data. If policies do not, there is little hope that authors will provide a noteworthy number of datasets and code files: preparing datasets and code is time-consuming, and authors receive no rewards for this work. Moreover, authors often do not want to publish a dataset that is not yet fully exploited; in the academic struggle for reputation, the last thing a researcher wants is to hand a substantial dataset to a competitor.

What data authors have to provide

We found that 26 of the 29 policies (89.7%) require authors to submit the datasets used for the computation of their results. The remaining journals do not, often because their focus is oriented more towards experimental economic research.

Regarding the kinds of data authors have to submit, we found that 65.5% of the journals’ data policies require authors to provide descriptions of the submitted data and some instructions on how to use the individual files. The quality of these descriptions ranges from very detailed instructions to a few sentences that may not really help would-be replicators.

For the purpose of replication, these descriptions are very important because of the structure of the data authors provide: in most cases, the data is available as a zip file only, and these zip containers hold a wide variety of formats and files. Without proper documentation it is extremely time-consuming to find out which part of the data corresponds to which results in the paper, if this is possible at all. It is therefore not sufficient that only 65.5% of the data policies in our sample require authors to provide descriptions. This kind of documentation is currently the most important metadata for describing the research data.
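To see why such documentation matters, consider a toy replication archive; a minimal sketch using Python’s standard `zipfile` module, with all file names invented for illustration:

```python
import io
import zipfile

# Build a toy replication archive in memory; the file names are invented
# and do not come from any real journal archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/main_sample.csv", "id,gdp_pc,unemp\n1,31000,5.2\n")
    zf.writestr("code/table2.do", "* code reproducing Table 2\n")
    zf.writestr("README.txt",
                "main_sample.csv is the input for table2.do, "
                "which reproduces Table 2 of the paper.\n")

# Without the README, a would-be replicator sees only a bare file listing
# and has to guess which file belongs to which result in the paper:
with zipfile.ZipFile(buf) as zf:
    for name in zf.namelist():
        print(name)
```

A short README mapping files to tables and figures is often the difference between a replication taking hours and taking weeks.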

The submission of (self-written) programs, used e.g. for simulation purposes, is mandatory in 62% of the policies. This relatively low percentage is also problematic: a researcher who wants to replicate the results of a simulation has no chance of doing so if the programs used for the simulation are not available.

Of course, whether this kind of research is published depends on the journal’s focus. But if such papers are published, a journal should ensure that the programs used and their source code are submitted. Only if the source code is available is it possible to check for inaccurate programming.

Approximately half of the policies mandate that authors provide the code of their calculations. Given the importance of code for replication purposes, this percentage must be considered low. The code of computation is crucial for replicating the findings of an empirical article: without it, would-be replicators have to code everything from scratch, and whether they will arrive at identical code is uncertain. It is therefore crucial that data availability policies strictly require the code of computation.

The point in time for providing datasets and other materials

Regarding the point in time at which authors have to submit their data, we found that almost 90% of the data availability policies require authors to provide their data prior to the publication of the article. This is a good percentage. It is important to obtain the data before publication because publication is, in the absence of other rewards, the only incentive to submit data and code. Once an article is published, this incentive is gone.

Exemptions from the data policy and the case of proprietary data

In economic research it is quite common to use proprietary datasets. Providers such as Thomson Reuters Datastream offer datasets for purchase, and many researchers choose this option. Research based on company data or microdata is also typically proprietary or even confidential.

Normally, if researchers want to publish an article based on such data, they have to request an exemption from the data policy. More than 72% of the journals we analyzed offer this possibility. One journal (the Journal of the European Economic Association) discourages authors from publishing articles that rely entirely on proprietary data.

But even if proprietary data was used, it is important that the research outputs are replicable in principle. Journals should therefore have a procedure in place that ensures the replicability of results even in these cases. Consequently, some journals require their authors to provide the code of computation, the version(s) of the dataset(s), and additional information on how to obtain the dataset(s).

Of the 28 journals allowing exemptions from the data policy, we found that more than 60% have rules for these cases. This percentage is not really satisfactory; there is still room for improvement.

Open formats

Open formats are important for two reasons. First, the long-term preservation of the data is much easier, because the technical specifications of open formats are public. Second, open formats make it possible to use data and code across different platforms and software environments; it is useful to be able to work with the data interoperably and not only in one statistical package or on one platform.

On this point, only two journals made recommendations to use open formats.
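Exporting to an open format costs authors very little; a minimal sketch using Python’s standard `csv` module, with an invented toy table:

```python
import csv
import io

# A toy results table; all values are invented for illustration.
rows = [
    ["country", "gdp_pc", "unemp"],
    ["A", 31000, 5.2],
    ["B", 24000, 7.8],
]

# CSV is plain ASCII text: readable in any statistical package,
# in any text editor, and in any future software environment.
out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```

The same table saved in a proprietary binary format would be tied to one vendor’s software and much harder to preserve in the long term.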

Version of software and OS

According to McCullough and Vinod (2003), the results achieved in economic research are often influenced by the statistical package used for the calculations. The operating system also has a bearing on the results. Therefore, both the version of the software and the OS used for the calculations should be specified.
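Recording this information costs authors almost nothing; a minimal sketch of what a submission could log, using only Python’s standard library (the field names are our own invention):

```python
import platform
import sys

# Record the computing environment alongside the code of computation,
# so that would-be replicators can match software and OS versions.
environment = {
    "os": platform.platform(),          # operating system name and version
    "python": sys.version.split()[0],   # interpreter version, e.g. "3.11.4"
    "machine": platform.machine(),      # processor architecture
}

for key, value in environment.items():
    print(f"{key}: {value}")
```

The equivalent one-liners exist in every statistical package, so a policy could reasonably require such a log with every submission.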

Most of the data policies in our sample do not mandate these specifications. But there are differences: for example, almost every journal that has adopted the data availability policy of the American Economic Review (AER) requires its authors to “document[…] the purpose and format of each file provided” to the journal.

In sharp contrast, up to now not a single policy requires the specification of the operating system used for calculations.

Replication rubric

In the course of our study we also examined whether journals have a special section for the results of replication attempts. We found that only a very limited number of journals have such a section. In an additional online survey by the EDaWaX project, 7 journals stated that they publish replication results or attempts. However, the number of these replications was low: none of the respondents published more than three replication studies per year, and most published fewer than one.

The main rationale for a replication section is quality control of the submitted data: if a journal does not publish the results of replications, authors have little incentive to avoid submitting data of poor quality.


In summary, the management of publication-related research data in economics is still in its early stages. We were able to find 29 journals with data availability policies. That is many more than other researchers found some years ago, but compared to the multitude of economic journals overall, the percentage of journals with a data availability policy is still quite low. The 20.6% we found in our analysis probably covers the main proportion of all journals with a data policy.

Nevertheless, editors and journals in economics seem to be in motion: the topic of data availability appears to be gaining importance in the discipline. This is a positive signal, and it will be interesting to monitor whether and how this upward trend continues.

A large portion of the analyzed data availability policies are mandatory, which is good practice. Moreover, the finding that 90% of the journals require their authors to submit the data prior to the publication of an article shows that many of them have recognized the importance of obtaining data at an early stage of the publication process.

When analysing the data authors have to provide, we noticed that almost all guidelines mandate the submission of the (final) dataset(s), which is also quite positive.

But beyond that there is much room for improvement: only two thirds of all policies require the submission of descriptions and of (self-written) software. As mentioned above, research data is often unusable when descriptions or software components are missing. In particular, the lack of requirements to submit the code of computation is a big problem for potential replication attempts; only a small majority of policies requires authors to provide it. It can therefore be expected that almost half of the data availability policies in our sample do not fully enable replication.

Another important aspect is the possibility of replicating the results of economic research based on proprietary or confidential data. While more than 72% of all policies allow exemptions from their regulations, only 60.7% have a procedure in place that specifies which data and descriptions authors still have to provide in these cases. On balance, much research based on proprietary or confidential data is not replicable even in principle.

Open formats are required by a small minority of journals only. This may cause difficulties for the interoperable use of research data and for the long-term preservation of these important sources of science and research.

The reuse of research data is also complicated by the lack of information on which version of a software package was used for the calculations. Only a little more than a third of all policies requires authors to specify the software version or the formats of the submitted research data. Moreover, to date no journal requires the specification of the operating system used.

But there are also good practices: among the journals with data availability policies, the policy implemented by the American Economic Review (AER) is a very good example for economic journals. Journals with this policy form the biggest single group in our sample, so we see a developing trend towards a de facto standard for data policies.

In a second part to this survey (to be published in spring 2013) we will discuss the infrastructure used by economic scholarly journals for providing datasets and other materials.

This post has been added to the resources of the Open Economics Working Group.

Reputation Factor in Economic Publishing

- November 1, 2012 in Featured, Open Access


“The big problem in economics is that it really matters in which journals you publish, so the reputation factor is a big hindrance in getting open access journals up and going”. Can the accepted norms of scholarly publishing be successfully challenged?

This quotation is a line from the correspondence about writing this blog post for the OKFN. The invitation came to write for the Open Economics Working Group, hence the focus on economics, but in reality the same situation pertains across pretty much any scholarly discipline you can mention. From the funding bodies down through faculty departments and academic librarians to individual researchers, an enormous worldwide system of research measurement has grown up that conflates the quality of research output with the publications in which it appears. Journals that receive a Thomson ISI ranking and high impact factors are perceived as the holy grail and, as is being witnessed currently in the UK during the Research Excellence Framework (REF) process, these carry tremendous weight when it comes to research fund awards.

Earlier this year, I attended a meeting with a Head of School at a Russell Group university, in response to an email that I had sent with information about Social Sciences Directory, the ‘gold’ open access publication that I was then in the first weeks of setting up. Buoyed by their acceptance to meet, I was optimistic that there would be interest and support for the idea of breaking the shackles of existing ranked journals and their subscription paywall barriers. I believed then – and still believe now – that if one or two senior university administrators had the courage to say, “We don’t care about the rankings. We will support alternative publishing solutions as a matter of principle”, then it would create a snowball effect and expedite the break up of the current monopolistic, archaic system. However, I was rapidly disabused. The faculty in the meeting listened politely and then stated categorically that they would never consider publishing in a start up venture such as Social Sciences Directory because of the requirements of the REF. The gist of it was, “We know subscription journals are restrictive and expensive, but that is what is required and we are not going to rock the boat”.

I left feeling deflated, though not entirely surprised. I realised some time ago that the notion of profit & loss, or cost control, or budgetary management, was simply anathema to many academic administrators and that trying to present an alternative model as a good thing because it is a better deal for taxpayers is an argument that is likely to founder on the rocks of the requirements of the funding and ranking systems, if not apathy and intransigence. A few years ago, whilst working as a sales manager in subscription publishing, I attended a conference of business school deans and directors. (This in itself was unusual, as most conferences that I attended were for librarians – ALA, UKSG, IFLA and the like – as the ‘customer’ in a subscription sense is usually the university library). During a breakout session, a game of one-upmanship began between three deans, as they waxed lyrical about the overseas campuses they were opening, the international exchanges of staff and students they had fixed up, the new campus buildings that were under construction, and so on.

Eventually, I asked the fairly reasonable question whether these costly ventures were being undertaken with a strategic view that they would eventually recoup their costs and were designed to help make their schools self-funding. Or indeed, whether education and research are of such importance for the greater good of all that they should be viewed as investments. The discomfort was palpable. One of the deans even strongly denied that this is a question of money. That the deans of business schools should take this view was an eye-opening insight into the general academic attitude towards state funding. It is an attitude that is wrong because ultimately, of course, it is entirely about the money.

The great irony was that this conversation took place in September 2008, with the collapse of Lehman Brothers and the full force of the Global Financial Crisis (GFC) soon to impact gravely on the global higher education and research sector. A system that for years had been awash with money had allowed all manner of poor practices to take effect, in which many different actors were complicit. Publishers had seized on the opportunity to expand output massively and charge vast fees for access; faculty had demanded that their libraries subscribe to key journals, regardless of cost; libraries and consortia had agreed to publishers’ demands because they had the money to do so; and the funding bodies had built journal metrics into the measurement for future financing. No wonder, then, that neither academia nor publishers could or would take the great leap forward that is required to bring about change, even after the GFC had made it patently clear that the ongoing subscription model is ultimately unsustainable. Change needs to be imposed, as the British government bravely did in July with the decision to adopt the recommendations of the Finch Report.

However, this brings us back to the central issue and the quotation in the title. For now, the funding mechanisms are the same and the requirement to publish in journals with a reputation is still paramount. Until now, arguments against open access publishing have tended to focus on quality issues. The argument goes that the premier (subscription) journals take the best submissions and then there is a cascade downwards through second tier journals (which may or may not be subscription-based) until you get to a pile of leftover papers that can only be published by the author paying a fee to some sort of piratical publisher. This does not stand up to much scrutiny. Plenty of subscription-based journals are average and have been churned out by publishers looking to beef up their portfolios and justify charging ever-larger sums. Good research gets unnecessarily dumped by leading journals because they adhere to review policies dating from the print age when limited pagination forced them to be highly selective. Other academics, as we have seen at Social Sciences Directory, have chosen to publish and review beyond the established means because they believe in finding and helping alternatives. My point is that good research exists outside the ‘top’ journals. It is just a question of finding it.

So, after all this, do I believe that the “big hindrance” of reputation can be overcome? Yes, but only through planning and mandate. Here is what I believe should happen:

  1. The sheer number of journals is overwhelming and, in actuality, at odds with modern user behaviour which generally accesses content online and uses a keyword search to find information. Who needs journals? What you want is a large collection of articles that are well indexed and easily searchable, and freely available. This will enable the threads of inter-disciplinary research to spread much more effectively. It will increase usage and reduce cost-per-download (increasingly the metrics that librarians use to measure the return on investment of journals and databases), whilst helping to increase citation and impact.
  2. Ensure quality control of peer review by setting guidelines and adhering to them.
  3. De-couple the link between publishing and tenure & department funding.
  4. In many cases, universities will have subscribed to a particular journal for years and will therefore have access to a substantial back catalogue. This has often been supplemented by the purchase of digitised archives, as publishers cottoned on to other sources of revenue which happened to chime with librarians’ preferences to complete online collections and take advantage of non-repeatable purchases. Many publishers also sell their content to aggregators, who agree to an embargo period so that the publisher can also sell the most up-to-date research directly. Although the axe has fallen on many print subscriptions, some departments and individuals still prefer having a copy on their shelves (even though they could print off a PDF from the web version and have the same thing, minus the cover). So, aside from libraries often paying more than once for the same content, they will have complete collections up to a given point in time. University administrators need to take the bold decision to change, to pick an end date as a ‘cut off’ after which they will publicly state that they are switching to new policies in support of OA. This will allow funds to be freed up and used to pay for institutional memberships, article processing fees, institutional repositories – whatever the choice may be. Editors, authors and reviewers will be encouraged to offer their services elsewhere, which will in turn rapidly build the reputation of new publications.

Scholarly publishing is being subjected to a classic confrontation between tradition and modernity. For me, it is inevitable that modernity will win out and that the norms will be successfully challenged.