Research Data Management in Economic Journals

- December 7, 2012 in Economic Publishing, EDaWaX, Featured, Open Data, Open Economics, Open Research

This blog post has been written by Sven Vlaeminck | ZBW – German National Library of Economics / Leibniz Information Center for Economics

Background

In Economics, as in many other research disciplines, there is a continuous increase in the number of papers where authors have collected their own research data or used external datasets. However, so far there have been few effective means of replicating the results of economic research within the framework of the corresponding article, of verifying them, and of making them available for reuse or in support of the scholarly debate.

In the light of these findings B.D. McCullough pointed out: “Results published in economic journals are accepted at face value and rarely subjected to the independent verification that is the cornerstone of the scientific method. Most results published in economics journals cannot be subjected to verification, even in principle, because authors typically are not required to make their data and code available for verification.” (McCullough/McGeary/Harrison: “Lessons from the JMCB Archive”, 2006)

Harvard Professor Gary King also asked: “[I]f the empirical basis for an article or book cannot be reproduced, of what use to the discipline are its conclusions? What purpose does an article like this serve?” (King: “Replication, Replication” 1995). Therefore, the management of research data should be considered an important aspect of the economic profession.

The project EDaWaX

Several questions came up when we considered the reasons why economics papers may not be replicable in many cases:

First: what kind of data is needed for replication attempts? Second: scholarly economic journals clearly play an important role in this context. When publishing an empirical paper, do economists have to provide their data to the journal? How many scholarly journals oblige their authors to do so? Do these journals require their authors to submit only the datasets, or also the code of computation? Do they oblige their authors to provide programs used for estimations or simulations? And what about descriptions of datasets, variables, values or even a manual on how to replicate the results?

As part of generating the functional requirements for a publication-related data archive, the EDaWaX project analyzed the data (availability) policies of economic journals and developed some recommendations for these policies that could facilitate replication.

Data Policies of Economic Journals

The Sample

First of all, we wanted to know how many journals in Economics require their authors to provide their empirical analysis data. Of course it was not possible to analyze all of the estimated 8,000 to 10,000 journals in Economics.

We used a sample built by Bräuninger, Haucap and Muck (paper available in German only) for examining the relevance and reputation of economic journals in the eyes of German economists. This sample was very useful for our approach because it allowed us to compare the international top journals with journals published in the German-speaking area. Using the sample’s rankings for relevance and reputation, we could also establish that journals with data policies were also the higher-ranked ones.

To the sample of Bräuninger, Haucap and Muck we added four further journals that have a data availability policy, so that more journals were available for a detailed evaluation of data policies. We excluded some journals because they focus only on economic policy or theory and do not publish empirical articles.

The sample we used is not representative of economic journals, because it mainly consists of high-ranked journals. Furthermore, because we explicitly added some journals that have a data policy, the percentage of journals with such guidelines is also much higher than we would expect for economic journals in general.

Journals with a data availability policy

In our sample we have 29 journals with a data availability policy (20.6%) and 11 journals (7.8%) with a so-called “replication policy” (we only examined the websites of the journals, not the printed versions). As mentioned above, this percentage is not representative of economic journals in general. On the contrary, we assume that our sample includes the majority of economic journals with data (availability) policies.
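The two shares above can be checked with a few lines of arithmetic. This is only a sketch: the counts come from this post (the sample size of 141 journals is stated in the results section), while the helper name `share` is an assumption made for illustration.

```python
# Counts taken from the post; the helper name `share` is hypothetical.
SAMPLE_SIZE = 141              # journals analyzed in the survey
WITH_DATA_POLICY = 29          # journals with a data availability policy
WITH_REPLICATION_POLICY = 11   # journals with a "replication policy"


def share(count: int, total: int) -> float:
    """Return count/total as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(share(WITH_DATA_POLICY, SAMPLE_SIZE))         # 20.6
print(share(WITH_REPLICATION_POLICY, SAMPLE_SIZE))  # 7.8
```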

The number of journals with a data availability policy is considerably higher than in earlier studies in which other researchers (e.g. McCullough) examined the data archives of economic journals. An additional online survey of editors of economic journals showed that most of our respondents implemented their data policies between 2004 and 2011. We therefore suppose that the number of economic journals with data policies is slowly increasing. The editors of scholarly economic journals seem to realize that the topic of data availability is becoming more important.

The largest numbers of journals with a data availability policy were published by Wiley-Blackwell (6) and Elsevier (4). We found that it is mainly university and association presses that have a high to very high percentage of journals with data availability policies, while the major scientific publishers stayed below 20%.

Of the 29 journals with data availability policies, 10 initially used the data availability policy implemented by the American Economic Review (AER). These journals used either exactly the same policy or a slightly modified version.

The journals with a “replication policy” were excluded from further analysis. The reason is that “replication policies” oblige authors to provide “sufficient data and other materials” on request only, so there are no files that authors have to provide to the journal. This approach sounds good in theory, but it does not work in practice because authors often simply refuse to honor the requirements of these policies. (See “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project” by Dewald, Thursby and Anderson.)

Some criteria for data policies to enable replications

To evaluate these data availability policies further, we rated their quality against a set of criteria: we extended criteria previously developed by B.D. McCullough by adding standards that are important from an infrastructural point of view. The criteria we used for evaluation are as follows:

Data Policies that aim to ensure the replicability of economic research results have to:

  • be mandatory,
  • oblige authors to provide datasets, the code of computation, programs and descriptions of the data and variables (ideally in the form of a data dictionary),
  • ensure that the data is provided prior to publication of an article,
  • have defined rules for research based on proprietary or confidential data,
  • make the data available in such a way that other researchers can access it without problems.

In addition, journals should:

  • have a special section for the results of replication attempts or at least publish results of replications alongside the dataset(s),
  • require their authors to provide the data in open formats or in ASCII format,
  • require their authors to specify the name and version of both the software and the operating system used for analysis.
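The two lists above can be read as a checklist. The following is a minimal sketch of how such a checklist might be encoded to rate a single policy; the class name, the field names and the example policy are all hypothetical and are not part of the EDaWaX study.

```python
from dataclasses import dataclass


@dataclass
class DataPolicy:
    """One journal's policy, rated against the criteria listed above."""
    # Criteria a policy has to meet
    mandatory: bool
    requires_data_code_programs_docs: bool
    data_before_publication: bool
    rules_for_proprietary_data: bool
    data_accessible_to_others: bool
    # Criteria a journal should meet in addition
    replication_section: bool = False
    open_formats: bool = False
    software_and_os_versions: bool = False


REQUIRED = (
    "mandatory",
    "requires_data_code_programs_docs",
    "data_before_publication",
    "rules_for_proprietary_data",
    "data_accessible_to_others",
)
RECOMMENDED = (
    "replication_section",
    "open_formats",
    "software_and_os_versions",
)


def unmet(policy: DataPolicy, criteria: tuple) -> list:
    """Return the names of the criteria that the policy does not satisfy."""
    return [name for name in criteria if not getattr(policy, name)]


# Hypothetical policy: meets everything except rules for proprietary data.
example = DataPolicy(True, True, True, False, True)
print(unmet(example, REQUIRED))     # ['rules_for_proprietary_data']
print(unmet(example, RECOMMENDED))  # all three recommended criteria
```

Keeping the required and the recommended criteria in separate tuples mirrors the distinction the lists draw between what a policy has to do and what a journal should do.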

Results of our survey

The above-mentioned requirements were used to analyze the data policies of 141 economic journals. These are some of the results we obtained:

Mandatory Data Availability Policies

We found that more than 82% of the data policies are mandatory. This is quite a good percentage, because for obtaining data it is crucial that policies require authors to submit it. If they do not, there is little hope that authors will provide a noteworthy amount of datasets and code, simply because preparing datasets and code is time-consuming and authors receive no reward for doing this work. Besides, authors often do not want to publish a dataset that has not yet been fully exploited. In the academic struggle for reputation, the last thing a researcher wants is to hand a substantial dataset to a competitor.

What data authors have to provide

We found that 26 of the 29 policies (89.7%) oblige authors to submit the datasets used for the computation of their results. The remaining journals do not, often because their focus is more on experimental economic research.

As for the kinds of data authors have to submit, we found that 65.5% of the journals’ data policies require authors to provide descriptions of the data submitted and some instructions on how to use the individual files. The quality of these descriptions ranges from very detailed instructions to just a few sentences that might not really help would-be replicators.

For the purpose of replication these descriptions are very important because of the way authors provide their data: in most cases, data is available as a zip file only. These zip archives contain a wide variety of formats and files. Without proper documentation, it is extremely time-consuming to find out which part of the data corresponds to which results in the paper, if this is possible at all. It is therefore not sufficient that only 65.5% of the data policies in our sample require their authors to provide descriptions. This kind of documentation is currently the most important part of the metadata describing the research data.
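To make the documentation problem concrete, here is a minimal sketch of a check that a journal or an author could run on a submitted zip archive: it lists every file in the archive that has no description in an accompanying data dictionary. The function names, the file names and the dictionary format are assumptions for illustration; none of the journals surveyed is known to use this exact procedure.

```python
import io
import zipfile


def undocumented_files(archive, descriptions: dict) -> list:
    """Return archive members that lack a non-empty description.

    `archive` is a path or file-like object holding a zip file;
    `descriptions` maps member names to free-text descriptions.
    """
    with zipfile.ZipFile(archive) as zf:
        # Skip directory entries, which end with a slash.
        names = [n for n in zf.namelist() if not n.endswith("/")]
    return sorted(n for n in names if not descriptions.get(n, "").strip())


def demo_archive() -> io.BytesIO:
    """Build a small in-memory zip with hypothetical replication files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data/panel.csv", "id,year,value\n1,2010,3.2\n")
        zf.writestr("code/table2.do", "* hypothetical estimation script\n")
    buf.seek(0)
    return buf


# Only the dataset is described, so the script is reported as undocumented.
data_dictionary = {"data/panel.csv": "Panel of firms, one row per firm-year."}
print(undocumented_files(demo_archive(), data_dictionary))
# ['code/table2.do']
```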

The submission of (self-written) programs, used for example for simulation purposes, is mandatory in 62% of the policies. This relatively low percentage is also problematic: a researcher who wants to replicate the results of a simulation has no chance of doing so if the programs used for the simulation are not available.

Of course, whether this kind of research is published depends on the journal’s focus. But if such papers are published, a journal should make sure that the programs used and the source code of the application are submitted. Only if the source code is available is it possible to check for programming errors.

Approximately half of the policies require their authors to provide the code of their calculations. Given the importance of code for replication, this percentage is low. The code of computation is crucial for replicating the findings of an empirical article. Without it, would-be replicators have to code everything from scratch, and it is uncertain whether they will be able to reconstruct identical code. It is therefore crucial that data availability policies strictly enforce the availability of the code of computation.

The point in time for providing datasets and other materials

As for the point in time at which authors have to submit the data to the journal, we found that almost 90% of the data availability policies oblige authors to provide their data prior to the publication of an article. This is a good percentage. It is important to obtain the data prior to publication because, for lack of other rewards, publication is the only incentive to submit data and code. Once an article is published, this incentive is gone.

Exemptions from the data policy and the case of proprietary data

In economic research it is quite common to use proprietary datasets. Providers such as Thomson Reuters Datastream sell datasets, and many researchers choose such options. Research based on company data or microdata is also typically proprietary or even confidential.

Normally, if researchers want to publish an article based on such data, they have to request an exemption from the data policy. More than 72% of the journals we analyzed offered this possibility. One journal (Journal of the European Economic Association) discourages authors from publishing articles that rely on completely proprietary data.

But even if proprietary data was used for research, it is important that these research outputs are replicable in principle. Therefore journals should have a procedure in place that ensures the replicability of the results even in these cases. Consequently some journals request their authors to provide the code of computation, the version(s) of the dataset(s) and some additional information on how to obtain the dataset(s).

Of the 28 journals allowing exemptions from the data policy, we found that more than 60% have rules for these cases. This percentage is not really satisfactory; there is still room for improvement.

Open formats

Open formats are important for two reasons: The first is that the long-term preservation of these data is much easier, because the technical specifications of open formats are known. A second reason is that open formats offer the possibility to use data and code in different platforms and software environments. It is useful to have the possibility to utilize the data interoperably and not only in one statistical package or on one platform.

Regarding these topics only two journals made recommendations for open formats.

Version of software and OS

According to McCullough and Vinod (2003) the results achieved in economic research are often influenced by the statistical package that was used for calculations. Also the operating system has a bearing on the results. Therefore both the version of the software and the OS used for calculations should be specified.
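As an illustration of how little effort such a specification requires, the sketch below records the interpreter version, the operating system and the versions of selected packages. It assumes the analysis was done in Python, which the post does not say; the function name `environment_record` and the package names passed to it are hypothetical. Users of other statistical packages would use whatever version commands those packages provide.

```python
import json
import platform
from importlib import metadata


def environment_record(packages=()) -> dict:
    """Collect software and OS details that a replicator would need."""
    record = {
        "python": platform.python_version(),
        "os": platform.system(),          # e.g. 'Linux', 'Windows', 'Darwin'
        "os_release": platform.release(),
        "machine": platform.machine(),
        "packages": {},
    }
    for name in packages:
        try:
            record["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            record["packages"][name] = None
    return record


# The package names here are placeholders for whatever the analysis used.
print(json.dumps(environment_record(["numpy", "pandas"]), indent=2))
```

A file with this content, submitted alongside data and code, would cover both the software version and the operating system.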

Most of the data policies in our sample do not require their authors to provide these specifications. But there are differences: for example, almost every journal that has adopted the data availability policy of the American Economic Review (AER) requires its authors to “document[…] the purpose and format of each file provided”.

In sharp contrast, up to now not a single policy requires the specification of the operating system used for calculations.

Replication rubric

In the course of our study we also examined whether journals have a special section for publishing the results of replication attempts. We found that only a very limited number of journals have such a section. In an additional online survey by the EDaWaX project, seven journals stated that they publish replication results or attempts. However, the number of these replication attempts was low: none of the respondents published more than three replication studies per year, and most published fewer than one.

A replication section is needed mainly as a check on the quality of the data submitted. If a journal does not publish the results of replications, authors may submit poor-quality data.

Conclusion

In summary, the management of publication-related research data in economics is still in its early stages. We were able to find 29 journals with data availability policies. That is many more than other researchers found some years ago, but compared to the multitude of economic journals in total, the percentage of journals with a data availability policy is still quite low. The 20.6% we found in our analysis might represent the majority of all journals that have a data policy.

Nevertheless, editors and journals in economics seem to be on the move: the topic of data availability appears to be becoming more and more important in economics. This is a positive signal, and it will be interesting to monitor whether and how this upward trend continues.

A large portion of the analyzed data availability policies are mandatory, which is good practice. Moreover, the finding that 90% of the journals oblige their authors to submit the data prior to the publication of an article shows that many of them have recognized the importance of providing data at an early stage in the publication process.

When analysing the data authors have to provide, we noticed that almost all guidelines mandate the submission of the (final) dataset(s), which is also quite positive.

But beyond that there is much room for improvement: only two thirds of all policies require the submission of descriptions and of (self-written) software. As mentioned above, research data is often not usable when descriptions or software components are missing. In particular, the lack of requirements to submit the code of computation is a big problem for potential replication attempts. Only a small majority of all policies oblige their authors to provide it. It can therefore be expected that almost half of the data availability policies in our sample do not fully enable replication.

Another important aspect is the possibility of replicating the results of economic research that is based on proprietary or confidential data. While more than 72% of all policies allow exemptions from their regulations, only 60.7% have a procedure in place that regulates which data and descriptions authors still have to provide in these cases. On balance, much research based on proprietary or confidential data is not replicable even in principle.

Open formats are used by only a small minority of journals. This might result in difficulties for the interoperable use of research data and the long-term preservation of these important sources of science and research.

The reuse of research data is also complicated by the lack of information on which version of the software was used for calculations. Only a little more than a third of all policies state that authors have to specify the software version or the formats of submitted research data. Moreover, up to now not a single journal requires the specification of the operating system used.

But there are also good practices: among the journals with data availability policies, the policy implemented by the American Economic Review (AER) stands out as a very good example. Journals using this policy form the biggest single group in our sample. We therefore see a developing trend towards a de facto standard for data policies.

In a second part to this survey (to be published in spring 2013) we will discuss the infrastructure used by economic scholarly journals for providing datasets and other materials.

This post has been added to the resources of the Open Economics Working Group.

Review of Open Access in Economics

- October 26, 2012 in Economic Publishing, Featured, Open Access

Ever since BioMed Central (BMC) published its first free online article on July 19th 2000, the Open Access movement has made significant progress, so much so that many different stakeholders now see 100% Open Access to research as inevitable in the near future. Some are already extrapolating from recent growth trends that Open Access will take 90% of the overall article share by just 2020 (Lewis, 2012). Another recent analysis shows that during 2011 the number of Open Access articles published was ~340,000 spread over ~6,700 different journals which is about 17% of the overall literature space (1.66 million articles) for that year (Laakso & Bjork, 2012).

Perhaps because of the more obvious lifesaving benefits, biomedical research in particular has seen the largest growth in Open Access – patients & doctors alike can gain truly lifesaving benefit from easy, cost-free, Open Access to research. Those very same doctors and patients may have difficulty accessing the latest toll access-only research; any delay or impediment to accessing up-to-date medical knowledge can have negative, even fatal consequences:

[The following is from ‘The impact of open access upon public health’, PLoS Medicine (2006) 3:e252+, illustrating how barriers to knowledge access have grave consequences]

Arthur Amman, President of Global Strategies for HIV Prevention, tells this story: “I recently met a physician from southern Africa, engaged in perinatal HIV prevention, whose primary access to information was abstracts posted on the Internet. Based on a single abstract, they had altered their perinatal HIV prevention program from an effective therapy to one with lesser efficacy. Had they read the full text article they would have undoubtedly realized that the study results were based on short-term follow-up, a small pivotal group, incomplete data, and unlikely to be applicable to their country situation. Their decision to alter treatment based solely on the abstract’s conclusions may have resulted in increased perinatal HIV transmission”

But there are also significant benefits to be gained from Open Access to other, non-biomedical research. Open Access to social science & humanities research is also increasing, and has recently been mandated by Research Councils UK (RCUK), the UK agency that dictates policy for all publicly-funded academic research in the UK, on the basis of the Finch report [PDF]. Particularly with respect to economics, I find it extremely worrying that our MPs and policymakers often do NOT have access to the latest academic economic research. David Willetts MP recently admitted in a BBC Radio 3 interview that he couldn’t access some research. Likewise, at the recent Open Knowledge Festival in Helsinki, a policymaker expressed frustration at his inability to access possible policy-influencing evidence as published in academic journals.

So, for this blogpost, I set about seeing what the Open Access publishing options are for economists. I am well-versed in the OA options for scientists and have produced a visualization of various paid Gold Open Access options, which has garnered much interest and attention. Even for scientists there is a wealth of completely free-to-publish-in options that are also Open Access (free-to-read, no subscription or payment required).

As far as I can see, the Gold Open Access ‘scene’ in Economics is less well-developed relative to the sciences. The Directory of Open Access Journals (DOAJ) lists 192 separate immediate Open Access journals of varying quality (compared to over 500 medical journals listed in DOAJ). These OA economics journals also seem to be newer on average than the similar spread of OA biomedical journals. Nevertheless I found what appear to be some excellent OA economics journals including:

  • Economic Analysis and Policy  – a journal of the Economic Society of Australia, seems to take great pride and interest in Open Access: there’s a whole issue devoted to the subject of Open Access in Economics with papers by names even I recognise e.g. Christian Zimmermann & John Willinsky.
  • Theoretical Economics – published by the Econometric Society three times a year. Authors retain the copyright to their works, and these are published under a standard Creative Commons licence (CC BY-NC). The PDFs seem very high-quality to me and contain an abundance of clickable hyperlinks & URLs – an added-value service I don’t even see from many good subscription publishers! Publishing here only requires one of the authors to be a member of the society, which only costs £50 a year, with fee reductions for students. Given that many OA science publications cost >£1000 per publication, I find this price extremely reasonable.
  • Monthly Labor Review – published by the US Department of Labor, and in existence since 1915(!) this seems to me to be another high-quality, highly-read Open Access journal.
  • Economics – published in Germany under a Creative Commons Licence (CC BY-NC). It has an excellent, modern and clear website, great (high-standard) data availability policy and even occasionally awards prizes for the best papers published in the journal.
  • Journal of Economic and Social Policy – another Australian journal, established in the year 2000, providing a simple but completely free outlet for publishing on social and economic issues, reviewing conceptual problems, or debating policy initiatives.
  • …and many more. Just as with science OA journals, there are numerous journals of local interest: Latin American journals (e.g. Revista Brasileira de Economia, the Latin American Journal of Economics, Revista de Economia Contemporânea, Revista de Análisis Económico), European journals like the South-Eastern Europe Journal of Economics (SEEJE) and Ekonomska Istrazivanja (Croatian), and Asian journals, e.g. Kasarinlan (Philippine Journal of Third World Studies). These should not be dismissed or discounted; not everything is appropriate for ‘international-scope’ journals. Local journals are important for publishing smaller-scale research which can be built upon by comparative studies and/or meta-analyses.
 It’s International Open Access Week this week: 22 – 28 October 2012

 

Perhaps more interesting with respect to Open Access in Economics is the thriving Green Open Access scene. In the sciences Green Open Access is pretty limited in my opinion. arXiv has popularised Green OA in certain areas of physics & maths, but in my particular domain (Biology) Green OA is a deeply unpopular and unused method of providing OA. From what I have seen, OA initiatives in Economics such as RePEc (Research Papers in Economics) and EconStor seem to be extremely popular and successful. As I understand it, RePEc provides Open Bibliographic Data for an impressive volume of Economics articles. In this respect the field is far ahead of the sciences, where there is little free or open bibliographic data from most publishers. EconStor is an OA repository of the German National Library of Economics – Leibniz Information Centre for Economics. It contains more than 48,000 OA works, which is a hugely impressive volume. The search functions are perhaps a tad basic, but with that much OA literature collected and available for use I’ve no doubt someone will create a better, more powerful search interface for the collection.

In summary, from my casual glance at OA publishing in Economics as a non-economist, mea culpa, things look very positive here. Unless informed otherwise I think the OA scene here too is likely to grow and dominate the academic publishing space as it is in other areas of academia.

References

Laakso, M. and Bjork, B. C. 2012. Anatomy of open access publishing: a study of longitudinal development and internal structure. BMC Medicine 10:124+

Lewis, D. W. 2012. The inevitability of open access. College & Research Libraries 73:493-506.