

Data Sharing: Poor Status Quo in Economics

Sven Vlaeminck - March 5, 2013 in EDaWaX, External Projects, Featured

This article is cross-posted from the blog of the European Data Watch Extended Project.

In the context of our research project EDaWaX, a new research paper has been published by Patrick Andreoli-Versbach (International Max Planck Research School for Competition and Innovation (IMPRS-CI), LMU Munich, Munich Center for Innovation and Entrepreneurship Research (MCIER)) and Frank Mueller-Langer (Max Planck Institute for Intellectual Property and Competition Law, IMPRS-CI, MCIER).

The paper analyzes the data sharing behavior of 488 randomly chosen empirical economists. More specifically, the researchers under study were chosen uniformly across the top 100 economics departments and the top 50 business schools and randomly within the respective institution. Economics departments were chosen using the Shanghai Ranking 2011 in Economics and Business and business schools were chosen using the Financial Times Global MBA Ranking 2011.


Dutch PhD-workshop on research design, open access and open data

Velichka Dimitrova - February 1, 2013 in Economic Publishing, EDaWaX, External Projects, Featured, Open Access, Open Data, Open Economics

This blog post is written by Esther Hoorn, Copyright Librarian, University of Groningen, the Netherlands.

If Roald Dahl were still alive, he would certainly be tempted to write a book about the Dutch social psychologist Diederik Stapel. Not only did Stapel make up the research data to support his conclusions, he also ate all the M&M’s, which he bought with public money for interviews with fictitious pupils in fictitious high schools. In the Netherlands the research fraud by Stapel was a catalyst that drew attention to the issue of research integrity and the availability of research data. A new generation of researchers needs to be aware of the policy on sharing research data of the Dutch research funder NWO, the EU policy and the services of DANS, the Dutch Data Archiving and Networked Services. In the near future, a data management plan will be required in every research proposal.

Verifiability

For some time now, the library at the University of Groningen has been organizing workshops for PhD students to raise awareness of the shift towards Open Access. Open Access and copyright are the main themes. The request to also address the verifiability of research data came from SOM, the Research Institute of the Faculty of Economics and Business. The workshop is given as part of the Research Design course of the PhD program. The blog post Research data management in economic journals proved very useful for getting an overview of the related issues in this field.

Open Access

As we often see, Open Access was a new issue to most of the students. Because the library buys licenses, the students don’t perceive a problem with access to research journals. Moreover, they are not aware of the large sums that universities currently pay to finance access exclusively for their own staff and students. Once they understand the issue, there is strong interest. Some see a parallel with innovative distribution models for music. The PhD students come from all over the world, and Open Access is increasingly being addressed worldwide. One PhD student from Indonesia mentioned that the Indonesian government requires his dissertation to be available through the national Open Access repository. Chinese students were surprised by the availability of information on Open Access in China.

Assignment

The students prepared an assignment with some questions on Open Access and sharing research data. Their first question is still about the impact factor of the journals in which they intend to publish. The questions led the discussion to article-level metrics and alternative ways to organize the peer review of Open Access journals.

Will availability of research data stimulate open access?

Example of the Open Access journal Economics

The blog post Research data management in economic journals presents the results of the German project EDaWaX, European Data Watch Extended. An important result of the survey points to the role of association and university presses. In particular, it appears that many journals followed the data availability policy of the American Economic Association.

[quote] We found out that mainly university or association presses have high to very high percentages of journals owning data availability policies while the major scientific publishers stayed below 20%.

Out of the 29 journals with data availability policies, 10 used initially the data availability policy implemented by the American Economic Review (AER). These journals either used exactly the same policy or a slightly modified version.

For students it is reassuring to see how associations take up their role in addressing this issue. An example of an Open Access journal that adopted the AER policy is Economics. And yes, this journal does have an impact factor in the Social Science Citation Index and also offers the possibility to archive datasets in the Dataverse Network.

Re-use of research data for peer review

One of the students suggested that the public availability of research data (instead of merely research findings) may lead to innovative forms of review. This may facilitate a further shift towards Open Access. With access to the underlying research data and the methodologies used, scientists may be in a better position to evaluate the quality of the research conducted by their peers. The typical quality label conferred by top and very good journals may then become less relevant over time.
It was also discussed that journals need not publish a fixed number of papers in a volume released, for example, four times a year, but could instead publish qualifying papers as they become available throughout the year. Another point raised was that a substantial change in the existing publication mechanics will likely require either top journals or top business schools to lead the way, while associations of leading scientists in a given field may also play an important role in such a transition.

Research Data Management in Economic Journals

Sven Vlaeminck - December 7, 2012 in Economic Publishing, EDaWaX, Featured, Open Data, Open Economics, Open Research

This blog post has been written by Sven Vlaeminck | ZBW – German National Library of Economics / Leibniz Information Center for Economics


Background

In Economics, as in many other research disciplines, there is a continuous increase in the number of papers for which authors have collected their own research data or used external datasets. However, so far there have been few effective means of replicating the results of economic research within the framework of the corresponding article, of verifying them, and of making them available for repurposing or for use in support of the scholarly debate.

In the light of these findings B.D. McCullough pointed out: “Results published in economic journals are accepted at face value and rarely subjected to the independent verification that is the cornerstone of the scientific method. Most results published in economics journals cannot be subjected to verification, even in principle, because authors typically are not required to make their data and code available for verification.” (McCullough/McGeary/Harrison: “Lessons from the JMCB Archive”, 2006)

Harvard Professor Gary King also asked: “[I]f the empirical basis for an article or book cannot be reproduced, of what use to the discipline are its conclusions? What purpose does an article like this serve?” (King: “Replication, Replication” 1995). Therefore, the management of research data should be considered an important aspect of the economic profession.

The project EDaWaX

Several questions came up when we considered the reasons why economics papers may not be replicable in many cases:

First: what kind of data is needed for replication attempts? Second: scholarly economic journals evidently play an important role in this context. When publishing an empirical paper, do economists have to provide their data to the journal? How many scholarly journals oblige their authors to do so? Do these journals require their authors to submit only the datasets, or also the code of computation? Do they require their authors to provide the programs used for estimations or simulations? And what about descriptions of datasets, variables and values, or even a manual on how to replicate the results?

As part of generating the functional requirements for a publication-related data archive, the project analyzed the data (availability) policies of economic journals and developed some recommendations for these policies that could facilitate replication.

Data Policies of Economic Journals

Download Dataset

The Sample

First of all, we wanted to know how many journals in Economics require their authors to provide their empirical analysis data. Of course it was not possible to analyze all of the estimated 8,000 to 10,000 journals in Economics.

We used a sample built by Bräuninger, Haucap and Muck (paper available in German only) for examining the relevance and reputation of economic journals in the eyes of German economists. This sample was very useful for our approach because it allowed the comparison of the international top journals with journals published in the German-speaking area. Using the sample’s rankings for relevance and reputation, we could also establish that journals with data policies were also the ones with higher rankings.

To the sample of Bräuninger, Haucap and Muck we added four further journals with a data availability policy, in order to have more journals in our sample for a detailed evaluation of data policies. We excluded some journals because they focus only on economic policy or theory and do not publish empirical articles.

The sample we used is not representative of economic journals, because it mainly consists of high-ranked journals. Furthermore, because we added some journals explicitly because they have a data policy, the percentage of journals with such guidelines is also much higher than we would expect for economic journals in general.

Journals owning a data availability policy

In our sample we have 29 journals with a data availability policy (20.6%) and 11 journals (7.8%) with a so-called “replication policy” (we only examined the websites of the journals, not the printed versions). As mentioned above, this percentage is not representative of economic journals in general. On the contrary, we assume that our sample includes the majority of economic journals with data (availability) policies.
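The percentages quoted here can be checked with a few lines of arithmetic. The following is only a minimal sketch: it assumes the sample size of 141 journals that is stated in the results section of this post, and the helper name `share` is hypothetical.

```python
# Counts taken from the post; the helper name `share` is hypothetical.
SAMPLE_SIZE = 141              # journals analyzed (stated in the results section)
WITH_DATA_POLICY = 29          # journals with a data availability policy
WITH_REPLICATION_POLICY = 11   # journals with a "replication policy"


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(share(WITH_DATA_POLICY, SAMPLE_SIZE))         # 20.6
print(share(WITH_REPLICATION_POLICY, SAMPLE_SIZE))  # 7.8
```

Both values match the figures reported in the text.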

The number of journals with a data availability policy is considerably higher than in earlier studies in which other researchers (e.g. McCullough) examined the data archives of economic journals. An additional online survey of editors of economic journals showed that most of our respondents implemented their data policies between 2004 and 2011. We therefore suppose that the number of economic journals with data policies is slowly increasing. The editors of scholarly economic journals seem to realize that the topic of data availability is becoming more important.

The largest numbers of journals with a data availability policy were published by Wiley-Blackwell (6) and Elsevier (4). We found out that mainly university or association presses have high to very high percentages of journals owning data availability policies while the major scientific publishers stayed below 20%.

Out of the 29 journals with data availability policies, 10 used initially the data availability policy implemented by the American Economic Review (AER). These journals either used exactly the same policy or a slightly modified version.

The journals with a “replication policy” were excluded from further analysis. The reason is that “replication policies” require authors to provide “sufficient data and other materials” on request only, so there are no files that authors have to provide to the journal. This approach sounds good in theory, but it does not work in practice because authors often simply refuse to honor the requirements of these policies. (See “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project” by Dewald, Thursby and Anderson.)

Some criteria for data policies to enable replications

For a further evaluation of these data availability policies, we used a set of criteria for rating the quality of the policies: we extended some of the criteria previously developed by B.D. McCullough by adding standards that are important from an infrastructural point of view. The criteria we used for evaluation are as follows:

Data Policies that aim to ensure the replicability of economic research results have to:

  • be mandatory,
  • require authors to provide datasets, the code of computation, programs and descriptions of the data and variables (ideally in the form of a data dictionary),
  • assure that the data is provided prior to publication of an article,
  • have defined rules for research based on proprietary or confidential data,
  • provide the data, so other researchers can access these data without problems.

In addition, journals should:

  • have a special section for the results of replication attempts or should at least publish results of replications in addition to the dataset(s),
  • require their authors to provide the data in open formats or in ASCII-format,
  • require their authors to specify the name and version of both the software and the operating system used for analysis.
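The criteria that policies "have to" meet can be read as a checklist. The sketch below shows one way such a checklist might be encoded for a single journal. The class `DataPolicy`, its field names and the function `unmet_criteria` are hypothetical and are not part of the EDaWaX study.

```python
from dataclasses import dataclass, fields


@dataclass
class DataPolicy:
    """Hypothetical record of what one journal's data policy requires."""
    mandatory: bool = False
    requires_datasets: bool = False
    requires_code: bool = False
    requires_programs: bool = False
    requires_descriptions: bool = False
    data_before_publication: bool = False
    rules_for_proprietary_data: bool = False
    data_accessible: bool = False


def unmet_criteria(policy: DataPolicy) -> list[str]:
    """Return the names of all criteria the policy does not satisfy."""
    return [f.name for f in fields(policy) if not getattr(policy, f.name)]


# Illustrative policy: mandatory, datasets required before publication,
# but nothing else is regulated.
example = DataPolicy(
    mandatory=True,
    requires_datasets=True,
    data_before_publication=True,
)
print(unmet_criteria(example))
```

The "should" criteria (replication section, open formats, software and operating system details) could be added as further fields in the same way.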

Results of our survey

The above-mentioned requirements have been used to analyze the data policies of 141 economic journals. These are some of the results we obtained:

Mandatory Data Availability Policies

We found out that more than 82% of the data policies are mandatory. This is quite a good percentage, because for obtaining data it is crucial that policies oblige authors to provide it. If they do not, there is little hope that authors will provide a noteworthy amount of datasets and code, simply because it is time-consuming to prepare datasets and code and authors do not receive any reward for this work. Besides, authors often do not want to publish a dataset that has not been fully exploited. In the academic struggle for reputation, the last thing a researcher wants is to hand a substantial dataset to a competitor.

What data authors have to provide

We found out that 26 of the 29 policies (89.7%) require authors to submit the datasets used for the computation of their results. The remaining journals do not require their authors to do so, because the journal’s focus is often more oriented towards experimental economic research.

As to what kinds of data authors have to submit, we found out that 65.5% of the journals’ data policies require their authors to provide descriptions of the data submitted and some instructions on how to use the individual files. The quality of these descriptions ranges from very detailed instructions to just a few sentences that might not really help would-be replicators.

For the purpose of replication these descriptions of the submitted data are very important because of the structure of the data authors provide: in most cases, data is available as a zip file only. These zip containers hold a wide variety of formats and files. Without proper documentation, it is extremely time-consuming to find out which part of the data corresponds to which results in the paper, if this is possible at all. It is therefore not sufficient that only 65.5% of the data policies in our sample oblige their authors to provide descriptions. This kind of documentation is currently the most important part of the metadata describing the research data.
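To illustrate what a minimal description of such a zip container might look like, the following sketch lists the files in an archive and produces a skeleton table that an author could fill in. It uses only Python's standard `zipfile` module. The function names, the table layout and the example file names are assumptions made for illustration and are not taken from any journal's policy.

```python
import io
import zipfile


def list_archive(zip_file) -> list[tuple[str, int]]:
    """Return (file name, uncompressed size in bytes) for each file in a zip archive."""
    with zipfile.ZipFile(zip_file) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


def readme_skeleton(entries: list[tuple[str, int]]) -> str:
    """Build a plain-text table with one row per file, to be completed by the author."""
    lines = ["File | Size (bytes) | Purpose | Result in the paper"]
    for name, size in entries:
        lines.append(f"{name} | {size} | TODO | TODO")
    return "\n".join(lines)


# Build a tiny in-memory archive with made-up contents for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/survey.csv", "id,income\n1,100\n")
    archive.writestr("code/table1.txt", "hypothetical estimation script\n")
buffer.seek(0)

print(readme_skeleton(list_archive(buffer)))
```

Even such a simple listing, once the "Purpose" and "Result in the paper" columns are filled in, would tell a would-be replicator which file belongs to which result.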

The submission of (self-written) programs used, for example, for simulation purposes is mandatory in 62% of the policies. This relatively low percentage is also problematic: a researcher who wants to replicate the results of a simulation has no chance of doing so if the programs used for the simulation are not available.

Of course, whether this kind of research is published depends on the journal’s focus. But if such papers are published, a journal should make sure that the programs used and the source code of the application are submitted. Only if the source code is available is it possible to check for programming errors.

Approximately half of the policies oblige their authors to provide the code of their calculations. Given the importance of code for replication purposes, this percentage is low. The code of computation is crucial for replicating the findings of an empirical article. Without it, would-be replicators have to code everything from scratch, and it is uncertain whether they will be able to reproduce an identical code of computation. Therefore it is crucial that data availability policies strictly enforce the availability of the code of computation.

The point in time for providing datasets and other materials

As to the point in time at which authors have to submit the data to the journal, we found out that almost 90% of the data availability policies require authors to provide their data prior to the publication of an article. This is a good percentage. It is important to obtain the data prior to publication because, given the lack of other rewards, publication is the only incentive to submit data and code. Once an article is published, this incentive no longer exists.

Exemptions from the data policy and the case of proprietary data

In economic research it is quite common to use proprietary datasets. Companies such as Thomson Reuters Data Stream offer the possibility to acquire datasets, and many researchers choose such options. Research based on company data or microdata is also always proprietary or even confidential.

Normally, if researchers want to publish an article based on these data, they have to request an exemption from the data policy. More than 72% of the journals we analyzed offered this possibility. One journal (Journal of the European Economic Association) discourages authors from publishing articles that rely on completely proprietary data.

But even if proprietary data was used for research, it is important that these research outputs are replicable in principle. Therefore journals should have a procedure in place that ensures the replicability of the results even in these cases. Consequently some journals request their authors to provide the code of computation, the version(s) of the dataset(s) and some additional information on how to obtain the dataset(s).

Of the 28 journals allowing exemptions from the data policy, we found out that more than 60% have rules for these cases. This percentage is not really satisfactory; there is still room for improvement.

Open formats

Open formats are important for two reasons: The first is that the long-term preservation of these data is much easier, because the technical specifications of open formats are known. A second reason is that open formats offer the possibility to use data and code in different platforms and software environments. It is useful to have the possibility to utilize the data interoperably and not only in one statistical package or on one platform.

Despite these advantages, only two journals made recommendations for open formats.
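As an illustration of what providing data in an open format together with a data dictionary could look like in practice, here is a minimal sketch using Python's standard `csv` module. The variable names and values are invented for the example, and the helper `write_csv` is hypothetical.

```python
import csv
import io


def write_csv(rows: list[dict], fieldnames: list[str]) -> str:
    """Serialize rows to plain-text CSV, an open format readable by any statistical package."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Invented example data; not taken from any study.
data = [
    {"id": 1, "value": 2.5},
    {"id": 2, "value": 3.0},
]

# A data dictionary is itself just another small table describing each variable.
data_dictionary = [
    {"variable": "id", "description": "Observation identifier", "unit": "none"},
    {"variable": "value", "description": "Measured quantity (made up)", "unit": "percent"},
]

print(write_csv(data, ["id", "value"]))
print(write_csv(data_dictionary, ["variable", "description", "unit"]))
```

Because both outputs are plain text, they can be opened in different software environments and remain readable even if a particular statistical package is no longer available.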

Version of software and OS

According to McCullough and Vinod (2003) the results achieved in economic research are often influenced by the statistical package that was used for calculations. Also the operating system has a bearing on the results. Therefore both the version of the software and the OS used for calculations should be specified.

Most of the data policies in our sample do not oblige their authors to provide these specifications. But there are differences: for example, almost every journal that has adopted the data availability policy of the American Economic Review (AER) requires its authors to “document[…] the purpose and format of each file provided”.

In sharp contrast, up to now not a single policy requires the specification of the operating system used for calculations.
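Recording this information takes little effort. The sketch below, which assumes an analysis carried out in Python, collects the operating system and interpreter version with the standard `platform` module. The function name `environment_record`, the layout of the record and the package name in the usage line are hypothetical; other statistical packages offer their own ways to report version information.

```python
import json
import platform


def environment_record(packages=None) -> dict:
    """Collect operating system and software version details for a replication archive.

    `packages` is an optional mapping of package name to version string
    supplied by the author.
    """
    return {
        "os": platform.system(),            # e.g. "Linux", "Windows", "Darwin"
        "os_release": platform.release(),
        "os_version": platform.version(),
        "machine": platform.machine(),
        "language": "Python",
        "language_version": platform.python_version(),
        "packages": dict(packages or {}),
    }


# The package name and version below are placeholders.
print(json.dumps(environment_record({"examplepkg": "1.0"}), indent=2))
```

Saving the resulting text alongside the datasets and code would give replicators the software and operating system details discussed above.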

Replication rubric

In the course of our study we also examined whether journals have a special section for publishing the results of replication attempts. We found out that only a very limited number of journals have such a section. In an additional online survey of the project EDaWaX, seven journals stated that they publish replication results or attempts. However, the number of these replication attempts was low: none of the respondents published more than three replication studies per annum, and most published even less than one per year.

The main purpose of a replication section is to control the quality of the data submitted. If a journal does not publish the results of replications, authors may submit data of poor quality.

Conclusion

In summary, the management of publication-related research data in economics is still in its early stages. We were able to find 29 journals with data availability policies. That is many more than other researchers found some years ago, but compared to the multitude of economic journals in total, the percentage of journals with a data availability policy is still quite low. The 29 journals we found (20.6% of our sample) may well account for the majority of all journals with a data policy.

Nevertheless, editors and journals in economics seem to be moving: the topic of data availability appears to be gaining importance. This is a positive signal, and it will be interesting to see whether and how this upward trend continues.

A large portion of the analyzed data availability policies are mandatory, which is good practice. Moreover, the finding that almost 90% of the journals require their authors to submit the data prior to the publication of an article shows that many of them have recognized the importance of providing data at an early stage in the publication process.

When analysing the data authors have to provide, we noticed that almost all guidelines mandate the submission of the (final) dataset(s), which is also quite positive.

But beyond that there is much room for improvement: only two thirds of all policies require the submission of descriptions and of (self-written) software. As mentioned above, research data is often not usable when descriptions or software components are missing. In particular, the lack of requirements to submit the code of computation is a big problem for potential replication attempts. Only a small majority of all policies require authors to provide it. It can therefore be expected that almost half of the data availability policies in our sample do not fully enable replications.

Another important aspect is the possibility to replicate the results of economic research that is based on proprietary or confidential data. While more than 72% of all policies allow exemptions from their regulations, only 60.7% have a procedure in place that regulates which data and descriptions authors still have to provide in these cases. On balance, much research based on proprietary or confidential data is not replicable even in principle.

Open formats are used by a small minority of journals only. This might result in difficulties for the interoperable use of research data and the long-term preservation of these important sources of science and research.
The reuse of research data is also complicated by the lack of information on which version of a software package was used for calculations. Only a little more than a third of all policies state that authors have to specify the software version or the formats of the submitted research data. Moreover, so far not a single journal requires the specification of the operating system used.

But there are also good practices: among the journals with data availability policies, we noticed that the policy implemented by the American Economic Review (AER) is a very good example of a data availability policy for economic journals. Journals that have adopted this policy form the largest single group in our sample. We therefore see a developing trend towards a de facto standard for data policies.

In a second part to this survey (to be published in spring 2013) we will discuss the infrastructure used by economic scholarly journals for providing datasets and other materials.

This post has been added to the resources of the Open Economics Working Group.