Can we have open data without open access?

Many of the motivations that drive open access also apply to open data in social science research: making one’s research more widely available to the research community and to the wider public, and producing more and better research that can be reproduced and verified.


On February 7-8, the University of Minho hosted the OpenAIRE Interoperability workshop, inviting academics, repository managers, publishers, funders, national help desks and open science advocates to discuss the challenge of interoperability in the emerging open access infrastructures.

Reviewing the reasons for open access, one sees that they are the same as those for open data. Eloy Rodriguez (University of Minho Documentation Services) presented the drivers for open access that came up in the discussion: the monitoring and assessment of research output, visibility and impact, economic benefits including innovation, the empowerment of institutions to preserve their own research outputs, and the change in science and research dissemination. All of these drivers are equally important for data and code, as they represent the evidence which backs up a publication.


The answer to the title question is “No”, according to Geoffrey Boulton (Royal Society, University of Edinburgh), Chair of the Working Group of the Science as an Open Enterprise report. He contended that publishing and data are invariably linked, as data constitutes the evidence and maintains self-correction and credibility in science: “Science corrects itself as long as you provide the knowledge by which it can do so”. Brian Hole (Ubiquity Press) stressed that research needs an effective and efficient model of distribution, and presented a model in which datasets are published in much the same way as research is published, in peer-reviewed open access data journals. This model would create additional incentives for sharing data, as researchers would also gain citations and reputation by publicising their datasets.

Who owns the data?

Publishers sell published research that was signed over to them by the very same research producers who are buying it.

Victoria Stodden (Columbia University, Open Economics Advisory Panel member) wrote on her blog that, much like the copyright sign-over to journals, many researchers are required to sign non-disclosure agreements when working with commercial data, even when no privacy issues are involved, which prevents them from sharing the data with other researchers. In some fields of science it goes even further: Ben Goldacre writes in Bad Pharma that “university administrators and ethics committees permit contracts with industry that explicitly say that the sponsor can control the data”, a form of research misconduct that is also one of the reasons for publication bias and for overstating the benefits of treatments in medicines research.

How they can go together

Storing, linking and preserving data from social science research in a sustainable manner may be more complex than creating open access repositories for publications. Researchers sometimes work with enormous datasets, which are usable by the research community only with proper descriptions of the research process in which the data was generated. And even where publishers or funders have data availability policies, these are rarely enforced, as making research data available would also require the establishment and maintenance of an elaborate data management infrastructure.

However, once open access infrastructures exist, it could be possible to have the data and code as one of the resources published along with the paper itself. Preserving these datasets on a large scale and in a sustainable manner would require massive repositories where datasets receive permanent digital identifiers, which would guarantee stable linking even if publishers or universities change the URLs.

While open access policies and structures might be getting more popular in some countries or fields of science, there is still limited understanding of how to make research data available on a wider scale. It is clear, however, that the experiences of the open access movement offer key lessons for understanding how to make research data openly available.

