
Open Economics: the story so far…

- August 30, 2013 in Advisory Panel, Announcements, Events, Featured, Open Data, Open Economics, Projects

A year and a half ago we embarked on the Open Economics project with the support of the Alfred P. Sloan Foundation, and we would like to share a short recap of what we have been up to.

Our goal was to define what open data means for the economics profession and to become a central point of reference for those who wanted to learn what it means to have openness, transparency and open access to data in economics.

Advisory Panel of the Open Economics Working Group:
openeconomics.net/advisory-panel/

Advisory Panel

We brought together an Advisory Panel of twenty senior academics who advised us and provided input on people and projects we needed to contact and issues we needed to tackle. The progress of the project has depended on the valuable support of the Advisory Panel.

1st Open Economics Workshop, Dec 17-18 ’12, Cambridge, UK:
openeconomics.net/workshop-dec-2012/

2nd Open Economics Workshop, 11-12 June ’13, Cambridge, MA:
openeconomics.net/workshop-june-2013

International Workshops

We also organised two international workshops, the first held in Cambridge, UK on 17-18 December 2012 and the second in Cambridge, MA, USA on 11-12 June 2013, convening academics, funders, data publishers, information professionals and students to share ideas and build an understanding about the value of open data, the barriers that still persist to opening up information, and the incentives and structures which our community should encourage.

Open Economics Principles

While defining open data for economics, we also saw the need to issue a statement on the openness of data and code – the Open Economics Principles – to emphasise that data, program code, metadata and instructions which are necessary to replicate economics research should be open by default. Launched in August, this statement is now being widely endorsed by the economics community, most recently by the World Bank’s Development Data Group.

Projects

The Open Economics Working Group and several of its more involved members have worked on smaller projects to showcase how data can be made available and what tools can be built to encourage discussion, participation and a wider understanding of economics. We built Yourtopia Italy – http://italia.yourtopia.net/ – an app for a user-defined multidimensional index of social progress, which won a special prize in the Apps4Italy competition.




Yourtopia Italy: application of a user-defined multidimensional index of social progress: italia.yourtopia.net

We created the Failed Bank Tracker, a list and a timeline visualisation of the banks in Europe which failed during the last financial crisis, and released the Automated Game Play Datasets, the data and code of papers from the Small Artificial Agents for Virtual Economies research project, implemented by Professor David Levine and Professor Yixin Chen at Washington University in St. Louis. More recently we launched the Metametrik prototype of a platform for the storage and search of regression results in economics.


MetaMetrik: a prototype for the storage and search of econometric results: metametrik.openeconomics.net

We also organised several events in London and a topic stream about open knowledge and sustainability at the OKFestival with a panel bringing together a diverse range of panelists from academia, policy and the open data community to discuss how open data and technology can help improve the measurement of social progress.

Blog and Knowledge Base

We blogged about issues like the benefits of open data from the perspective of economics research, the EDaWaX survey of the data availability of economics journals, pre-registration in the social sciences, crowd-funding and open access. We also presented projects like the Statistical Memory of Brazil, Quandl and the AEA randomized controlled trials registry.

Some of the issues we raised had a wider resonance, e.g. when Thomas Herndon found significant errors in trying to replicate the results of Harvard economists Reinhart and Rogoff, we emphasised that while such errors may happen, it is a greater crime not to make the data available with published research in order to allow for replication.

Some outcomes and expectations

We found that opening up data in economics can be a difficult matter, as many economists use data which cannot be opened because of privacy or confidentiality constraints, or because they do not own the data. Sometimes there are insufficient incentives to disclose data and code: many economists invest considerable resources in building their datasets and gain an advantage over other researchers by exploiting these information rents.

Some journals have been leading the way in putting in place data availability requirements and funders have been demanding data management and sharing plans, yet more general implementation and enforcement is still lacking. There are now, however, more tools and platforms available where researchers can store and share their research content, including data and code.

There are also great benefits in sharing economics data: it enables the scrutiny of research findings and makes replication possible, enhances the visibility of research, promotes new uses of the data and avoids unnecessary duplication of data collection costs.

In the future we hope to concentrate on projects which would involve graduate students and early career professionals, a generation of economics researchers for whom sharing data and code may become more natural.

Keep in touch

Follow us on Twitter @okfnecon, sign up to the Open Economics mailing list and browse our projects and resources at openeconomics.net.

Second Open Economics International Workshop Recap

- July 5, 2013 in Events, Featured, Workshop


Open Knowledge Foundation, CIPIL, MIT Sloan. Supported by Alfred P. Sloan Foundation

On June 11-12, the Open Economics Working Group of the Open Knowledge Foundation organised the Second Open Economics International Workshop, hosted at the MIT Sloan School of Management. It was the second of two international workshops funded by the Alfred P. Sloan Foundation, aimed at bringing together economists and senior academics, funders, data publishers and data curators in order to discuss the progress made in the field of open data for economics and the challenges that remain. This post is an extended summary of the speakers’ input and some of the discussion. See the workshop page for more details.

Setting the Scene

The first panel addressed the current state of open data in economics research and some of the “not bad” practices in the area. Chaired by Rufus Pollock (Open Knowledge Foundation) the panel brought together senior academics and professionals from economics, science, technology and information science.

Eric von Hippel (MIT Sloan School of Management) talked about open consumer-developed innovations, revealing that consumers actually innovate a lot to solve their needs as private users, and that while they are generally willing to let others adopt their innovations for free, they don’t actively invest in knowledge diffusion. As producers of findings, economists have high incentives to diffuse those, but as users of private research methods and data they have low or negative incentives to diffuse to rivals. Lowering the costs of diffusion, increasing the benefits from diffusion, more collaborative research processes and mandatory sharing are some of the ways to increase economists’ incentives to diffuse research methods and data as they diffuse findings. [See slides]

Micah Altman (MIT Libraries, Brookings Institution) stressed that best practices are often not “best” and rarely practiced, and thus preferred to discuss some probably “not bad” practices, including policy practices for the dissemination and citation of data: e.g. that data citations should be treated as first-class objects of publication, and reproducibility policies where more support should be given to publishing replications and registering studies. He emphasised that policies are often not self-enforcing or self-sustaining, and that compliance with data availability policies even in some of the best journals is very low. [See slides]

Shaida Badiee (Development Data Group, World Bank) shared the experience of setting the World Bank’s data free in 2010 and the exceptional popularity and impact the World Bank’s data has had since. To achieve better access, data is legally open – with no discrimination about the types of use – provided with appropriate user support, and available in multiple languages, platforms and devices, e.g. with API access, plug-ins for regression software, integration with external applications and mobile phones, etc. She reminded the audience that data is only as good as the capacity of the countries which produce it, and that working closely with countries to improve their statistical capacities is necessary for the continuous improvement of data. The World Bank works in partnership with the global open data community and provides support to countries that are willing to launch their own open data initiatives. [See slides]
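For readers who want to try this kind of API access themselves, here is a minimal sketch in Python that queries the World Bank’s public data API. The URL pattern and indicator code follow the API’s documented v2 conventions, but the exact parameters here are only an illustration, not part of the talk.

    import requests

    # Minimal sketch: fetch total population for Brazil from the World Bank open data API (v2).
    # Endpoint pattern: /v2/country/{country_code}/indicator/{indicator_code}?format=json
    url = "https://api.worldbank.org/v2/country/BR/indicator/SP.POP.TOTL"
    response = requests.get(url, params={"format": "json", "per_page": 5}, timeout=30)
    response.raise_for_status()

    metadata, observations = response.json()  # the API returns a [paging-metadata, data] pair
    for obs in observations:
        print(obs["date"], obs["value"])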

Philip E. Bourne (UCSD) shared some thoughts from the biomedical sciences and indicated that while there are some success stories, many challenges still need to be addressed, e.g. the lack of reproducibility and the unsolved problem of sustainability. He highlighted that change is driven by the community and there should be a perception that the community owns this culture, including e.g. transparency and shared ownership, a reward system for individuals and teams, strategic policies on open access and data sharing plans, and, critically, the notion of “trust” in the data, which is crucial to the open data initiative. Funders and institutions may not initiate change, but they will eventually follow suit: the structural biology community created successful data sharing plans before funders did. He emphasised that it is all about openness: no restrictions on the usage of the data beyond attribution, running on open source software and transparency about data usage. [See slides]

Knowledge Sharing in Economics

The second panel, chaired by Eric von Hippel (MIT Sloan School of Management), looked more closely at the discipline of economics, the technological and cultural challenges that still exist, and the possible roles and initiatives. [See audio page].

Joshua Gans (University of Toronto) analysed some of the motives for knowledge contribution – e.g. money, awards and recognition, ownership and control, intrinsic motivation, etc. – and addressed other issues like design and technology problems, which could be as important as social norms. He talked about designing for contribution and the importance of managing guilt: since there is a concern that data should be accurate and almost perfect, less data is contributed, so a well-designed system should enable the contribution of imperfect pieces (as Wikipedia and open source do by breaking down contributions). This should ideally be combined with an element of usefulness for the contributors, so that they are getting something useful out of it. He called for providing an easy way of sharing, without the hassle and all the questions which come from data users, since there are “low hanging fruit” datasets that can be shared. [See slides]

Gert Wagner (German Institute for Economic Research DIW) spoke in his capacity as Chairman of the German Data Forum, an organisation which promotes the production, re-use and re-analysis of data. He pointed out that there is no culture of data sharing in economics: “no credit is given where credit is due” and incentives should be promoted for sharing economics data. So far only funding organisations can enforce data sharing by data producers, but this only happens at the institutional level; for individual authors there is little incentive to share data. As a way to change this culture, he suggested educating graduate students and early career professionals. In the German Socio-Economic Panel Study, a panel study of private households in Germany, they have been applying Schumpeter’s principle: producers who innovate must educate the consumers if necessary. Alongside workshops which teach new users technical skills, users are also taught to cite the data and give credit where credit is due. [See slides]

Daniel Feenberg (National Bureau of Economic Research) gave a brief introduction to the NBER, which publishes about a thousand working papers a year, more than a third of which are empirical economics papers about the United States. Authors have the option to upload data resources in a “data appendix”, which is put on the website and made available for free. Very few authors, however, take advantage of being able to publish their data, and they are also aware that they will get questions if they make their data available. He mentioned that requiring data sharing is something only employers and funders can mandate, and that there is a limited role for the publisher. Beside the issues of knowledge sharing design and incentives for individual researchers, there is also the issue of governments sharing data, where confidentiality is a big concern, but also where politically motivated unscientific research may inform policy – in which case more access and more research is better than less research.

John Rust (Georgetown University) indicated that the incentives for researchers might not be the biggest problem, but that there is an inherent conflict between openness and confidentiality, and a lot of economics research uses data that cannot be made publicly available. While companies and organisations are often sceptical, risk-averse and not aware of the benefits of sharing their operations data with researchers, they could save money and profit from research insights, especially in the field of optimising rental and replacement decisions (see e.g. the seminal paper by Rust 1987). Appealing to the self-interest of firms and showing success stories where collaborative research has worked can convince firms to share more data. The process of establishing trust and getting data could be aided by trusted intermediaries who can house and police confidential data and have the expertise to work with information protected by non-disclosure agreements.

Sharing Research Data

The panel session “Sharing research data – creating incentives and scholarly structures” was chaired by Thomas Bourke (European University Institute Library) and dealt with the different incentives and opportunities researchers have for sharing their data: storing it in a curated repository like the ICPSR or in a self-service repository like DataVerse. In order to be citable, a dataset should obtain a DOI – a service which DataCite provides – and a dataset can also be published together with a data paper in a peer-reviewed data journal. [See audio page].

Amy Pienta (The Inter-university Consortium for Political and Social Research – ICPSR) presented some context about the ICPSR – the oldest archive for social science data in the United States, which has been supporting data archiving and dissemination for over 50 years. Among the incentives for researchers to share data, she mentioned the funding agencies’ requirements to make data available, scientific openness and stimulating new research. ICPSR has been promoting data citations and getting more journal editors to understand them, and when archiving data it also captures how the data is being used, by which users, institutions, etc. The ICPSR is also currently developing open access data as a new product, where researchers will be allowed to publish their original data, tied with a data citation and DOI, data downloads and usage statistics, and layered with levels of curation services. See slides.

Mercè Crosas (Institute for Quantitative Social Science, Harvard University) presented the background of the DataVerse network, a free and open-source service and software to publish, share and reference research data; originally open only to social scientists, it now welcomes contributions from all universities and disciplines. It is a completely self-curated platform where authors can upload data and additional documentation, adding metadata to make the resource more discoverable. It builds on the incentives of data sharing, giving a persistent identifier, automatically generating a data citation (using the format suggested by Altman and King 2007), providing usage statistics and giving attribution to the contributing authors. Currently DataVerse is implementing closer integration with journals using OJS, where the data resources of an approved paper will be directly deposited online. She also mentioned the Amsterdam Manifesto on Data Citation Principles, which encourages different stakeholders – publishers, institutions, funders, researchers – to recognise the importance of data citations. See slides.

Joan Starr (DataCite, California Digital Library) talked about DataCite – an international organisation set up in 2009 to help researchers find, re-use and cite data. She mentioned some of the most important motivations for researchers to share and cite data, e.g. exposure and credit for the work of researchers and curators, scientific transparency and accountability for the authors and data stewards, citation tracking and understanding the impact of one’s work, verification of results and re-use for producing new research (see more at ESIP – Earth Science Information Partners). The basic service that DataCite provides is DOIs for data (see a list of international partners who can support you in your area). Other services include usage statistics and reports, content negotiation, a citation formatter and metadata search, where one can see what kind of data is being registered in a particular field. Recently DataCite has also implemented a partnership with ORCID to have all research outputs (including data) on researchers’ profiles. See slides.
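As an illustration of the content negotiation mentioned above, the minimal sketch below asks the doi.org resolver for a BibTeX record of a dataset DOI. The DOI in the example is a placeholder, not a real dataset, so treat the whole snippet as a sketch rather than a recipe.

    import requests

    # Minimal sketch: resolve a dataset DOI to a BibTeX citation via DOI content negotiation.
    # Replace the placeholder DOI below with a real DataCite DOI before running.
    doi = "10.1234/example-dataset"  # hypothetical placeholder DOI
    response = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},  # ask the resolver for a BibTeX record
        timeout=30,
    )
    response.raise_for_status()
    print(response.text)  # a BibTeX entry describing the dataset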

Brian Hole (Ubiquity Press) talked about data journals and encouraging data sharing and better data citation through the publication of data and methodology in data papers. He emphasised that while at the beginning of scientific publishing it was enough to share the research findings, today the data, software and methodology should be shared as well in order to enable replication and validation of research results. Amongst the benefits of making research data available he mentioned the collective benefits for the research community, the long-term preservation of research outputs, enabling new and more research to be done in a more efficient way, re-use of the data in teaching, ensuring public trust in science, access to publicly-funded research outputs, opportunities for citizen science, etc. The publication of a data paper – where the data is stored in a repository with a DOI and linked with a short paper which describes the methodology of creating the dataset – could be a way to incentivise individual researchers to share their data, as it builds up their career record of publications. An additional benefit of having data journals is having a metadata platform where data from different (sub-)disciplines can be collected and mashed up, producing new research. See slides.

The Evolving Evidence Base of Social Science

The purpose of the panel on the evolving evidence base of social science, chaired by Benjamin Mako Hill (MIT Sloan School of Management / MIT Media Lab), was to showcase examples of collecting more and better data and making more informed policy decisions based on a larger volume of evidence. See audio page.

Michael McDonald (George Mason University) presented some updates on the Public Mapping Project, which involves an open source online re-districting application that optimises re-districting according to selected criteria and allows for public participation in decision-making. Most recently there was a partnership with Mexico – with the Instituto Federal Electoral (IFE) – using redistricting criteria like population equality, compactness, travel distance, respect for municipality boundaries, respect for indigenous communities, etc. A point was made about moving beyond data and having open optimisation algorithms which can be verified, which is of great importance especially when they are the basis of an important public policy decision like the distribution of political representation across the country. Open code in this context is essential not just for the replication of research results but also for a transparent and accountable government. See slides.

Amparo Ballivian (Development Data Group, World Bank) presented the World Bank project for the collection of high frequency survey data using mobile phones. Among the motivations for the pilot was the lack of recent and frequently updated data: poverty rates, for example, are calculated on the basis of household surveys, yet such surveys involve a long and costly process of data collection. The aspiration was to have comparable data every month for thousands of households, to be able to track changes in welfare and responses to crisis, and to have data to support decisions in real time. Two half-year pilots were implemented in Peru and Honduras, where it was possible to test monetary incentives, different cellphone technologies and the responses of different income groups. In contrast to e.g. crowd-sourced surveys, such a probabilistic cellphone survey provides the opportunity to draw inferences about the whole population and can be implemented at a much lower cost than traditional household surveys. See slides.

Patrick McNeal (The Abdul Latif Jameel Poverty Action Lab) presented the AEA registry for randomised controlled trials (RCTs). Launched several weeks ago and sponsored by the AEA, the trials registry addresses the problem of publication bias in economics by providing a place where all ongoing RCTs in economics are listed. The registry is open to researchers from around the world who want to register their randomised controlled trial. Some of the most interesting feedback from researchers includes having an easy and fast process for registering studies (only about 17 fields are required, much of which can be taken from project documentation), the optional uploading of the pre-analysis plan, and the option to hide some fields until the trial is completed in order to address the fear that researchers will expose their ideas publicly too early. The J-PAL affiliates who are running RCTs will have to register them in the system according to a new policy which mandates registration, and there are also discussions on linking required registration with the funding policies of RCT funders. Registration of ongoing and completed trials is also pursued, and the training of RAs and PhD students now includes the registration of trials. See the website.

Pablo de Pedraza (University of Salamanca) chairs Webdatanet, a network that brings together web data experts from a variety of disciplines e.g. sociologists, psychologists, economists, media researchers, computer scientists working for universities, data collection institutes, companies and statistics institutes. Funded by the European Commission, the network has the goal of fostering the scientific use of web-based data like surveys, experiments, non-reactive data collection and mobile research. Webdatanet organises conferences and meetings, supports researchers to go to other institutes and do research through short scientific missions, organises training schools, web data metrics workshops, supports early career researchers and PhD students and has just started a working paper series. The network has working groups on quality issues, innovation and implementation (working with statistical institutes to obtain representative samples) and hosts bottom-up task forces which work on collaborative projects. See slides.

Mandating Data Availability and Open Licenses

The session, chaired by Mireille van Eechoud (IViR – Institute for Information Law), followed up on the discussions about making datasets available and citable, focusing on the roles of different stakeholders and how responsibility should be shared. Mireille reminded the audience that, as legal instruments like Creative Commons and open data licenses are already quite well-developed, the role of the law in this context is in managing risk aversion, and it is important to see how legal aspects are managed at the policy level. For instance, while the new EU Framework Programme for Research and Innovation – Horizon 2020 – carries the flag of open access to research publications, there are already a lot of exceptions which would allow lawyers to contest that data falls under an open access obligation. See audio page.

Carson Christiano (Center for Effective Global Action – CEGA) presented the perspective of CEGA, an inter-disciplinary network of researchers focused on global development, which employs rigorous evaluation techniques to measure the impact of large-scale social and economic development programs. The research transparency initiative of CEGA focuses on methodology and is motivated by the issues of publication bias, selective presentation of results and inadequate documentation of research projects, where a number of studies in e.g. medicine, psychology, political science and economics have pointed out the fragility of research results in the absence of the methods and tools for replication. CEGA has launched an opinion series, Transparency in Social Science Research, and is looking into ways to promote examples of researchers, to support and train early career researchers and PhD students in registering studies and pre-analysis plans, and to work in a transparent way.

Daniel Goroff (Alfred P. Sloan Foundation) raised the question of what funders should require of the people they make grants to, e.g. those who undertake economics research. While some funders may require data management plans and making the research outputs entirely open, this is not a simple matter and there are trade-offs involved. The Alfred P. Sloan Foundation has funded and supported the establishment of knowledge public goods – commodities which are non-rivalrous and non-excludable – like big open access datasets with large setup costs (e.g. the Sloan Digital Sky Survey, the Census of Marine Life, Wikipedia, etc.). Public goods, however, are notoriously hard to finance. Thinking about other funding models, the involvement of markets and commercial enterprises – where e.g. the data is available openly for free, but value-added services are offered at a charge – could be one of the ways to make knowledge public goods useful and sustainable.

Nikos Askitas (Institute for the Study of Labor IZA) heads Data and Technology at the Institute for the Study of Labor (IZA), a private independent economic research institute based in Bonn, Germany, focused on the analysis of global labor markets. He challenged the notion that funders must require data availability from researchers, since researchers are already overburdened and too many restrictions may destroy creativity and result in well-documented mediocre research. Data peer review is also a very different process from the peer review of academic research. He suggested that there is a need to create a new class of professionals who will assist researchers and who would require a proper name, titles, salaries and recognition for their work.

Jean Roth (National Bureau of Economic Research – NBER) mentioned that there has been a lot of interest as well as compliance from researchers since the NSF implemented data management plans. Several years ago, she modified the NBER paper submission code to allow data to be submitted together with the paper, and now researchers curate their data themselves; about 5.5% of papers have data available alongside the paper. A number of the data products from the NBER are very popular in online searches, which helps people find the data in a format which is easier to use. As a Data Specialist at the NBER, she helps to make data more usable and to facilitate re-use by other researchers. Over time, the resources and time invested in making data more usable decrease, both for the data curator and for the users of the data.

The last session concentrated on further steps for the open economics community and ideas which should be pursued.
If you have any questions or need to get in touch with one of the presented projects, please contact us at economics[at]okfn.org.

Second Open Economics International Workshop

- June 5, 2013 in Announcements, Events, Featured, Open Data, Open Economics, Workshop

Next week, on June 11-12, at the MIT Sloan School of Management, the Open Economics Working Group of the Open Knowledge Foundation will gather about 40 economics professors, social scientists, research data professionals, funders, publishers and journal editors for the second Open Economics International Workshop.

The event will follow up on the first workshop held in Cambridge, UK and will conclude with the agreement of a statement on the Open Economics principles. Some of the speakers include Eric von Hippel, T Wilson Professor of Innovation Management and also Professor of Engineering Systems at MIT; Shaida Badiee, Director of the Development Data Group at the World Bank and champion for the Open Data Initiative; Micah Altman, Director of Research and Head of the Program on Information Science for the MIT Libraries; as well as Philip E. Bourne, Professor at the University of California San Diego and Associate Director of the RCSB Protein Data Bank.

The workshop will address topics including:

  • Research data sharing: how and where to share economics and social science research data, enforce data management plans, and promote better data management and data use
  • Open and collaborative research: how to create incentives for economists and social scientists to share their research data and methods openly with the academic community
  • Transparent economics: how to achieve greater involvement of the public in the research agenda of economics and social science

The knowledge sharing in economics session will invite a discussion between Joshua Gans, Jeffrey S. Skoll Chair of Technical Innovation and Entrepreneurship at the Rotman School of Management at the University of Toronto and Co-Director of the Research Program on the Economics of Knowledge Contribution and Distribution, John Rust, Professor of Economics at Georgetown University and co-founder of EconJobMarket.org, Gert Wagner, Professor of Economics at the Berlin University of Technology (TUB) and Chairman of the German Census Commission and German Council for Social and Economic Data as well as Daniel Feenberg, Research Associate in the Public Economics program and Director of Information Technology at the National Bureau of Economic Research.

The session on research data sharing will be chaired by Thomas Bourke, Economics Librarian at the European University Institute, and will discuss the efficient sharing of data and how to create and enforce reward structures for researchers who produce and share high quality data, gathering experts from the field including Mercè Crosas, Director of Data Science at the Institute for Quantitative Social Science (IQSS) at Harvard University, Amy Pienta, Acquisitions Director at the Inter-university Consortium for Political and Social Research (ICPSR), Joan Starr, Chair of the Metadata Working Group of DataCite as well as Brian Hole, the founder of the open access academic publisher Ubiquity Press.

Benjamin Mako Hill, researcher and PhD Candidate at MIT and the Berkman Center for Internet and Society at Harvard University, will chair the session on the evolving evidence base of social science, which will highlight examples of how economists can broaden their perspective on collecting and using data through different means: through mobile data collection, through the web or through crowd-sourcing, and also consider how to engage the broader community and do more transparent economic research and decision-making. Speakers include Amparo Ballivian, Lead Economist working with the Development Data Group of the World Bank, Michael P. McDonald, Associate Professor at George Mason University and co-principal investigator on the Public Mapping Project, and Pablo de Pedraza, Professor at the University of Salamanca and Chair of Webdatanet.

The morning session on June 12 will gather different stakeholders to discuss how to share responsibility and how to pursue joint action. It will be chaired by Mireille van Eechoud, Professor of Information Law at IViR and will include short statements by Daniel Goroff, Vice President and Program Director at the Alfred P. Sloan Foundation, Nikos Askitas, Head of Data and Technology at the Institute for the Study of Labor (IZA), Carson Christiano, Head of CEGA’s partnership development efforts and coordinating the Berkeley Initiative for Transparency in the Social Sciences (BITSS) and Jean Roth, the Data Specialist at the National Bureau of Economic Research.

At the end of the workshop the Working Group will discuss the future plans of the project and gather feedback on possible initiatives for translating discussions into concrete action plans. Slides and audio will be available on the website after the workshop. If you have any questions please contact economics [at] okfn.org.

Metametrik Sprint in London, May 25

- May 2, 2013 in Announcements, Call for participation, Events, Featured, Metametrik, Sprint

The Open Economics Working Group invites you to a one-day sprint to create a machine-readable format for the reporting of regression results.

  • When: May 25, Saturday, 10:00-16:00
  • Where: Centre for Creative Collaboration (tbc), 16 Acton Street, London, WC1X 9NG
  • How to participate: please, write to economics [at] okfn.org

The event is meant for graduate students in economics and quantitative social science as well as other scientists and researchers who work with quantitative data analysis and regressions. We also welcome developers with some knowledge of XML and other markup languages, and anyone else interested in contributing to this project.

About Metametrik

Metametrik, as a machine readable format and platform to store econometric results, will offer a universal form for presenting empirical results. Furthermore, the resulting database would present new opportunities for data visualisation and “meta-regressions”, i.e. statistical analysis of all empirical contributions in a certain area.

During the sprint we will create a prototype of a format for saving the regression results of empirical economics papers, which would be the basis for meta-analysis of relationships in economics. The Metametrik format would include (see the sketch after this list):

  • an XML-derived (or other markup language) format to describe regression output, capturing which dependent and independent variables were used, the type of dataset (e.g. time series, panel), the sign and magnitude of the relationship (coefficient and t-statistic), data sources, the type of regression (e.g. OLS, 2SLS, structural equations), etc.
  • a database to store the results (possible integration with CKAN)
  • a user interface to allow results to be entered, translated and saved in the Metametrik format; results could also be imported directly from statistical packages
  • visualisation of results and a GUI – enabling queries on the database and displaying basic statistics about the relationships.
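As a purely illustrative starting point – the sprint itself is meant to define the real schema – here is a minimal Python sketch of how one regression result might be serialised along the lines described above. All element and attribute names, and the example values, are hypothetical.

    import xml.etree.ElementTree as ET

    # Hypothetical sketch of a Metametrik-style record for a single regression result.
    # Element and attribute names are illustrative only, not a defined standard.
    result = ET.Element("regression", attrib={"method": "OLS", "dataset_type": "panel"})
    ET.SubElement(result, "paper", attrib={"doi": "10.1234/example-paper", "year": "2012"})
    ET.SubElement(result, "dependent_variable").text = "gdp_growth"

    regressor = ET.SubElement(result, "independent_variable", attrib={"name": "democracy_index"})
    ET.SubElement(regressor, "coefficient").text = "0.42"
    ET.SubElement(regressor, "t_statistic").text = "2.10"

    ET.SubElement(result, "data_source").text = "example panel dataset"

    print(ET.tostring(result, encoding="unicode"))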

Background

Since computing power and data storage have become cheaper and more easily available, the number of empirical papers in economics has increased dramatically. Despite the large numbers of empirical papers, however, there is still no unified and machine readable standard for saving regression results. Researchers are often faced with a large volume of empirical papers, which describe regression results in similar yet differentiated ways.

Like bibliographic machine-readable formats (e.g. BibTeX), the new standard would facilitate the dissemination and organisation of existing results. Ideally, this project would offer open storage where researchers can submit their regression results (for example in an XML-type format). The standard could also be implemented in a wide range of open source econometric packages and projects like R or RePEc.

From a practical perspective, this project would greatly help to organise the large pile of existing regressions and facilitate literature reviews: if someone is interested in the relationship between democracy and economic development, for example, s/he need not go through the large pile of current papers but can simply look up the relationship in the open storage, which will then produce a list of existing results along with intuitive visualisations (what percentage of results are positive or negative, how the results evolve over time, i.e. whether there is a convergence in results). From an academic perspective, the project would also facilitate the compilation of meta-regressions, which have become increasingly popular. Metametrik will be released under an open license.

If you have further questions, please contact us at economics [at] okfn.org

Can we have open data without open access?

- February 11, 2013 in Events, Featured, Open Access, Open Data, Open Economics, Open Research

Many of the motivations which drive open access are similar to why we want open data in social science research: making one’s research more widely available to the research community and to the wider public, producing more and better research that can be reproduced and verified.


Photo by biblioteekje

On February 7-8, the University of Minho hosted the OpenAIRE Interoperability workshop, inviting academics, repository managers, publishers, funders, national help desks and open science advocates to discuss the challenge of interoperability in the emerging open access infrastructures.

Reviewing some of the reasons for open access, one sees that they are the same as for open data. Eloy Rodriguez (University of Minho Documentation Services) presented the drivers for open access that came up in the discussion, such as the monitoring and assessment of research output, visibility and impact, economic benefits including innovation, the empowerment of institutions to preserve their own research outputs, and the change in how science and research are disseminated. All of these drivers for open access are equally important for data and code, as they represent the evidence which backs up a publication.


Cartoon by Jorge Cham

Can we have open access without open data?

The answer is “No” according to Geoffrey Boulton (Royal Society, University of Edinburgh), Chair of the Working Group of the Science as an Open Enterprise report, who contended that publishing and data are invariably linked, as data constitutes the evidence and maintains the self-correction and credibility of science: “Science corrects itself as long as you provide the knowledge by which it can do so”.
Brian Hole (Ubiquity Press) stressed that research needs an effective and efficient model of distribution and presented a model of publishing datasets in a similar way to how research is published – in peer-reviewed open access data journals. This model would create additional incentives for sharing data, as researchers would also gain citations and reputation by publicising their datasets.

Who owns the data?

Publishers sell the published research which was signed over to them by the very same research producers who are buying it.

Victoria Stodden (Columbia University, Open Economics Advisory Panel member) wrote in her blog that, similar to the copyright sign-over to journals, many researchers are required to sign non-disclosure agreements when working with commercial data, even when no privacy issues are involved, preventing them from sharing it with other researchers. In some fields of science it goes even further: Ben Goldacre writes in Bad Pharma that “university administrators and ethics committees permit contracts with industry that explicitly say that the sponsor can control the data”, a form of research misconduct which is also one of the reasons for publication bias and for overstating the benefits of treatments in medicines research.

How they can go together

Storing, linking and preserving data from social science research in a sustainable manner may be more complex than creating open access repositories for publications: after all, researchers sometimes work with enormous datasets which are usable by the research community only with proper descriptions of the research process in which the data was generated. And even where publishers or funders have data availability policies, these are rarely enforced, as making research data available would also require the establishment and maintenance of elaborate data management infrastructures.

However, once open access infrastructures exist, it could be possible to have the data and code as one of the resources published along with the paper itself. Preserving these datasets on a large scale and in a sustainable manner would require massive repositories where datasets receive permanent digital identifiers, which would guarantee stable linking even if publishers or universities change the URLs.

While open access policies and structures might be getting more popular in some countries or science fields, there is still limited understanding of how to make data from research available on a wider scale. It is however clear that the experiences of the open access movement are key lessons for our understanding of how to make research data openly available.

First Open Economics International Workshop Recap

- January 25, 2013 in Economic Publishing, Events, Featured, Open Access, Open Data, Open Economics, Open Research, Open Tools, Workshop

The first Open Economics International Workshop gathered 40 academic economists, data publishers and funders of economics research, researchers and practitioners for a two-day event at Emmanuel College in Cambridge, UK. The aim of the workshop was to build an understanding around the value of open data and open tools for the Economics profession and the obstacles to opening up information, as well as the role of greater openness of the academy. This event was organised by the Open Knowledge Foundation and the Centre for Intellectual Property and Information Law and was supported by the Alfred P. Sloan Foundation. Audio and slides are available at the event’s webpage.

Open Economics Workshop

Setting the Scene

The Setting the Scene session was about giving a bit of context to “Open Economics” in the knowledge society, seeing also examples from outside of the discipline and discussing reproducible research. Rufus Pollock (Open Knowledge Foundation) emphasised that there is both a need for change and substantial potential for economics: 1) open “core” economic data outside the academy, 2) open as default for data in the academy, 3) a real growth in citizen economics and outside participation. Daniel Goroff (Alfred P. Sloan Foundation) drew attention to the work of the Alfred P. Sloan Foundation in emphasising the importance of knowledge and its use for making decisions, and of data and knowledge as a non-rival, non-excludable public good. Tim Hubbard (Wellcome Trust Sanger Institute) spoke about the potential of large-scale data collection around individuals for improving healthcare and how centralised global repositories work in the field of bioinformatics. Victoria Stodden (Columbia University / RunMyCode) stressed the importance of reproducibility for economic research as an essential part of scientific methodology, and presented the RunMyCode project.

Open Data in Economics

The Open Data in Economics session was chaired by Christian Zimmermann (Federal Reserve Bank of St. Louis / RePEc) and covered several projects and ideas from various institutions. The session examined examples of open data in Economics and sought to discover whether these examples are sustainable and can be implemented in other contexts: whether the right incentives exist. Paul David (Stanford University / SIEPR) characterised the open science system as a system which is better than any other at the rapid accumulation of reliable knowledge, whereas proprietary systems are very good at extracting rent from existing knowledge. A balance between these two systems should be established so that they can work within the same organisational system, since separately they are distinctly suboptimal. Johannes Kiess (World Bank) underlined that having the data available is often not enough: “It is really important to teach people how to understand these datasets: data journalists, NGOs, citizens, coders, etc.”. The World Bank has implemented projects to incentivise the use of the data and is helping countries to open up their data. For economists, he mentioned, having a valuable dataset to publish on is an important asset; there are therefore not sufficient incentives for sharing.

Eustáquio J. Reis (Institute of Applied Economic Research – Ipea) related his experience in establishing the Ipea statistical database and other projects for historical data series and data digitisation in Brazil. He shared that the culture of the economics community is not a culture of collaboration where people willingly share or support and encourage data curation. Sven Vlaeminck (ZBW – Leibniz Information Centre for Economics) spoke about the EDaWaX project, which conducted a study of the data availability of economics journals and will establish a publication-related data archive for an economics journal in Germany.

Legal, Cultural and other Barriers to Information Sharing in Economics

The session presented different impediments to the disclosure of data in economics from the perspective of two lawyers and two economists. Lionel Bently (University of Cambridge / CIPIL) drew attention to the fact that there is a whole range of different legal mechanisms which operate to restrict the dissemination of information, yet on the other hand there is also a range of mechanisms which help to make information available. Lionel questioned whether the open data standard would always be the optimal way to produce high quality economic research, or whether there is also a place for modulated/intermediate positions where data is available only under conditions, or only in part or for certain forms of use. Mireille van Eechoud (Institute for Information Law) described the EU Public Sector Information Directive – the most generic document related to open government data – and the progress made in opening up information published by the government. Mireille also pointed out that legal norms have only limited value if you don’t have the internalised, cultural attitudes and structures in place that really make more access to information work.

David Newbery (University of Cambridge) presented an example from the electricity markets and insisted that for a good supply of data, informed demand is needed, coming from regulators who are charged to monitor markets, detect abuse, uphold fair competition and defend consumers. John Rust (Georgetown University) said that the government is an important provider of data which is otherwise too costly to collect, yet a number of issues exist, including confidentiality, excessive bureaucratic caution and the public finance crisis. There are a lot of opportunities for research also in the private sector, where some part of the data can be made available (redacting confidential information), and the public non-profit sector can also have a tremendous role as a force to organise markets for the better, set standards and focus on targeted domains.

Current Data Deposits and Releases – Mandating Open Data?

The session was chaired by Daniel Goroff (Alfred P. Sloan Foundation) and brought together funders and publishers to discuss their role in requiring data from economic research to be publicly available and the importance of dissemination for publishing.

Albert Bravo-Biosca (NESTA) emphasised that mandating open data begins much earlier in the process, where funders can encourage the collection of particular data by the government which is the basis for research, and can also act as an intermediary for the release of open data by the private sector. Open data is interesting, but it is even more interesting when it is appropriately linked and combined with other data, and there is value in examples and case studies for demonstrating benefits. There should, however, be caution, as opening up some data might result in less data being collected.

Toby Green (OECD Publishing) made a point about the difference between posting and publishing, where making content available does not always mean that it will be accessible, discoverable, usable and understandable. In his view, the challenge is to build up an audience by putting content where people will find it, which is very costly as proper dissemination is expensive. Nancy Lutz (National Science Foundation) explained the scope and workings of the NSF and the data management plans required from all economists who are applying for funding. Creating and maintaining data infrastructure and compliance with the data management policy might eventually mean that there would be less funding for other economic research.

Trends of Greater Participation and Growing Horizons in Economics

Chris Taggart (OpenCorporates) chaired the session, which introduced different ways of participating and using data, different audiences and contributors. He stressed that data is being collected in new ways and by different communities, that access to data can be an enormous privilege and can generate data gravities with very unequal access and power to make use of and to generate more data, and that analysis is sometimes being done in new and unexpected ways and by unexpected contributors. Michael McDonald (George Mason University) related how the highly politicised process of drawing up district lines in the U.S. (also called Gerrymandering) could be done in a much more transparent way through an open-source re-districting process with meaningful participation, allowing for an open conversation about public policy. Michael also underlined the importance of common data formats and told a cautionary tale about a group of academics misusing open data with a political agenda to encourage a storyline that a candidate would win a particular state.

Hans-Peter Brunner (Asian Development Bank) shared a vision of how open data and open analysis can aid decision-making about investments in infrastructure, connectivity and policy. Simulated models about investments can demonstrate different scenarios according to investment priorities and crowd-sourced ideas. Hans-Peter asked for feedback and input on how to make data and code available. Perry Walker (new economics foundation) spoke about conversation, noting that a good conversation has to be designed as it usually doesn’t happen by accident. Rufus Pollock (Open Knowledge Foundation) concluded with examples of citizen economics and the growth of contributions from the wider public, particularly through volunteer computing and volunteer thinking as ways of getting engaged in research.

During two sessions, the workshop participants also worked on a Statement on the Open Economics Principles, which will be revised with further input from the community and made public at the second Open Economics workshop taking place on 11-12 June in Cambridge, MA.

Open Research Data Handbook Sprint

- January 17, 2013 in Events, Featured, Open Data, Open Economics, Open Research, Sprint

On February 15-16, the Open Research Data Handbook Sprint will happen at the Open Data Institute, 65 Clifton Street, London EC2A 4JE.

The Open Research Data Handbook aims to provide an introduction to the processes, tools and other areas that researchers need to consider to make their research data openly available.

Join us for a book sprint to develop the current draft, and explore ways to remix it for different disciplines and contexts.

Who it is for:

  • Researchers interested in carrying out their work in more open ways
  • Experts on sharing research and research data
  • Writers and copy editors
  • Web developers and designers to help present the handbook online
  • Anyone else interested in taking part in an intense and collaborative weekend of action

Register at Eventbrite

What will happen:

The main sprint will take place on Friday and Saturday. After initial discussions we’ll divide into open space groups to focus on research, writing and editing for different chapters of the handbook, developing a range of content including How To guidance, stories of impact, collections of links and decision tools.

A group will also look at digital tools for presenting the handbook online, including ways to easily tag content for different audiences and remix the guide for different contexts.

Agenda:

Week before & after:

  • Calling for online contributions and reviews

Friday:

  • 12:00 – 14:00: Seminar or bring your own lunch on open research data
  • 14:00 – 17:30: Planning and initial work on the handbook in small teams

Saturday:

  • 10:00 – 10:30: Arrive and coffee
  • 10:30 – 11:30: Introducing open research – lightning talks
  • 11:30 – 13:30: Forming teams and starting sprint. Groups on:
    • Writing chapters
    • Decision tools
    • Building website & framework for book
    • Remixing guide for particular contexts
  • 13:30 – 14:30: Lunch
  • 14:30 – 16:30: Working in teams
  • 17:30 – 18:30: Report back
  • 18:30 – …… : Pub

Partners:

OKF Open Science Working Group – creators of the current Open Research Data Handbook
OKF Open Economics Working Group – exploring economic aspects of open research
Open Data Research Network – exploring a remix of the handbook to support open social science research in a new global research network, focussed on research in the Global South
Open Data Institute – hosting the event

First Open Economics International Workshop

- December 17, 2012 in Events, Featured, Open Access, Open Data, Open Economics, Open Research, Workshop

You can follow all the goings-on today and tomorrow through the live stream: http://bambuser.com/v/3232222

On 17-18 December, economics and law professors, data publishers, practitioners and representatives from international institutions will gather at Emmanuel College, Cambridge for the First Open Economics International Workshop. From showcasing examples of success in collaborative economic research and open data, to reviewing the legal, cultural and other barriers to information sharing, this event aims to build an understanding of the value of open data and open tools for the Economics profession and the obstacles to opening up information in Economics. The workshop will also explore the role of greater openness in broadening understanding of and engagement with Economics among the wider community, including policy-makers and society.

This event is part of the Open Economics project, funded by the Alfred P. Sloan Foundation and is a key step in identifying best practice as well as legal, regulatory and technical barriers and opportunities for open economic data. A statement on the Open Economics Principles will be produced as a result of the workshop.

Introduction:
Setting the Scene – General perspectives
Rufus Pollock, Open Knowledge Foundation; Daniel L. Goroff, Alfred P. Sloan Foundation; Tim Hubbard, Wellcome Trust Sanger Institute; Victoria Stodden, Columbia University / RunMyCode.org
Videostream: Here
Session: “Open Data in Economics – Reasons, Examples, Potential”:
Examples of open data in economics so far and its potential benefits
Session host: Christian Zimmermann (Federal Reserve Bank of St. Louis / RePEc), Panelists: Paul David (Stanford University / SIEPR), Eustáquio J. Reis (Institute of Applied Economic Research – Ipea), Johannes Kiess (World Bank), Sven Vlaeminck (ZBW – Leibniz Information Centre for Economics).
Videostream: Part 1 and Part 2
Session: “Legal, Cultural and other Barriers to Information Sharing in Economics”: Introduction and overview of challenges faced in information sharing in Economics
Session host: Lionel Bently (University of Cambridge / CIPIL), Panelists: Mireille van Eechoud (Institute for Information Law), David Newbery (University of Cambridge), John Rust (Georgetown University).
Session: “Current Data Deposit and Releases – Mandating Open Data?”: Round table discussion with stakeholders: representatives of funders, academic publishing and academics.
Session host: Daniel L. Goroff (Alfred P. Sloan Foundation), Panelists: Albert Bravo-Biosca (NESTA), Toby Green (OECD Publishing), Nancy Lutz (National Science Foundation).
Session: Trends of Greater Participation and Growing Horizons in Economics: Opening up research and the academy to wider engagement and understanding with the general public, policy-makers and others.
Session host: Chris Taggart (OpenCorporates), Panelists: Michael P. McDonald (George Mason University), Hans-Peter Brunner (Asian Development Bank), Perry Walker (New Economics Foundation)

The workshop is designed to be a small, invite-only event with a round-table format, allowing participants to share and develop ideas together. For a complete description and a detailed programme, visit the event website.

Can’t attend? Join the LIVESTREAM here


The event is being organised by the Centre for Intellectual Property and Information Law (CIPIL) at the University of Cambridge and the Open Economics Working Group of the Open Knowledge Foundation, and is funded by the Alfred P. Sloan Foundation. More information about the Working Group can be found online.

Interested in getting updates about this project and getting involved? Join the Open Economics mailing list:

Data Party: Tracking Europe’s Failed Banks

- October 18, 2012 in Data Party, Open Economics

Photo: nuklr.dave, CC BY

This fall marked the five-year anniversary of the collapse of UK-based Northern Rock in 2007. Since then an unknown number of European banks have collapsed under the weight of plummeting housing markets, financial mismanagement and other pressures. But how many European banks actually crashed during the crisis?

In the United States, the Federal Deposit Insurance Corporation keeps a neat Failed bank list, which has recorded 496 bank failures in the US since 2000.

Europe, however, and for that matter the rest of the world, still lacks similar or comparable data on how many banks have actually failed since the beginning of the crisis. Nobody has collected data on how many Spanish cajas actually crashed or how many troubled German Landesbanken actually went under.

At the Open Economics Skype chat earlier this month it was agreed to take the first steps towards creating a Failed Bank Tracker for Europe at an upcoming “Data Party”:

Join the Data Party

Wednesday 24th October at 5:30pm London / 6:30pm Berlin.

We hope that a diverse group of you will join in the gathering of failed bank data. During the Data Party you will have plenty of chances to discuss all questions regarding bank failures, whether they concern specific cases or the more general criteria for what counts as a failure. Do not let your country or region remain a blank spot when we draw up the map of bank failures.

At the Data Party we will go through some of these questions:

  • What kind of failed bank data do we wish to collect (date, amount, type of intervention, etc.)? A possible record structure is sketched just after this list.
  • What are the possible sources (press, financial regulators or European agencies)?
  • How do we get started with the data collection for the Failed Bank Tracker?
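As a purely illustrative starting point for the first question, here is a minimal sketch of what a single tracker record could look like. The field names, the example entry and the CSV filename are assumptions made for the sake of the example, not an agreed schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import csv

@dataclass
class FailedBankRecord:
    """One illustrative row for the Failed Bank Tracker (all field names are assumptions)."""
    bank_name: str
    country: str
    failure_date: str                    # ISO 8601 date, e.g. "2008-09-29"
    intervention_type: str               # e.g. "nationalisation", "state guarantee", "administration"
    estimated_cost_eur: Optional[float]  # estimated public cost, if known
    source_url: str                      # press article or regulator notice backing the entry

# Hypothetical example entry, for illustration only
example = FailedBankRecord(
    bank_name="Example Bank AG",
    country="DE",
    failure_date="2009-03-15",
    intervention_type="state guarantee",
    estimated_cost_eur=None,
    source_url="http://example.org/press-release",
)

# Write records to a CSV that contributors could keep appending to
with open("failed_bank_tracker.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(asdict(example).keys()))
    writer.writeheader()
    writer.writerow(asdict(example))
```

Keeping one row per failure, each with a source URL, would make it easy to merge contributions from different participants and to verify entries later.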

 

You can join the Data party by adding your name and skype ID here.

 

Getting good data: What makes a failed bank?

For this first event, collecting data on failed European banks should provide more than enough work for us. At this moment neither the European Commission, Eurostat nor the European Banking Authority keeps any record of bank failures like the FDIC does in the US. The best source of official European information available is from DG Competition, which keeps track of approved state aid measures in member states in its State Aid database. Its accuracy is limited, however, as it mixes cases ranging from state interventions in specific bank collapses to sector-wide bank guarantee schemes.

A major reason for the lack of data on bank failures is that legislation often differs dramatically between countries in terms of what actually defines a bank failure. In early 2012 I asked the UK regulator, the FSA, if it could provide a list of failed banks similar to the list from the FDIC in the US. In its response the FSA asserted that the UK had not had a single bank failure since 2007:

“I regret that we do not have a comparable list to that of the US. Looking at the US list it appears to be a list of banks that have entered administration. As far as I am aware no UK banks have entered administration in this period, though of course a number were taken over or received support during the crisis.”

The statement from the FSA demonstrates that, for instance, Northern Rock, which left UK taxpayers with a £2bn loss, never officially failed, because it never entered administration. This example alone shows that collecting data on bank failures would be both interesting and useful.

Earlier this year I got a head start on the data collection when a preliminary list of failed banks was compiled from both journalists and national agencies such as the Icelandic Financial Supervisory Authority. The first 65 banks entered in the tracker, mostly from Northern Europe, are available here.

Looking forward to bringing data on failed banks together at the Data Party.

How can open data help rebuild trust in business?

- October 8, 2012 in Audit and Accounting, Events, External Projects, Hackathon, Public Finance and Government Data

A few months ago, the Finance Innovation Lab launched AuditFutures – a new systemic initiative for rebuilding trust in business. The first innovation workshop on 4 July was a tremendous success and we have developed a strategy to move the work forward. Not surprisingly, open data came up in the discussions in two of the eight innovation domains. We feel that the knowledge and perspectives of the OKFN will bring value to the discussion.

 

Where are we?

The Finance Innovation Lab was established about four years ago by ICAEW and WWF-UK to inspire a financial system that sustains people and planet. This year, the Lab has been selected by NESTA and the Guardian as one of the top radicals who have transformed society. Building on the established success and momentum of the Lab, the Audit and Assurance Faculty of ICAEW has taken a bold initiative to innovate audit and reconnect the profession with the public interest.

The audit and accounting profession is at a crossroads and we believe this is an opportunity to host a positive and proactive process about the future of the profession. Using our open and participatory approach, we organised an innovation workshop to crowdsource ideas for how audit can best serve society. On 4 July we convened over 120 participants from more than 75 firms and organisations.

 

What is our approach?

We designed a process to identify the emerging themes that helped form the agenda for the day. We had over a hundred perspectives in the room, emerging from over twenty discussion tables. The goal of this process was to collect ideas in a transparent and democratic way, and to visually identify common patterns.

Some of the emerging themes are: the need for more flexibility in audits, standards and regulation; integration of a broader stakeholder community; better communication of the value of audit; engaged dialogue with investors; developing a new culture of challenge and critical thinking.

We clustered almost fifty themes into eight innovation domains: New Audit Methods, Changing the Culture of Audit, Serving a Wider Stakeholder Base, Rebuilding Trust, A New Reporting Model, IT Innovation, Auditor Reporting, and Recruitment and Training. We hosted in-depth working group discussions around these areas and collected further insights into what would move the ideas forward. Most participants signed up to continue working on some specific areas and we are working with them now.

You can watch a short video from the first assembly here.

 

“Open data is trending – get on board”

This was the summary tweet for one of the working groups. As part of the process of distilling insights and intelligence, we had asked each of the 16 working groups to come up with a tweet that summarises their work.

In two of our innovation domains – ‘Rebuilding Trust’ and ‘IT Innovation’ – open data came up among the discussed themes. The groups looked into what would make most difference to their chosen areas. More open data and transparent information in the audit process have the potential to directly engage wider stakeholder groups. For example, audit files and data could be made publicly open in a machine-readable format.
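To make the idea of a machine-readable format concrete, here is a minimal, hypothetical sketch of how a single audit record might be published as JSON. Every field name and value is invented for illustration and does not follow any existing audit reporting standard.

```python
import json

# Hypothetical open audit record; all field names and values are
# illustrative assumptions, not an existing reporting standard.
audit_record = {
    "company": "Example Ltd",
    "financial_year": "2011-2012",
    "auditor": "Example Audit LLP",
    "opinion": "unqualified",
    "key_matters": [
        "revenue recognition",
        "valuation of financial instruments",
    ],
    "report_url": "http://example.org/audit-report.pdf",
}

# Publishing records like this would let third parties aggregate,
# analyse and visualise audit information directly.
print(json.dumps(audit_record, indent=2))
```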

There is an interesting dynamic in thinking about what open business data could mean. One set of questions would focus on the range of information companies disclose to investors and to public scrutiny. In the current climate of tough competition and patent wars, open data might not be regarded favourably. Another question concerns the role of the auditor – what specialist skills would be needed to analyse and visualise the data? Could Google be the next big audit firm?

 

Can OpenAudit be one of the next steps?

The Open Knowledge Foundation has made a significant push towards the transparency and accountability of published financial statements in the public and government sectors. Earlier this year OKFN published a comprehensive report on transparency and accountability in public finance. The report demonstrates the ways technology can contribute to fiscal transparency and offers perspectives and recommendations in several areas like data availability and standards for fiscal data.

At the same time, the OpenCorporates project has made significant progress in creating an open database of the corporate world. Currently, more than 46 million companies from over 60 jurisdictions are listed (including almost 8 million from the UK). The project was an award winner in the OpenDataChallenge and in early 2012 was appointed to the Financial Stability Board’s advisory panel on legal entity identification for finance.

What should be the next step in rebuilding trust in business information? We would like to discuss how open data can help the audit profession and what role auditors can play in the global trend of disclosing more information to the public. It is important to view open data solutions from the perspective of the public interest. Do we have a good understanding of what is relevant and important to the sectors of society that audit serves? It is important to consider whether the users of audit would value it more if it were an insightful dynamic infographic based on open data.

The two starting points for our discussion on OpenAudit would be the business and IT models for open audit data and, more importantly, the broader question of whether data provenance would help build trust in business.

We definitely have more questions and fewer answers at the moment. The approach of the Finance Innovation Lab is not to ask rhetorical questions but to invite individuals and organisations from diverse fields to search for the right question together. How can audit and open data help rebuild trust in business?

We believe this is a conversation worth having.

To join the discussion, please contact Martin Martinoff, project lead of AuditFutures.